Graph Edit Distance

Benoît Gaüzère

June 30th, 2016

Introduction

GED computation

Experiments

Discussion

Structural Information

[Slide shows the first page of: Brown, R. D. and Martin, Y. C. (1996). Use of Structure–Activity Data To Compare Structure-Based Clustering Methods and Descriptors for Use in Compound Selection. J. Chem. Inf. Comput. Sci., 36, 572–584.]

[Chemical structure diagrams of two molecules]

1/42

Encoding structural information

Vectors

▶ Embedding in a Euclidean space

✓ All machine learning methods available

✗ Loss of structural information

[Molecular graph example]

2/42

Graph definition

A graph G ∈ 𝒢, G = (V, E), is a set of nodes V connected by a set of edges E ⊆ V × V.

If (vᵢ, vⱼ) ∈ E, then vᵢ is adjacent to vⱼ.

3/42

Labelling function

Node labelling

▶ ℓᵥ : V → Lᵥ
▶ Lᵥ: node label alphabet (symbolic, real-valued, …)

Edge labelling

▶ ℓₑ : V × V → Lₑ
▶ Lₑ: edge label alphabet

4/42

Some graphs

Chemical science

[Molecular graph example]

Symbolic labels: atoms

Pattern Recognition

Vector labels: shape characteristics

5/42

Why use graphs?

Graphs can handle structural information. But 𝒢 ≠ ℝᴺ.

We need to define a dissimilarity measure.

6/42

Graph Edit Distance (GED)

[Two molecular graphs being compared]

▶ Dissimilarity measure

▶ Quantifies a distortion

7/42

Edit path

Graph edit distance
Minimal amount of distortion required to transform one graph into another.

▶ Edit path γ: a sequence of edit operations e

γ = (e₁, …, eₚ)

▶ Elementary edit cost c(e)

[Illustration: a sequence of intermediate graphs along an edit path]

8/42

Formal Definition

Cost of an edit path γ

c(γ) = Σ_{e∈γ} c(e)

Graph edit distance

▶ Edit distance: ged(G₁, G₂) = min_{γ∈𝒫} c(γ)

▶ Optimal edit path: γ* ∈ argmin_{γ∈𝒫} c(γ)

9/42
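The definition above can be made concrete with a tiny brute-force solver. This is a sketch, not from the slides: it assumes unlabeled undirected edges and the arbitrary cost model used later in the deck (substitution 1 when labels differ, insertion/deletion 3), pads both node sets with ε slots, and enumerates every node assignment, keeping the cheapest induced edit path.

```python
from itertools import permutations

def ged_exact(g1, g2, c_sub=1, c_ind=3):
    """Exact GED for tiny graphs by enumerating all node assignments.

    A graph is a pair (labels, edges): labels maps node -> label, edges
    is a set of frozensets {u, v} (undirected, unlabeled edges).
    Assumed costs: c_sub when substituted labels differ (0 otherwise),
    c_ind for every node or edge insertion/deletion.
    """
    (l1, E1), (l2, E2) = g1, g2
    V1, V2 = list(l1), list(l2)
    n, m = len(V1), len(V2)
    best = float("inf")
    # Pad both sides with epsilon slots, then try every bijection.
    for perm in permutations(range(n + m)):
        cost = 0
        for i, j in enumerate(perm):
            u = V1[i] if i < n else None          # None stands for epsilon
            v = V2[j] if j < m else None
            if u is not None and v is not None:
                cost += c_sub if l1[u] != l2[v] else 0   # substitution
            elif u is not None or v is not None:
                cost += c_ind                     # node deletion / insertion
        # Edge costs induced by the node mapping.
        phi = {V1[i]: (V2[perm[i]] if perm[i] < m else None) for i in range(n)}
        matched = set()
        for e in E1:
            u, v = tuple(e)
            img = frozenset((phi[u], phi[v]))
            if None not in img and img in E2:
                matched.add(img)                  # edge substituted, cost 0
            else:
                cost += c_ind                     # edge deleted
        cost += c_ind * len(E2 - matched)         # remaining G2 edges inserted
        best = min(best, cost)
    return best
```

On a hypothetical two-node C–O graph with one edge versus a lone C atom, this returns 6: delete the O node (3) and its incident edge (3). The (n+m)! enumeration is exactly the exponential blow-up the rest of the talk works around.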

How to compute Graph Edit Distance

Tree search

11/42

Tree search methods

A*

▶ Dijkstra-based algorithm

▶ Needs a heuristic h(p)

✓ Always finds a solution

✗ But may take a loooooong time

✗ Exponential number of edit paths

Depth-first-search-based algorithm [Abu-Aisheh et al., 2015]

▶ Based on a heuristic

▶ Limits the number of open paths

▶ Anytime algorithm: can return an approximation before termination

▶ Parallelizable

12/42
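A minimal sketch of the tree search, assuming the same toy graph representation and cost model as before (unlabeled undirected edges; substitution 1 on differing labels, insertion/deletion 3): partial node assignments are expanded best-first with the trivial heuristic h(p) = 0, so this degenerates to uniform-cost (Dijkstra-like) search with branch-and-bound pruning.

```python
import heapq
from itertools import count

def ged_search(g1, g2, c_sub=1, c_ind=3):
    """Best-first GED search over partial node assignments (heuristic h = 0)."""
    (l1, E1), (l2, E2) = g1, g2
    V1, V2 = list(l1), list(l2)
    n, m = len(V1), len(V2)

    def step_cost(i, j, phi):
        """Cost of mapping V1[i] to V2[j] (j = None means deletion), plus
        the induced cost of edges between V1[i] and already-mapped nodes."""
        u = V1[i]
        c = c_ind if j is None else (0 if l1[u] == l2[V2[j]] else c_sub)
        for k in range(i):
            in_E1 = frozenset((u, V1[k])) in E1
            in_E2 = (j is not None and phi[k] is not None
                     and frozenset((V2[j], V2[phi[k]])) in E2)
            if in_E1 != in_E2:
                c += c_ind            # edge deleted or inserted
        return c

    tick = count()                    # tie-breaker so heap never compares phi
    heap = [(0, 0, next(tick), ())]   # (cost so far, next V1 index, _, phi)
    best = float("inf")
    while heap:
        g, i, _, phi = heapq.heappop(heap)
        if g >= best:                 # branch-and-bound pruning
            continue
        if i == n:                    # all of V1 processed: insert the rest of G2
            used = {V2[j] for j in phi if j is not None}
            g += c_ind * sum(1 for v in V2 if v not in used)
            g += c_ind * sum(1 for e in E2 if not e <= used)
            best = min(best, g)
            continue
        for j in [j for j in range(m) if j not in phi] + [None]:
            heapq.heappush(
                heap, (g + step_cost(i, j, phi), i + 1, next(tick), phi + (j,)))
    return best
```

On the two-node C–O graph versus a single C atom it returns 6 (one node deletion plus one edge deletion). Plugging in a nontrivial lower bound for h(p) would turn this into proper A* and prune far more aggressively.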

Correspondence between edit paths and mappings [Bougleux et al., 2015]

[Illustration: node-to-node mapping between two molecular graphs]

13/42


Computing graph edit distance is equivalent to finding an optimal assignment between nodes.

14/42

Cost

Mapping ϕ : V₁ ∪ {ε} → V₂ ∪ {ε}

S(V₁^ε, V₂^ε, ϕ) = Q_e(V₁^ε, V₂^ε, ϕ) + L_v(V₁^ε, V₂^ε, ϕ)
                   (edge cost)           (node cost)

15/42

Cost Matrix C

L_v(V₁^ε, V₂^ε, ϕ) = Σ_{v∈V₁, ϕ(v)≠ε} c(v → ϕ(v)) + Σ_{v∈V₁, ϕ(v)=ε} c(v → ε) + Σ_{v∈V₂∖ϕ(V₁)} c(ε → v)
                     (node substitutions)            (node removals)             (node insertions)

The (n+m) × (n+m) cost matrix C has the block structure

    C = [ C_sub  C_del ]
        [ C_ins    0   ]

▶ C_sub is n × m, with C_sub(i, j) = c(vᵢ⁽¹⁾ → vⱼ⁽²⁾) (substitutions)
▶ C_del is n × n, with c(vᵢ⁽¹⁾ → ε) on the diagonal and ∞ elsewhere (removals)
▶ C_ins is m × m, with c(ε → vⱼ⁽²⁾) on the diagonal and ∞ elsewhere (insertions)
▶ the lower-right m × n block is 0 (ε → ε costs nothing)

16/42

Node operations cost

Node cost induced by ϕ

L_v(V₁^ε, V₂^ε, ϕ) = Σᵢ₌₁^{|G₁|+|G₂|} Σⱼ₌₁^{|G₁|+|G₂|} (X_ϕ ∘ C)(i, j)

Vectorized version

L_v(V₁^ε, V₂^ε, ϕ) = c⊤ x_ϕ

17/42
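A quick numerical check of the two expressions above, on a hypothetical toy pair (a two-node C–O graph versus a single C atom, substitution 1 when labels differ, insertion/deletion 3): the elementwise sum over X_ϕ ∘ C and the vectorized form c⊤x_ϕ agree.

```python
import numpy as np

BIG = 1e6  # finite stand-in for the infinite (forbidden) entries of C

# Cost matrix C for G1 with nodes (C, O) and G2 with node (C,):
# rows = (v1, v2, eps), cols = (u1, eps_v1, eps_v2).
C = np.array([[0.0, 3.0, BIG],    # v1: substitute to u1 (0), or delete (3)
              [1.0, BIG, 3.0],    # v2: substitute to u1 (1), or delete (3)
              [3.0, 0.0, 0.0]])   # eps row: insert u1 (3); eps -> eps is 0

# Permutation matrix X_phi for the mapping v1 -> u1, v2 -> eps.
X = np.zeros((3, 3))
X[0, 0] = X[1, 2] = X[2, 1] = 1.0

Lv_elementwise = (X * C).sum()        # sum_ij (X_phi o C)(i, j)
x = X.flatten()                       # x_phi: row-major vectorization
c = C.flatten()
Lv_vectorized = c @ x                 # c^T x_phi

print(Lv_elementwise, Lv_vectorized)  # both 3.0: one C->C sub (0) + one deletion (3)
```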

Computation of edge costs

Edge operations cost I

Edge cost matrix D

D(i, j, k, l) = c((i, j) → (k, l))

▶ (i, j) ∈ E₁ → deletion operation

▶ (k, l) ∈ E₂ → insertion operation

▶ The edge mapping is induced by the node mapping

19/42

Edge operations cost II

Edge cost

Q_e(V₁^ε, V₂^ε, ϕ) = Σ_{i,j,k,l=1}^{|G₁|+|G₂|} X_ϕ(i, k) D(i, j, k, l) X_ϕ(j, l)

Vectorized version

Q_e(V₁^ε, V₂^ε, ϕ) = x_ϕ⊤ D x_ϕ

20/42
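The quadratic edge term can also be checked numerically on a hypothetical toy pair (a two-node graph with one edge versus a single node; unlabeled edges, insertion/deletion cost 3). D is built from the padded adjacency matrices, and the four-index sum agrees with the vectorized form. Note that each undirected edge is counted twice in the sum, which is consistent with the factor ½ carried by the QAP formulation on the next slide.

```python
import numpy as np

c_ind = 3.0
N = 3  # padded size n + m: G1 = C-O (one edge), G2 = a single C atom

# Padded adjacency matrices (epsilon slots have no edges).
A1 = np.zeros((N, N)); A1[0, 1] = A1[1, 0] = 1.0
A2 = np.zeros((N, N))

# D(i, j, k, l) = cost of mapping the pair (i, j) of G1 onto (k, l) of G2:
# 0 if both or neither are edges, c_ind if exactly one is (deletion/insertion).
D = c_ind * np.abs(A1[:, :, None, None] - A2[None, None, :, :])

# Permutation matrix for the mapping v1 -> u1, v2 -> eps.
X = np.zeros((N, N)); X[0, 0] = X[1, 2] = X[2, 1] = 1.0

Qe_tensor = np.einsum("ik,ijkl,jl->", X, D, X)   # sum X(i,k) D(i,j,k,l) X(j,l)

x = X.flatten()                                  # x_phi, index order (i, k)
Dmat = D.transpose(0, 2, 1, 3).reshape(N * N, N * N)
Qe_vec = x @ Dmat @ x                            # x_phi^T D x_phi

print(Qe_tensor, Qe_vec)  # both 6.0; the deleted edge counted once per direction
```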

QAP formulation of GED

S(x) = ½ x⊤Dx + c⊤x
       (edge cost)  (node cost)

S(x) = ½ x⊤Δx  (with the node costs c folded into Δ)

x* = argmin_{x∈Π} S(x)

21/42

An intractable problem

✗ No guarantees on Δ

⇒ Non-convex problem

⇒ No polynomial-time solution of min_{x∈Π} S(x)

▶ NP-hard problem

Let's look for an approximation.

22/42

Approximation is overestimation

1. Any mapping corresponds to an edit path

2. Any edit path has a cost ≥ GED

3. Approximate mapping ⇔ overestimation of GED

23/42

A First Approach

Linear approximation

x* = argmin_{x∈Π} c⊤x

(the quadratic term ½ x⊤Dx is dropped)

▶ [Riesen and Bunke 2009, Gaüzère et al. 2014, Carletti et al. 2015]

▶ Linear approximation of the QAP formulation

▶ Hungarian algorithm (O(n³))

✗ No structural information

24/42
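A sketch of this linear-approximation step with SciPy's Hungarian solver (`scipy.optimize.linear_sum_assignment`), on the same hypothetical C–O versus C example. The cost matrix follows the block structure of slide 16, with a large finite constant standing in for ∞.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lsap_mapping(l1, l2, c_sub=1.0, c_ind=3.0):
    """Node mapping from the linear approximation argmin_{x in Pi} c^T x.

    l1, l2: lists of node labels of G1 and G2. Returns the (row, col)
    assignment in the (n+m) x (n+m) cost matrix and its total cost.
    """
    n, m = len(l1), len(l2)
    BIG = 1e6                           # finite stand-in for infinity
    C = np.full((n + m, n + m), BIG)
    for i in range(n):
        for j in range(m):
            C[i, j] = 0.0 if l1[i] == l2[j] else c_sub   # substitutions
    for i in range(n):
        C[i, m + i] = c_ind             # deletion of node i of G1
    for j in range(m):
        C[n + j, j] = c_ind             # insertion of node j of G2
    C[n:, m:] = 0.0                     # eps -> eps costs nothing
    rows, cols = linear_sum_assignment(C)   # Hungarian algorithm, O((n+m)^3)
    return rows, cols, C[rows, cols].sum()

rows, cols, cost = lsap_mapping(["C", "O"], ["C"])
print(cost)  # 3.0: substitute C -> C (0), delete O (3)
```

The LSAP cost 3 covers node operations only; the edit path induced by this mapping additionally deletes the O atom's incident edge, giving the true GED of 6. That gap is exactly the "no structural information" weakness, which the augmented cost matrices on the next slide try to reduce.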

Augmented Cost Matrix

Adding some structural information

▶ Cᵢⱼ: cost of mapping the neighbourhood of vᵢ to the neighbourhood of v′ⱼ
  ▶ Direct neighbourhood
  ▶ Random walks
  ▶ Subgraphs

▶ Complexity ↔ accuracy trade-off

25/42

A first QAP approach

x* = argmin_{x∈Π} ½ x⊤Dx + c⊤x

Gradient descent approach

▶ Let's find a local minimum of S(x)

▶ Relax the problem to the continuous domain S̃

✓ Some solvers exist

✗ No consideration of the discrete nature of the solution

26/42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al., 2009]

Frank–Wolfe-like algorithm

Iterate until convergence:

1. Discrete resolution of the linearized QAP (its gradient) → x_t

2. Line search between x_{t−1} and the solution found in step 1

27/42
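A compact sketch of the two-step loop above for a generic QAP S(x) = ½x⊤Dx + c⊤x, minimized over (relaxed) permutation matrices. This is a simplified reading of the method, not the exact algorithm of [Leordeanu et al., 2009]: step 1 minimizes the gradient Dx + c over permutations with the Hungarian algorithm, and step 2 is an exact line search on the quadratic.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ipfp(Dmat, c, x0, n_iter=50):
    """Frank-Wolfe-like minimization of S(x) = 0.5 x^T D x + c^T x over
    relaxed permutation matrices, flattened to vectors of size N^2."""
    N = int(round(len(c) ** 0.5))
    x = x0.astype(float).copy()
    for _ in range(n_iter):
        grad = Dmat @ x + c                       # gradient of S (D symmetric)
        r, k = linear_sum_assignment(grad.reshape(N, N))
        b = np.zeros_like(x)
        b[r * N + k] = 1.0                        # step 1: best permutation
        d = b - x
        a1, a2 = d @ grad, d @ Dmat @ d
        if a2 > 0:                                # step 2: exact line search
            alpha = float(np.clip(-a1 / a2, 0.0, 1.0))
        else:
            alpha = 1.0 if a1 < 0 else 0.0
        if alpha * np.abs(d).max() < 1e-12:       # converged
            break
        x += alpha * d
    return x
```

Each iteration is non-increasing in S, but the fixed point may be fractional, which is the projection/initialization issue discussed on the next slide.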

Operating IPFP

At convergence

▶ The optimal continuous solution x̃ is stable

▶ A projection step is needed to map x̃ back onto Π

✗ Uncontrolled loss: no guarantee that S(x̃) ≈ S(x′)

Importance of initialization

▶ Local minimum

▶ Initialization is important [Carletti et al., 2015]

28/42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S̃(x, ζ) = ζ (x⊤x) + (1 − |ζ|) S(x)

ζ = 1: convex objective function

ζ = −1: concave objective function

29/42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S̃(x, ζ) = ζ (x⊤x) + (1 − |ζ|) S(x)

GNCCP algorithm

x = 0
For ζ = 1 → −1:

1. x ← argmin_{x∈Π′} S̃(x, ζ)

2. ζ ← ζ − 0.1

▶ Iterates ζ over a modified IPFP objective function

30/42
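The loop above can be sketched numerically, with the inner argmin handled by a few Frank–Wolfe steps (as in IPFP). Two deliberate deviations, both assumptions of this sketch rather than the slide's algorithm: S̃ is rewritten as a QAP, ½x⊤(2ζI + (1−|ζ|)D)x + (1−|ζ|)c⊤x, so one solver covers every ζ, and the iterate starts at the barycenter of Π′ instead of x = 0, since Frank–Wolfe needs a feasible starting point.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fw_step(H, g, x, N):
    """One Frank-Wolfe step for 0.5 x^T H x + g^T x over relaxed permutations."""
    grad = H @ x + g
    r, k = linear_sum_assignment(grad.reshape(N, N))
    b = np.zeros_like(x)
    b[r * N + k] = 1.0                    # best permutation for the gradient
    d = b - x
    a1, a2 = d @ grad, d @ H @ d
    if a2 > 0:
        alpha = float(np.clip(-a1 / a2, 0.0, 1.0))   # exact line search
    else:
        alpha = 1.0 if a1 < 0 else 0.0    # concave along d: jump or stay
    return x + alpha * d

def gnccp(Dmat, c, N, step=0.1, inner=20):
    """zeta goes 1 -> -1: the objective morphs from convex (x is smoothed)
    to concave (x is pushed onto a discrete permutation matrix)."""
    x = np.full(N * N, 1.0 / N)           # barycenter of Pi' (feasible start)
    zeta = 1.0
    while zeta > -1.0:
        H = 2.0 * zeta * np.eye(N * N) + (1.0 - abs(zeta)) * Dmat
        g = (1.0 - abs(zeta)) * c
        for _ in range(inner):            # x <- argmin S~(x, zeta), approximately
            x = fw_step(H, g, x, N)
        zeta -= step                      # zeta <- zeta - 0.1
    return x
```

By the time ζ approaches −1 the objective is strictly concave, so the final iterate sits on a vertex of Π′, i.e. a permutation matrix, without any explicit projection step.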

From ζ = 1 to ζ = 0

[Figure: the relaxed objective S̃(·, ζ) morphing from convex (ζ = 1) toward the original objective (ζ = 0); axes from −1 to 1]

31/42

From ζ = 0 to ζ = −1

[Figure: S̃(·, ζ) morphing from the original objective (ζ = 0) to concave (ζ = −1); axes from −1 to 1]

32/42

GNCCP vs IPFP

Pros

✓ No more need for initialization

✓ Converges towards a mapping matrix

Cons

✗ Complexity: iterates over IPFP

33/42

GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]

[Example molecules from each dataset]

Alkane, Acyclic, MAO, PAH

34/42

Protocol

Arbitrary costs

▶ All substitutions: 1

▶ All insertions/deletions: 3

Relative error

▶ Accuracy measure

▶ Overestimation: the lowest approximation is the best one (d_opt)

▶ Relative error: (d(Gᵢ, Gⱼ) − d_opt(Gᵢ, Gⱼ)) / d_opt(Gᵢ, Gⱼ)

35/42

Relative errors

[Bar chart: % of relative error per dataset (Alkane, Acyclic, MAO, PAH; y-axis 0–250) for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, GNCCP]

36/42

log Time vs Score Deviation

[Scatter plots, one per dataset (Alkane, Acyclic, MAO, PAH): score deviation (0–1) versus log₁₀ of time in seconds (−7 to 2) for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, GNCCP]

37/42

Analysis

Tradeoff

▶ Accuracy depends on complexity

▶ Choose your method according to your priority

More complete analysis

▶ ICPR GED contest: https://gdc2016.greyc.fr

▶ Other methods + other datasets

38/42

GED Limitations

Mathematical properties

▶ GED is a distance

▶ But not a Euclidean one

✗ Impossible to derive a trivial kernel

✗ Use with caution in SVMs

Complexity

▶ Still hard to compute on larger graphs

▶ Accuracy hard to evaluate (lack of ground truth)

39/42

Outlook

Algorithms

▶ Matrix optimization

▶ Edge-based mapping

▶ Optimization algorithms

40/42

Outlook

Applications

▶ Explore graph space through GED
  ▶ Median graph: joint work with Paul

▶ Use of GED for classification

▶ How to set costs according to a task?
  ▶ → Metric learning
  ▶ New PhD with Sébastien and Pierre

▶ Behavior of different methods

41/42

Conclusion

▶ GED is still an open problem

▶ Approximation algorithms exist

▶ … but there is still room for improvement

▶ Focus on applications

42/42

Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gaüzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.

Gaüzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate Graph Edit Distance Guided by Bipartite Matching of Bags of Walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.

Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP — Graduated NonConvexity and Concavity Procedure. IEEE Trans. Pattern Anal. Mach. Intell., 36(6):1258–1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950–959.

Page 2: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Introduction

GED computation

Experiences

Discussion

Structural Information

Use of Structure- Activity Data To Compare Structure-Based Clustering Methods andDescriptors for Use in Compound Selection

Robert D Browndagger and Yvonne C MartinDagger

Pharmaceutical Products Division Abbott Laboratories D47EAP10 100 Abbott Park RoadAbbott Park Illinois 60064-3500

Received September 6 1995X

An evaluation of a variety of structure-based clustering methods for use in compound selection is presentedThe use of MACCS Unity and Daylight 2D descriptors Unity 3D rigid and flexible descriptors and twoin-house 3D descriptors based on potential pharmacophore points are considered The use of Wardrsquos andgroup-average hierarchical agglomerative Guenoche hierarchical divisive and Jarvis- Patrick nonhierarchicalclustering methods are compared The results suggest that 2D descriptors and hierarchical clustering methodsare best at separating biologically active molecules from inactives a prerequisite for a good compoundselection method In particular the combination of MACCS descriptors and Wardrsquos clustering was optimal

INTRODUCTION

The advent of high-throughput biological screening meth-ods have given pharmaceutical companies the ability toscreen many thousands of compounds in a short timeHowever there are many hundreds of thousands of com-pounds available both in-house and from commercial ven-dors Whilst it may be feasible to screen many or all of thecompounds available this is undesirable for reasons of costand time and may be unnecessary if it results in theproduction of some redundant information Therefore therehas been a great deal of interest in the use of compoundclustering techniques to aid in the selection of a representativesubset of all the compounds available1- 8 A similar problemfaces those interested in designing compounds for synthesisa good design will capture all the required information inthe minimum number of compounds9

Underpinning the compound selection methods is thesimilar property principle10 which states that structurallysimilar molecules will exhibit similar physicochemical andbiological properties Given a clustering method that cangroup structurally similar compounds together applicationof this principle implies that the selection or synthesis andtesting of representatives from each cluster produced froma set of compounds should be sufficient to understand thestructure- activity relationships of the whole set without theneed to test them allAn appropriate clustering method will ideally cluster all

similar compounds together whilst separating active andinactive compounds into different sets of clusters The firstfactor will ensure that every class of active compound isrepresented in the selected subset but that there is noredundancy The second factor will minimize the risk thatan inactive compound is selected as the representative of acluster containing one or more actives thereby missing aclass of active compoundsClustering is the process of dividing a set of entities into

subsets in which the members of each subset are similar toeach other but different from members of other subsets There

have been numerous cluster methods described generaldiscussions of many of these are given by Gordon11 byEverett12 and by Sneath and Sokal13 Several of thesemethods have be applied to clustering chemical structurescomprehensive reviews are given by Barnard and Downs14

and by Downs and Willett15 In outline the clusteringprocess for chemical structures is as follows(1) Select a set of attributes on which to base the

comparison of the structures These may be structuralfeatures andor physicochemical properties(2) Characterize every structure in the dataset in terms of

the attributes selected in step one(3) Calculate a coefficient of similarity dissimilarity or

distance between every pair of structures in the dataset basedon their attributes(4) Use a clustering method to group together similar

structures based on the coefficients calculated in step threeSome clustering methods may require the calculation ofsimilarity values between the new objects formed and theexisting objects(5) Analyze the resultant clusters or classification hierarchy

to determine which of the possible sets of clusters shouldbe chosenA number of methods are available both for the production

of descriptors in steps (1) and (2) and clusters in step (4)Whilst there are also a large number of coefficients that mightbe used in step (3) the choice of clustering method maydetermine which is best suitedIn this paper we present a study aimed at identifying the

most suitable descriptors and clustering methods for use incompound selection We have used a variety of methods tocluster sets of structures with known biological activities andevaluated the clusters produced according to their ability toseparate active and inactive compounds into different setsof clusters We have concerned ourselves with structurebased clustering For this the substructure search screensused in commercial database searching software have oftenbeen used as descriptors We have examined a number ofthese descriptors together with two developed in-house andhave considered the use of four commercially availableclustering methods

dagger brownrabbottcomDagger yvonnemartinabbottcomX Abstract published in AdVance ACS Abstracts January 15 1996

572 J Chem Inf Comput Sci 1996 36 572- 584

0095-2338961636-0572$12000 copy 1996 American Chemical Society

Cl

NN

CH3O

O

NH

NN

H3C O

O

HN

Cl

1 42

Encoding structural information

Vectors

I Embedding in an Euclidean space

All machine learning methods available

Loss of structural information

NN

C

O

C

2 42

Graph definition

A graph G isin GG = (V E) is a set of nodes V connected by a set ofedges E = V times V

If (vi vj) isin E then Vi is adjacent to Vj

3 42

Labelling function

Node labelling

I lv V rarr LVI LV Node label alphabet (symbolic real valued )

Edge labelling

I le V times V rarr LEI LE Edge label alphabet

4 42

Some graphs

Chemical science

NN

C

O

C

Symbolic labels atoms

Pattern Recognition

Vector labels Shape characteristics

5 42

Why use graphs

Graphs can handle structural informationBUTG 6= RN

We need to define a dissimilarity measure

6 42

Graph Edit Distance (GED)

O

C N

N

CC

CC

C

C

D

N

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

D

N

N

D

CC

C

CC

C

I Dissimilarity measure

I Quantify a distortion

7 42

Edit path

Graph edit distanceMinimal amount of distortion required to transform one graph into another

I Edit path γ Sequence of edit operations e

γ = e1 ep

I Elementary edit cost c(e)

D

N

N

D

CC

C

CC

C

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

N

N

CC

CC

C

C

C

O DN

D

N

8 42

Formal Definition

Edit path cost γ

c(γ) =sumeisinγ

c(e)

Graph edit distance

I Edit Distance ged(G1G2) = min

γisinP

sumc(γ)

I Optimal edit path

γlowast isin arg minγisinP

sumc(γ)

9 42

How to compute

Graph Edit Distance

Tree search

11 42

Tree search methods

Alowast

I Dijkstra-based algorithm

I Need an heuristic h(p)

Always find a solution

But may take a loooooong time

Exponential number of edit paths

Depth first search based algorithm[Abu-Aisheh et al 2015]

I Based on heuristic

I Limitation on the number of open paths

I Any time algorithm can return an approximation before termination

I Parallelizable

12 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

D

N

N

D

CC

C

CC

C

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

N

N

CC

CC

C

C

C

O DN

D

N

13 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

C

O DN

D

N

D

N

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

13 42

Computing graph edit distance

is equivalent to

finding an optimal assignment

between nodes

14 42

Cost

Mapping ϕ V1 cup εrarr V2 cup ε

S(Vε1Vε2 ϕ) = Qe(Vε1Vε2 ϕ)︸ ︷︷ ︸Edgersquos Cost

+ Lv(Vε1Vε2 ϕ)︸ ︷︷ ︸Nodersquos Cost

15 42

Cost Matrix C

Lv (V ε1 V

ε2 ϕ) =

sumvisinV1

c(v ϕ(v))

︸ ︷︷ ︸node substitutions

+sum

visinV1V1

c(v ε)

︸ ︷︷ ︸node removals

+sum

visinV2V2

c(ε v)

︸ ︷︷ ︸node insertions

c(v(1)1 rarr v

(2)1 ) middot middot middot c(v

(1)1 rarr v

(2)m ) c(v

(1)1 rarr ε) infin middot middot middot infin

infin

c(v(1)i rarr v

(2)j )

c(v

(1)i rarr ε)

c(v(1)n rarr v

(2)1 ) c(v

(1)n rarr v

(2)m ) infin c(v

(1)n rarr ε)

c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0

infin

c(εrarr v(2)j )

infin c(εrarr v(2)m ) 0 0

16 42

Node operations cost

Node cost induced by ϕ

Lv (V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi=1

|G1|+|G2|sumj=1

(Xϕ C)(i j)

Vectorized version

Lv (V ε1 V

ε2 ϕ) = cgtxϕ

17 42

Computation of edge costs

Edges operations cost I

Edge cost matrix D

D(i j k l) = c((i j)rarr (k l))

I (i j) isin E1 rarr deletion operation

I (k l) isin E2 rarr insertion operation

I Edgersquos mapping is induced by nodes mapping

19 42

Edges operations cost II

Edgersquos cost

Qe(V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi j k l=1

Xϕ(i k)D(i j k l)Xϕ(j l)

Vectorized version

Qe(V ε1 V

ε2 ϕ) = xgtϕDxϕ

20 42

QAP formulation of GED

S(x) =1

2xgtDx︸ ︷︷ ︸

Edgersquos cost

+ cgtx︸︷︷︸Nodersquos cost

S(x) =1

2xgt∆x

xlowast = arg minxisinΠ

S(x)

21 42

An intractable problem

No garanties on ∆

rArr Non convex problem

rArr No polynomial solution of minxisinΠ

S(x)

I NP-Hard problem

Letrsquos look for an approximation

22 42

Approximation is overestimation

1 Any mapping corresponds to an edit path

2 Any edit path has a cost ge GED

3 Approximate mappinghArr Overestimation of GED

23 42

A First Approach

Linear approximation

xlowast = minxisinΠ

1

2xgtDx + cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = −1

[Figure: evolution of the relaxed objective landscape over [−1, 1]² as ζ decreases from 0 to −1 (concave), driving the minimizer to an extreme point, i.e. a permutation]

32 42

GNCCP vs IPFP

Pros

No need for an initialization

Converges towards a mapping matrix

Cons

Complexity: iterates IPFP at each step of ζ

33 42

GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]

[Figure: sample molecular graphs from the four datasets: Alkane, Acyclic, MAO, PAH]

34 42

Protocol

Arbitrary costs

I All substitutions: 1

I All insertions/deletions: 3

Relative error

I Accuracy measure

I Overestimation: the lowest approximation is the best one (dopt)

I Relative error: (d(Gi, Gj) − dopt(Gi, Gj)) / dopt(Gi, Gj)
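As a toy illustration of this measure (the distances below are made up, not results from the slides), the relative error can be computed as:

```python
def relative_error(d_approx, d_opt):
    """Relative overestimation of an approximate GED with respect to
    the best known (i.e. lowest) approximation d_opt."""
    return (d_approx - d_opt) / d_opt

# Hypothetical approximate distances between two graphs:
approximations = {"LSAP": 14.0, "IPFP": 11.0, "GNCCP": 10.0}
d_opt = min(approximations.values())      # lowest overestimation wins
errors = {m: relative_error(d, d_opt) for m, d in approximations.items()}
# LSAP overestimates the best approximation by 40%, GNCCP by 0%
```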

35 42

Relative errors

[Figure: bar chart of the % of relative error on each dataset (Alkane, Acyclic, MAO, PAH) for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, and GNCCP]

36 42

log Time vs Score Deviation

[Figure: score deviation versus log10 of time in seconds, one panel per dataset (Alkane, Acyclic, MAO, PAH), comparing LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, and GNCCP]

37 42

Analysis

Tradeoff

I Accuracy depends on complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest: https://gdc2016.greyc.fr

I Other methods + other datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not a Euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs
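A small numerical example of why caution is needed (the 4×4 distance matrix below is a hand-made non-Euclidean metric, used only as a stand-in for GED values): plugging such a metric into a Gaussian-style kernel can yield an indefinite Gram matrix, which is not a valid (positive semi-definite) SVM kernel.

```python
import numpy as np

# A valid metric that is NOT Euclidean-embeddable: a star K_{1,3},
# center at distance 1 from three leaves, leaves pairwise at distance 2
# (no Euclidean point can sit at distance 1 from all three corners of
# an equilateral triangle of side 2).
D = np.array([[0., 1., 1., 1.],
              [1., 0., 2., 2.],
              [1., 2., 0., 2.],
              [1., 2., 2., 0.]])

K = np.exp(-0.2 * D ** 2)            # "trivial" Gaussian-like kernel
lam_min = np.linalg.eigvalsh(K).min()
# lam_min < 0: the Gram matrix is indefinite, so K is not a PSD kernel
```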

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of ground truth)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space through GED
I Median graph: joint work with Paul

I Use of GED for classification

I How to set costs according to a task?
I → Metric learning
I New PhD with Sébastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exist

I but there is still room for improvement

I Focus on applications

42 42

Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gaüzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.

Gaüzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate Graph Edit Distance Guided by Bipartite Matching of Bags of Walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.

42 42

Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP–Graduated NonConvexity and Concavity Procedure. IEEE Trans. Pattern Anal. Mach. Intell., 36(6):1258–1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950–959.

42 42

Page 3: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Structural Information

Use of Structure- Activity Data To Compare Structure-Based Clustering Methods andDescriptors for Use in Compound Selection

Robert D Browndagger and Yvonne C MartinDagger

Pharmaceutical Products Division Abbott Laboratories D47EAP10 100 Abbott Park RoadAbbott Park Illinois 60064-3500

Received September 6 1995X

An evaluation of a variety of structure-based clustering methods for use in compound selection is presentedThe use of MACCS Unity and Daylight 2D descriptors Unity 3D rigid and flexible descriptors and twoin-house 3D descriptors based on potential pharmacophore points are considered The use of Wardrsquos andgroup-average hierarchical agglomerative Guenoche hierarchical divisive and Jarvis- Patrick nonhierarchicalclustering methods are compared The results suggest that 2D descriptors and hierarchical clustering methodsare best at separating biologically active molecules from inactives a prerequisite for a good compoundselection method In particular the combination of MACCS descriptors and Wardrsquos clustering was optimal

INTRODUCTION

The advent of high-throughput biological screening meth-ods have given pharmaceutical companies the ability toscreen many thousands of compounds in a short timeHowever there are many hundreds of thousands of com-pounds available both in-house and from commercial ven-dors Whilst it may be feasible to screen many or all of thecompounds available this is undesirable for reasons of costand time and may be unnecessary if it results in theproduction of some redundant information Therefore therehas been a great deal of interest in the use of compoundclustering techniques to aid in the selection of a representativesubset of all the compounds available1- 8 A similar problemfaces those interested in designing compounds for synthesisa good design will capture all the required information inthe minimum number of compounds9

Underpinning the compound selection methods is thesimilar property principle10 which states that structurallysimilar molecules will exhibit similar physicochemical andbiological properties Given a clustering method that cangroup structurally similar compounds together applicationof this principle implies that the selection or synthesis andtesting of representatives from each cluster produced froma set of compounds should be sufficient to understand thestructure- activity relationships of the whole set without theneed to test them allAn appropriate clustering method will ideally cluster all

similar compounds together whilst separating active andinactive compounds into different sets of clusters The firstfactor will ensure that every class of active compound isrepresented in the selected subset but that there is noredundancy The second factor will minimize the risk thatan inactive compound is selected as the representative of acluster containing one or more actives thereby missing aclass of active compoundsClustering is the process of dividing a set of entities into

subsets in which the members of each subset are similar toeach other but different from members of other subsets There

have been numerous cluster methods described generaldiscussions of many of these are given by Gordon11 byEverett12 and by Sneath and Sokal13 Several of thesemethods have be applied to clustering chemical structurescomprehensive reviews are given by Barnard and Downs14

and by Downs and Willett15 In outline the clusteringprocess for chemical structures is as follows(1) Select a set of attributes on which to base the

comparison of the structures These may be structuralfeatures andor physicochemical properties(2) Characterize every structure in the dataset in terms of

the attributes selected in step one(3) Calculate a coefficient of similarity dissimilarity or

distance between every pair of structures in the dataset basedon their attributes(4) Use a clustering method to group together similar

structures based on the coefficients calculated in step threeSome clustering methods may require the calculation ofsimilarity values between the new objects formed and theexisting objects(5) Analyze the resultant clusters or classification hierarchy

to determine which of the possible sets of clusters shouldbe chosenA number of methods are available both for the production

of descriptors in steps (1) and (2) and clusters in step (4)Whilst there are also a large number of coefficients that mightbe used in step (3) the choice of clustering method maydetermine which is best suitedIn this paper we present a study aimed at identifying the

most suitable descriptors and clustering methods for use incompound selection We have used a variety of methods tocluster sets of structures with known biological activities andevaluated the clusters produced according to their ability toseparate active and inactive compounds into different setsof clusters We have concerned ourselves with structurebased clustering For this the substructure search screensused in commercial database searching software have oftenbeen used as descriptors We have examined a number ofthese descriptors together with two developed in-house andhave considered the use of four commercially availableclustering methods

dagger brownrabbottcomDagger yvonnemartinabbottcomX Abstract published in AdVance ACS Abstracts January 15 1996

572 J Chem Inf Comput Sci 1996 36 572- 584

0095-2338961636-0572$12000 copy 1996 American Chemical Society

Cl

NN

CH3O

O

NH

NN

H3C O

O

HN

Cl

1 42

Encoding structural information

Vectors

I Embedding in an Euclidean space

All machine learning methods available

Loss of structural information

NN

C

O

C

2 42

Graph definition

A graph G isin GG = (V E) is a set of nodes V connected by a set ofedges E = V times V

If (vi vj) isin E then Vi is adjacent to Vj

3 42

Labelling function

Node labelling

I lv V rarr LVI LV Node label alphabet (symbolic real valued )

Edge labelling

I le V times V rarr LEI LE Edge label alphabet

4 42

Some graphs

Chemical science

NN

C

O

C

Symbolic labels atoms

Pattern Recognition

Vector labels Shape characteristics

5 42

Why use graphs

Graphs can handle structural informationBUTG 6= RN

We need to define a dissimilarity measure

6 42

Graph Edit Distance (GED)

O

C N

N

CC

CC

C

C

D

N

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

D

N

N

D

CC

C

CC

C

I Dissimilarity measure

I Quantify a distortion

7 42

Edit path

Graph edit distanceMinimal amount of distortion required to transform one graph into another

I Edit path γ Sequence of edit operations e

γ = e1 ep

I Elementary edit cost c(e)

D

N

N

D

CC

C

CC

C

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

N

N

CC

CC

C

C

C

O DN

D

N

8 42

Formal Definition

Edit path cost γ

c(γ) =sumeisinγ

c(e)

Graph edit distance

I Edit Distance ged(G1G2) = min

γisinP

sumc(γ)

I Optimal edit path

γlowast isin arg minγisinP

sumc(γ)

9 42

How to compute

Graph Edit Distance

Tree search

11 42

Tree search methods

Alowast

I Dijkstra-based algorithm

I Need an heuristic h(p)

Always find a solution

But may take a loooooong time

Exponential number of edit paths

Depth first search based algorithm[Abu-Aisheh et al 2015]

I Based on heuristic

I Limitation on the number of open paths

I Any time algorithm can return an approximation before termination

I Parallelizable

12 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

D

N

N

D

CC

C

CC

C

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

N

N

CC

CC

C

C

C

O DN

D

N

13 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

C

O DN

D

N

D

N

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

13 42

Computing graph edit distance

is equivalent to

finding an optimal assignment

between nodes

14 42

Cost

Mapping ϕ V1 cup εrarr V2 cup ε

S(Vε1Vε2 ϕ) = Qe(Vε1Vε2 ϕ)︸ ︷︷ ︸Edgersquos Cost

+ Lv(Vε1Vε2 ϕ)︸ ︷︷ ︸Nodersquos Cost

15 42

Cost Matrix C

Lv (V ε1 V

ε2 ϕ) =

sumvisinV1

c(v ϕ(v))

︸ ︷︷ ︸node substitutions

+sum

visinV1V1

c(v ε)

︸ ︷︷ ︸node removals

+sum

visinV2V2

c(ε v)

︸ ︷︷ ︸node insertions

c(v(1)1 rarr v

(2)1 ) middot middot middot c(v

(1)1 rarr v

(2)m ) c(v

(1)1 rarr ε) infin middot middot middot infin

infin

c(v(1)i rarr v

(2)j )

c(v

(1)i rarr ε)

c(v(1)n rarr v

(2)1 ) c(v

(1)n rarr v

(2)m ) infin c(v

(1)n rarr ε)

c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0

infin

c(εrarr v(2)j )

infin c(εrarr v(2)m ) 0 0

16 42

Node operations cost

Node cost induced by ϕ

Lv (V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi=1

|G1|+|G2|sumj=1

(Xϕ C)(i j)

Vectorized version

Lv (V ε1 V

ε2 ϕ) = cgtxϕ

17 42

Computation of edge costs

Edges operations cost I

Edge cost matrix D

D(i j k l) = c((i j)rarr (k l))

I (i j) isin E1 rarr deletion operation

I (k l) isin E2 rarr insertion operation

I Edgersquos mapping is induced by nodes mapping

19 42

Edges operations cost II

Edgersquos cost

Qe(V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi j k l=1

Xϕ(i k)D(i j k l)Xϕ(j l)

Vectorized version

Qe(V ε1 V

ε2 ϕ) = xgtϕDxϕ

20 42

QAP formulation of GED

S(x) =1

2xgtDx︸ ︷︷ ︸

Edgersquos cost

+ cgtx︸︷︷︸Nodersquos cost

S(x) =1

2xgt∆x

xlowast = arg minxisinΠ

S(x)

21 42

An intractable problem

No garanties on ∆

rArr Non convex problem

rArr No polynomial solution of minxisinΠ

S(x)

I NP-Hard problem

Letrsquos look for an approximation

22 42

Approximation is overestimation

1 Any mapping corresponds to an edit path

2 Any edit path has a cost ge GED

3 Approximate mappinghArr Overestimation of GED

23 42

A First Approach

Linear approximation

xlowast = minxisinΠ

1

2xgtDx + cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 4: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Encoding structural information

Vectors

I Embedding in an Euclidean space

All machine learning methods available

Loss of structural information

NN

C

O

C

2 42

Graph definition

A graph G isin GG = (V E) is a set of nodes V connected by a set ofedges E = V times V

If (vi vj) isin E then Vi is adjacent to Vj

3 42

Labelling function

Node labelling

I lv V rarr LVI LV Node label alphabet (symbolic real valued )

Edge labelling

I le V times V rarr LEI LE Edge label alphabet

4 42

Some graphs

Chemical science

NN

C

O

C

Symbolic labels atoms

Pattern Recognition

Vector labels Shape characteristics

5 42

Why use graphs

Graphs can handle structural informationBUTG 6= RN

We need to define a dissimilarity measure

6 42

Graph Edit Distance (GED)

O

C N

N

CC

CC

C

C

D

N

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

D

N

N

D

CC

C

CC

C

I Dissimilarity measure

I Quantify a distortion

7 42

Edit path

Graph edit distanceMinimal amount of distortion required to transform one graph into another

I Edit path γ Sequence of edit operations e

γ = e1 ep

I Elementary edit cost c(e)

D

N

N

D

CC

C

CC

C

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

N

N

CC

CC

C

C

C

O DN

D

N

8 42

Formal Definition

Edit path cost γ

c(γ) =sumeisinγ

c(e)

Graph edit distance

I Edit Distance ged(G1G2) = min

γisinP

sumc(γ)

I Optimal edit path

γlowast isin arg minγisinP

sumc(γ)

9 42

How to compute

Graph Edit Distance

Tree search

11 42

Tree search methods

Alowast

I Dijkstra-based algorithm

I Need an heuristic h(p)

Always find a solution

But may take a loooooong time

Exponential number of edit paths

Depth first search based algorithm[Abu-Aisheh et al 2015]

I Based on heuristic

I Limitation on the number of open paths

I Any time algorithm can return an approximation before termination

I Parallelizable

12 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

D

N

N

D

CC

C

CC

C

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

N

N

CC

CC

C

C

C

O DN

D

N

13 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

C

O DN

D

N

D

N

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

13 42

Computing graph edit distance

is equivalent to

finding an optimal assignment

between nodes

14 42

Cost

Mapping ϕ V1 cup εrarr V2 cup ε

S(Vε1Vε2 ϕ) = Qe(Vε1Vε2 ϕ)︸ ︷︷ ︸Edgersquos Cost

+ Lv(Vε1Vε2 ϕ)︸ ︷︷ ︸Nodersquos Cost

15 42

Cost Matrix C

Lv (V ε1 V

ε2 ϕ) =

sumvisinV1

c(v ϕ(v))

︸ ︷︷ ︸node substitutions

+sum

visinV1V1

c(v ε)

︸ ︷︷ ︸node removals

+sum

visinV2V2

c(ε v)

︸ ︷︷ ︸node insertions

c(v(1)1 rarr v

(2)1 ) middot middot middot c(v

(1)1 rarr v

(2)m ) c(v

(1)1 rarr ε) infin middot middot middot infin

infin

c(v(1)i rarr v

(2)j )

c(v

(1)i rarr ε)

c(v(1)n rarr v

(2)1 ) c(v

(1)n rarr v

(2)m ) infin c(v

(1)n rarr ε)

c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0

infin

c(εrarr v(2)j )

infin c(εrarr v(2)m ) 0 0

16 42

Node operations cost

Node cost induced by ϕ

Lv (V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi=1

|G1|+|G2|sumj=1

(Xϕ C)(i j)

Vectorized version

Lv (V ε1 V

ε2 ϕ) = cgtxϕ

17 42

Computation of edge costs

Edges operations cost I

Edge cost matrix D

D(i j k l) = c((i j)rarr (k l))

I (i j) isin E1 rarr deletion operation

I (k l) isin E2 rarr insertion operation

I Edgersquos mapping is induced by nodes mapping

19 42

Edges operations cost II

Edgersquos cost

Qe(V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi j k l=1

Xϕ(i k)D(i j k l)Xϕ(j l)

Vectorized version

Qe(V ε1 V

ε2 ϕ) = xgtϕDxϕ

20 42

QAP formulation of GED

S(x) =1

2xgtDx︸ ︷︷ ︸

Edgersquos cost

+ cgtx︸︷︷︸Nodersquos cost

S(x) =1

2xgt∆x

xlowast = arg minxisinΠ

S(x)

21 42

An intractable problem

No garanties on ∆

rArr Non convex problem

rArr No polynomial solution of minxisinΠ

S(x)

I NP-Hard problem

Letrsquos look for an approximation

22 42

Approximation is overestimation

1 Any mapping corresponds to an edit path

2 Any edit path has a cost ge GED

3 Approximate mappinghArr Overestimation of GED

23 42

A First Approach

Linear approximation

xlowast = minxisinΠ

1

2xgtDx + cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

[Four plots (Alkane, Acyclic, MAO, PAH): score deviation (0–1) versus log10 of time in seconds for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, and GNCCP.]

37 42

Analysis

Tradeoff

I Accuracy depends on complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest: https://gdc2016.greyc.fr

I Other methods + other datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not a Euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of ground truth)

39 42
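Because GED is a metric but not a Euclidean one, the naive similarity K = exp(−γ·D) built from a GED matrix D is not guaranteed to be positive semidefinite, which is the reason for the SVM caution above. One common workaround is to inspect the spectrum and clip negative eigenvalues before training. A sketch under those assumptions (the function names and the eigenvalue-clipping choice are illustrative, not from the slides):

```python
import numpy as np

def min_eigenvalue(K):
    """Smallest eigenvalue of a symmetric matrix; negative => K is not PSD."""
    return float(np.linalg.eigvalsh(K).min())

def ged_to_kernel(D, gamma=1.0, clip=True):
    """Turn a symmetric GED matrix D into a similarity K = exp(-gamma * D).
    Since GED is not Euclidean, K may be indefinite; optionally clip
    negative eigenvalues so the result is a valid kernel for an SVM."""
    K = np.exp(-gamma * np.asarray(D, dtype=float))
    if clip and min_eigenvalue(K) < 0.0:
        w, V = np.linalg.eigh(K)
        K = (V * np.clip(w, 0.0, None)) @ V.T  # project onto the PSD cone
    return K
```

With `clip=False` the raw exponential of the distances is returned; that is the object that must be handled with care in an SVM, e.g. by passing it as a precomputed kernel only after checking its spectrum.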

Outlooks

Algorithms

I Matrix optimization

I Edge-based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space through GED

I Median graph (joint work with Paul)

I Use of GED for classification

I How to set costs according to a task?

I → Metric learning

I New PhD with Sébastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exist

I but there is still room for improvement

I Focus on applications

42 42

Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gaüzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.

Gaüzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate Graph Edit Distance Guided by Bipartite Matching of Bags of Walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.


Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP – Graduated NonConvexity and Concavity Procedure. Pattern Anal. Mach. Intell., 36(6):1258–1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950–959.


Page 7: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Some graphs

Chemical science

NN

C

O

C

Symbolic labels atoms

Pattern Recognition

Vector labels Shape characteristics

5 42

Why use graphs

Graphs can handle structural informationBUTG 6= RN

We need to define a dissimilarity measure

6 42

Graph Edit Distance (GED)

O

C N

N

CC

CC

C

C

D

N

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

D

N

N

D

CC

C

CC

C

I Dissimilarity measure

I Quantify a distortion

7 42

Edit path

Graph edit distanceMinimal amount of distortion required to transform one graph into another

I Edit path γ Sequence of edit operations e

γ = e1 ep

I Elementary edit cost c(e)

D

N

N

D

CC

C

CC

C

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

N

N

CC

CC

C

C

C

O DN

D

N

8 42

Formal Definition

Edit path cost γ

c(γ) =sumeisinγ

c(e)

Graph edit distance

I Edit Distance ged(G1G2) = min

γisinP

sumc(γ)

I Optimal edit path

γlowast isin arg minγisinP

sumc(γ)

9 42

How to compute

Graph Edit Distance

Tree search

11 42

Tree search methods

Alowast

I Dijkstra-based algorithm

I Need an heuristic h(p)

Always find a solution

But may take a loooooong time

Exponential number of edit paths

Depth first search based algorithm[Abu-Aisheh et al 2015]

I Based on heuristic

I Limitation on the number of open paths

I Any time algorithm can return an approximation before termination

I Parallelizable

12 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

D

N

N

D

CC

C

CC

C

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

N

N

CC

CC

C

C

C

O DN

D

N

13 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

C

O DN

D

N

D

N

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

13 42

Computing graph edit distance

is equivalent to

finding an optimal assignment

between nodes

14 42

Cost

Mapping ϕ V1 cup εrarr V2 cup ε

S(Vε1Vε2 ϕ) = Qe(Vε1Vε2 ϕ)︸ ︷︷ ︸Edgersquos Cost

+ Lv(Vε1Vε2 ϕ)︸ ︷︷ ︸Nodersquos Cost

15 42

Cost Matrix C

Lv (V ε1 V

ε2 ϕ) =

sumvisinV1

c(v ϕ(v))

︸ ︷︷ ︸node substitutions

+sum

visinV1V1

c(v ε)

︸ ︷︷ ︸node removals

+sum

visinV2V2

c(ε v)

︸ ︷︷ ︸node insertions

c(v(1)1 rarr v

(2)1 ) middot middot middot c(v

(1)1 rarr v

(2)m ) c(v

(1)1 rarr ε) infin middot middot middot infin

infin

c(v(1)i rarr v

(2)j )

c(v

(1)i rarr ε)

c(v(1)n rarr v

(2)1 ) c(v

(1)n rarr v

(2)m ) infin c(v

(1)n rarr ε)

c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0

infin

c(εrarr v(2)j )

infin c(εrarr v(2)m ) 0 0

16 42

Node operations cost

Node cost induced by ϕ

Lv (V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi=1

|G1|+|G2|sumj=1

(Xϕ C)(i j)

Vectorized version

Lv (V ε1 V

ε2 ϕ) = cgtxϕ

17 42

Computation of edge costs

Edges operations cost I

Edge cost matrix D

D(i j k l) = c((i j)rarr (k l))

I (i j) isin E1 rarr deletion operation

I (k l) isin E2 rarr insertion operation

I Edgersquos mapping is induced by nodes mapping

19 42

Edges operations cost II

Edgersquos cost

Qe(V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi j k l=1

Xϕ(i k)D(i j k l)Xϕ(j l)

Vectorized version

Qe(V ε1 V

ε2 ϕ) = xgtϕDxϕ

20 42

QAP formulation of GED

S(x) =1

2xgtDx︸ ︷︷ ︸

Edgersquos cost

+ cgtx︸︷︷︸Nodersquos cost

S(x) =1

2xgt∆x

xlowast = arg minxisinΠ

S(x)

21 42

An intractable problem

No garanties on ∆

rArr Non convex problem

rArr No polynomial solution of minxisinΠ

S(x)

I NP-Hard problem

Letrsquos look for an approximation

22 42

Approximation is overestimation

1 Any mapping corresponds to an edit path

2 Any edit path has a cost ge GED

3 Approximate mappinghArr Overestimation of GED

23 42

A First Approach

Linear approximation

xlowast = minxisinΠ

1

2xgtDx + cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some structural information

- C_ij: cost of mapping the neighbourhood of v_i onto the neighbourhood of v'_j
  - Direct neighbourhood
  - Random walks
  - Subgraphs
- Complexity ↔ accuracy trade-off

[Figure: neighbourhoods being compared on two molecular graphs]

25/42

A first QAP approach

x* = arg min_{x ∈ Π} (1/2) x^⊤ D x + c^⊤ x

Gradient descent approach

- Let's find a local minimum of S(x)
- Relax the problem to a continuous domain: S̃

Some solvers exist

No consideration of the discrete nature of the solution

26/42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al., 2009]

Frank-Wolfe-like algorithm. Iterate until convergence:

1. Discrete resolution of the linearised QAP (gradient step) → x_t
2. Line search between x_{t−1} and the solution found in step 1

27/42
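The two steps can be sketched as follows. This is illustrative only, not the authors' implementation: enumeration of permutation matrices replaces the LSAP solver used in practice, and D is assumed symmetric so that the gradient of S is Dx + c:

```python
from itertools import permutations

def perm_vectors(n):
    """All n x n permutation matrices, flattened row-major into 0/1 vectors."""
    vecs = []
    for p in permutations(range(n)):
        v = [0.0] * (n * n)
        for i, j in enumerate(p):
            v[i * n + j] = 1.0
        vecs.append(v)
    return vecs

def ipfp(D, c, x, iters=20):
    """Sketch of IPFP for S(x) = 1/2 x^T D x + c^T x (D symmetric)."""
    m = len(x)
    n = int(round(m ** 0.5))
    exts = perm_vectors(n)                      # discrete extreme points
    for _ in range(iters):
        grad = [sum(D[i][j] * x[j] for j in range(m)) + c[i] for i in range(m)]
        # Step 1: best discrete point for the linearised objective
        b = min(exts, key=lambda v: sum(g * vi for g, vi in zip(grad, v)))
        d = [bi - xi for bi, xi in zip(b, x)]
        g = sum(gi * di for gi, di in zip(grad, d))
        if g >= 0:
            break                               # no descent direction left
        # Step 2: exact line search on the quadratic along d
        a = 0.5 * sum(d[i] * D[i][j] * d[j] for i in range(m) for j in range(m))
        t = 1.0 if a <= 0 else min(1.0, -g / (2 * a))
        x = [xi + t * di for xi, di in zip(x, d)]
    return x

# Purely linear toy problem: the best permutation is the identity.
D0 = [[0.0] * 4 for _ in range(4)]
c0 = [0.0, 5.0, 5.0, 0.0]
print(ipfp(D0, c0, [0.5] * 4))   # [1.0, 0.0, 0.0, 1.0]
```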

Operating IPFP

At convergence:

- The optimal continuous solution x̃ is stable
- A projection step is needed to map x̃ back onto Π

Uncontrolled loss: no guarantee that S(x̃) ≈ S(x′)

Importance of initialization:

- Local minimum
- Initialization is important [Carletti et al., 2015]

28/42

GNCCP approach [Liu and Qiao, 2014]

From a convex to a concave objective function:

S(x, ζ) = ζ (x^⊤ x) + (1 − |ζ|) S(x)

- ζ = 1: convex objective function
- ζ = −1: concave objective function

GNCCP algorithm:

x = 0
For ζ = 1 → −1:
  1. x ← arg min_{x ∈ Π′} S(x, ζ)
  2. ζ ← ζ − 0.1

- Iterates ζ over a modified IPFP objective function

29-30/42
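The outer loop can be sketched as below. `solve_relaxed` is a hypothetical callback (e.g. one modified IPFP run warm-started at the previous solution), and the linear ζ schedule is an assumption matching the ζ ← ζ − 0.1 update:

```python
def gnccp(D, c, solve_relaxed, steps=20):
    """Sketch of the GNCCP outer loop. `solve_relaxed(D, c, zeta, x0)`
    is assumed to minimise S(x, zeta) = zeta * x^T x + (1 - |zeta|) * S(x)
    over the relaxed set, warm-started at x0."""
    x = [0.0] * len(c)                  # start from x = 0
    for k in range(steps + 1):
        zeta = 1.0 - 2.0 * k / steps    # schedule: 1 -> -1
        x = solve_relaxed(D, c, zeta, x)
    return x

# Dummy callback just to show the calling convention: records the schedule.
seen = []
def record(D, c, zeta, x0):
    seen.append(zeta)
    return x0

gnccp([[0.0]], [0.0], record)
print(seen[0], seen[-1], len(seen))   # 1.0 -1.0 21
```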

From ζ = 1 to ζ = 0

[Figure: sequence of plots over [−1, 1] showing the interpolated objective S(·, ζ) as ζ decreases from 1 to 0; the convex term progressively vanishes]

31/42

From ζ = 0 to ζ = −1

[Figure: sequence of plots over [−1, 1] as ζ decreases from 0 to −1; the objective becomes concave, driving the minimiser towards a discrete solution]

32/42

GNCCP vs IPFP

Pros:
- No more need for initialization
- Converges towards a mapping matrix

Cons:
- Complexity: iterates over IPFP

33/42

GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]

[Figure: sample molecular graphs from each dataset]

Alkane, Acyclic, MAO, PAH

34/42

Protocol

Arbitrary costs:
- All substitutions: 1
- All insertions/deletions: 3

Relative error:
- Accuracy measure
- Overestimation: the lowest approximation is the best one (d_opt)
- Relative error: (d(G_i, G_j) − d_opt(G_i, G_j)) / d_opt(G_i, G_j)

35/42
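The relative error is a one-liner; for example, an approximation of 18 against a best-found distance of 12 overestimates by 50% (toy values, not from the experiments):

```python
def relative_error(d, d_opt):
    """Relative overestimation of an approximate distance d with respect
    to the lowest approximation found, d_opt."""
    return (d - d_opt) / d_opt

print(relative_error(18.0, 12.0))   # 0.5, i.e. 50% overestimation
```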

Relative errors

[Figure: bar chart of % relative error per dataset (Alkane, Acyclic, MAO, PAH) for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw and GNCCP]

36/42

log Time vs Score Deviation

[Figure: four scatter plots (Alkane, Acyclic, MAO, PAH) of score deviation against log10 of computation time in seconds, for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw and GNCCP]

37/42

Analysis

Tradeoff:
- Accuracy depends on complexity
- Choose your method according to your priority

More complete analysis:
- ICPR GED contest: https://gdc2016.greyc.fr
- Other methods + other datasets

38/42

GED Limitations

Mathematical properties:
- GED is a distance
- but not a Euclidean one

Impossible to derive a trivial kernel: use with caution in SVMs

Complexity:
- Still hard to compute on larger graphs
- Accuracy hard to evaluate (lack of ground truth)

39/42

Outlooks

Algorithms:
- Matrix optimization
- Edge-based mapping
- Optimization algorithms

40/42

Outlooks

Applications:
- Explore the graph space through GED
  - Median graph: joint work with Paul
- Use of GED for classification
- How to set costs according to a task?
  - → Metric learning
  - New PhD with Sebastien and Pierre
- Behavior of different methods

41/42

Conclusion

- GED is still an open problem
- Approximation algorithms exist
- but there is still room for improvement
- Focus on applications

42/42

Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An exact graph edit distance algorithm for solving pattern recognition problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gauzere, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gauzere, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168-177. Springer.

Gauzere, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate graph edit distance guided by bipartite matching of bags of walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73-82. Springer.

Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114-1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP: Graduated NonConvexity and Concavity Procedure. IEEE Trans. Pattern Anal. Mach. Intell., 36(6):1258-1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950-959.


Why use graphs?

Graphs can handle structural information. BUT: G ≠ R^N

⇒ We need to define a dissimilarity measure

6/42

Graph Edit Distance (GED)

[Figure: two molecular graphs being transformed into one another]

- Dissimilarity measure
- Quantifies a distortion

7/42

Edit path

Graph edit distance: minimal amount of distortion required to transform one graph into another

- Edit path γ: sequence of edit operations e
  γ = e_1, …, e_p
- Elementary edit cost: c(e)

[Figure: a sequence of node and edge edit operations turning one molecular graph into another]

8/42

Formal Definition

Edit path cost:

c(γ) = Σ_{e ∈ γ} c(e)

Graph edit distance:

- Edit distance: ged(G_1, G_2) = min_{γ ∈ P} c(γ)
- Optimal edit path: γ* ∈ arg min_{γ ∈ P} c(γ)

9/42
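As a toy illustration of c(γ): the tuple encoding of operations below is hypothetical, and the costs follow the experimental protocol used later in the talk (substitutions 1, insertions/deletions 3):

```python
def path_cost(gamma, c):
    """c(gamma): total cost of an edit path, the sum of its operation costs."""
    return sum(c(e) for e in gamma)

# Hypothetical operation encoding and cost model.
def cost(op):
    return 1 if op[0] == "sub" else 3

gamma = [("sub", "C", "N"), ("del", "O"), ("ins", "N")]
print(path_cost(gamma, cost))   # 1 + 3 + 3 = 7
```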

How to compute Graph Edit Distance?

Tree search

11/42

Tree search methods

A*:
- Dijkstra-based algorithm
- Needs a heuristic h(p)
- Always finds a solution
- but may take a very long time: exponential number of edit paths

Depth-first search based algorithm [Abu-Aisheh et al., 2015]:
- Based on a heuristic
- Limitation on the number of open paths
- Anytime algorithm: can return an approximation before termination
- Parallelizable

12/42

Correspondence between edit paths and mappings [Bougleux et al., 2015]

[Figure: the node-to-node mapping induced by an edit path between two molecular graphs]

13/42

Computing graph edit distance is equivalent to finding an optimal assignment between nodes.

14/42

Cost

Mapping ϕ: V_1 ∪ {ε} → V_2 ∪ {ε}

S(V_1^ε, V_2^ε, ϕ) = Q_e(V_1^ε, V_2^ε, ϕ)   [edge cost]   +   L_v(V_1^ε, V_2^ε, ϕ)   [node cost]

15/42

Cost Matrix C

L_v(V_1^ε, V_2^ε, ϕ) =   Σ_{v ∈ V_1, ϕ(v) ≠ ε} c(v → ϕ(v))   [node substitutions]
                        + Σ_{v ∈ V_1, ϕ(v) = ε} c(v → ε)      [node removals]
                        + Σ_{v ∈ V_2 ∖ ϕ(V_1)} c(ε → v)       [node insertions]

These costs are gathered in the (n + m) × (n + m) matrix C, with n = |V_1| and m = |V_2|:

- top-left (n × m) block: substitution costs c(v_i^(1) → v_j^(2))
- top-right (n × n) block: removal costs c(v_i^(1) → ε) on the diagonal, ∞ elsewhere
- bottom-left (m × m) block: insertion costs c(ε → v_j^(2)) on the diagonal, ∞ elsewhere
- bottom-right (m × n) block: 0 (ε → ε)

16/42
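The blockwise structure of the cost matrix C can be assembled directly. A minimal sketch with toy costs; the function name and list-based layout are illustrative:

```python
INF = float("inf")

def build_cost_matrix(sub, rem, ins):
    """Assemble the (n+m) x (n+m) node cost matrix C:
    - top-left  (n x m): substitution costs sub[i][j]
    - top-right (n x n): removal cost rem[i] on the diagonal, inf elsewhere
    - bottom-left (m x m): insertion cost ins[j] on the diagonal, inf elsewhere
    - bottom-right (m x n): zeros (eps -> eps)."""
    n, m = len(rem), len(ins)
    C = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(m):
            C[i][j] = sub[i][j]
        for j in range(n):
            C[i][m + j] = rem[i] if i == j else INF
    for i in range(m):
        for j in range(m):
            C[n + i][j] = ins[i] if i == j else INF
    return C

# Toy costs: one node in G1, two nodes in G2.
C = build_cost_matrix(sub=[[1.0, 2.0]], rem=[3.0], ins=[4.0, 5.0])
for row in C:
    print(row)
```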

Node operations cost

Node cost induced by ϕ:

L_v(V_1^ε, V_2^ε, ϕ) = Σ_{i=1}^{|G_1|+|G_2|} Σ_{j=1}^{|G_1|+|G_2|} (X_ϕ ⊙ C)(i, j)

Vectorized version:

L_v(V_1^ε, V_2^ε, ϕ) = c^⊤ x_ϕ

17/42

Computation of edge costs

Edges operations cost I

Edge cost matrix D:

D(i, j, k, l) = c((i, j) → (k, l))

- (i, j) ∈ E_1 → deletion operation
- (k, l) ∈ E_2 → insertion operation
- The edge mapping is induced by the node mapping

19/42

Edges operations cost II

Edgersquos cost

Qe(V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi j k l=1

Xϕ(i k)D(i j k l)Xϕ(j l)

Vectorized version

Qe(V ε1 V

ε2 ϕ) = xgtϕDxϕ

20 42

QAP formulation of GED

S(x) =1

2xgtDx︸ ︷︷ ︸

Edgersquos cost

+ cgtx︸︷︷︸Nodersquos cost

S(x) =1

2xgt∆x

xlowast = arg minxisinΠ

S(x)

21 42

An intractable problem

No garanties on ∆

rArr Non convex problem

rArr No polynomial solution of minxisinΠ

S(x)

I NP-Hard problem

Letrsquos look for an approximation

22 42

Approximation is overestimation

1 Any mapping corresponds to an edit path

2 Any edit path has a cost ge GED

3 Approximate mappinghArr Overestimation of GED

23 42

A First Approach

Linear approximation

xlowast = minxisinΠ

1

2xgtDx + cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 9: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Graph Edit Distance (GED)

O

C N

N

CC

CC

C

C

D

N

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

D

N

N

D

CC

C

CC

C

I Dissimilarity measure

I Quantify a distortion

7 42

Edit path

Graph edit distanceMinimal amount of distortion required to transform one graph into another

I Edit path γ Sequence of edit operations e

γ = e1 ep

I Elementary edit cost c(e)

D

N

N

D

CC

C

CC

C

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

N

N

CC

CC

C

C

C

O DN

D

N

8 42

Formal Definition

Edit path cost γ

c(γ) =sumeisinγ

c(e)

Graph edit distance

I Edit Distance ged(G1G2) = min

γisinP

sumc(γ)

I Optimal edit path

γlowast isin arg minγisinP

sumc(γ)

9 42

How to compute

Graph Edit Distance

Tree search

11 42

Tree search methods

Alowast

I Dijkstra-based algorithm

I Need an heuristic h(p)

Always find a solution

But may take a loooooong time

Exponential number of edit paths

Depth first search based algorithm[Abu-Aisheh et al 2015]

I Based on heuristic

I Limitation on the number of open paths

I Any time algorithm can return an approximation before termination

I Parallelizable

12 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

D

N

N

D

CC

C

CC

C

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

N

N

CC

CC

C

C

C

O DN

D

N

13 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

C

O DN

D

N

D

N

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

13 42

Computing graph edit distance

is equivalent to

finding an optimal assignment

between nodes

14 42

Cost

Mapping ϕ V1 cup εrarr V2 cup ε

S(Vε1Vε2 ϕ) = Qe(Vε1Vε2 ϕ)︸ ︷︷ ︸Edgersquos Cost

+ Lv(Vε1Vε2 ϕ)︸ ︷︷ ︸Nodersquos Cost

15 42

Cost Matrix C

Lv (V ε1 V

ε2 ϕ) =

sumvisinV1

c(v ϕ(v))

︸ ︷︷ ︸node substitutions

+sum

visinV1V1

c(v ε)

︸ ︷︷ ︸node removals

+sum

visinV2V2

c(ε v)

︸ ︷︷ ︸node insertions

c(v(1)1 rarr v

(2)1 ) middot middot middot c(v

(1)1 rarr v

(2)m ) c(v

(1)1 rarr ε) infin middot middot middot infin

infin

c(v(1)i rarr v

(2)j )

c(v

(1)i rarr ε)

c(v(1)n rarr v

(2)1 ) c(v

(1)n rarr v

(2)m ) infin c(v

(1)n rarr ε)

c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0

infin

c(εrarr v(2)j )

infin c(εrarr v(2)m ) 0 0

16 42

Node operations cost

Node cost induced by ϕ

Lv (V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi=1

|G1|+|G2|sumj=1

(Xϕ C)(i j)

Vectorized version

Lv (V ε1 V

ε2 ϕ) = cgtxϕ

17 42

Computation of edge costs

Edges operations cost I

Edge cost matrix D

D(i j k l) = c((i j)rarr (k l))

I (i j) isin E1 rarr deletion operation

I (k l) isin E2 rarr insertion operation

I Edgersquos mapping is induced by nodes mapping

19 42

Edges operations cost II

Edgersquos cost

Qe(V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi j k l=1

Xϕ(i k)D(i j k l)Xϕ(j l)

Vectorized version

Qe(V ε1 V

ε2 ϕ) = xgtϕDxϕ

20 42

QAP formulation of GED

S(x) =1

2xgtDx︸ ︷︷ ︸

Edgersquos cost

+ cgtx︸︷︷︸Nodersquos cost

S(x) =1

2xgt∆x

xlowast = arg minxisinΠ

S(x)

21 42

An intractable problem

No garanties on ∆

rArr Non convex problem

rArr No polynomial solution of minxisinΠ

S(x)

I NP-Hard problem

Letrsquos look for an approximation

22 42

Approximation is overestimation

1 Any mapping corresponds to an edit path

2 Any edit path has a cost ge GED

3 Approximate mappinghArr Overestimation of GED

23 42

A First Approach

Linear approximation

xlowast = minxisinΠ

1

2xgtDx + cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 10: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Edit path

Graph edit distanceMinimal amount of distortion required to transform one graph into another

I Edit path γ Sequence of edit operations e

γ = e1 ep

I Elementary edit cost c(e)

D

N

N

D

CC

C

CC

C

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

N

N

CC

CC

C

C

C

O DN

D

N

8 42

Formal Definition

Edit path cost γ

c(γ) =sumeisinγ

c(e)

Graph edit distance

I Edit Distance ged(G1G2) = min

γisinP

sumc(γ)

I Optimal edit path

γlowast isin arg minγisinP

sumc(γ)

9 42

How to compute

Graph Edit Distance

Tree search

11 42

Tree search methods

Alowast

I Dijkstra-based algorithm

I Need an heuristic h(p)

Always find a solution

But may take a loooooong time

Exponential number of edit paths

Depth first search based algorithm[Abu-Aisheh et al 2015]

I Based on heuristic

I Limitation on the number of open paths

I Any time algorithm can return an approximation before termination

I Parallelizable

12 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

D

N

N

D

CC

C

CC

C

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

N

N

CC

CC

C

C

C

O DN

D

N

13 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

C

O DN

D

N

D

N

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

13 42

Computing the graph edit distance is equivalent to finding an optimal assignment between nodes.

Cost

Mapping ϕ: V1 ∪ {ε} → V2 ∪ {ε}

S(V1^ε, V2^ε, ϕ) = Qe(V1^ε, V2^ε, ϕ) + Lv(V1^ε, V2^ε, ϕ)
                   (edge cost)          (node cost)

Cost Matrix C

Lv(V1^ε, V2^ε, ϕ) =   Σ_{v ∈ V1, ϕ(v) ≠ ε} c(v → ϕ(v))   (node substitutions)
                    + Σ_{v ∈ V1, ϕ(v) = ε} c(v → ε)      (node removals)
                    + Σ_{v ∈ V2 \ ϕ(V1)} c(ε → v)        (node insertions)

With n = |V1| and m = |V2|, C is the (n + m) × (n + m) block matrix

        ⎡ C_sub   C_del ⎤
    C = ⎣ C_ins     0   ⎦

- C_sub (n × m): C_sub(i, j) = c(v_i^(1) → v_j^(2))   (substitutions)
- C_del (n × n): c(v_i^(1) → ε) on the diagonal, ∞ elsewhere   (removals)
- C_ins (m × m): c(ε → v_j^(2)) on the diagonal, ∞ elsewhere   (insertions)
- 0 (m × n): the ε → ε block costs nothing
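The block structure above can be assembled directly. A toy sketch (plain Python, with the arbitrary costs used later in the talk: substitution 1, insertion/deletion 3):

```python
def node_cost_matrix(labels1, labels2, c_sub=1.0, c_indel=3.0):
    """(n+m) x (n+m) LSAP cost matrix: [[C_sub, C_del], [C_ins, 0]]."""
    n, m = len(labels1), len(labels2)
    INF = float('inf')
    C = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(m):           # substitution block
            C[i][j] = 0.0 if labels1[i] == labels2[j] else c_sub
        for j in range(n):           # deletion block: diagonal, inf elsewhere
            C[i][m + j] = c_indel if i == j else INF
    for i in range(m):
        for j in range(m):           # insertion block: diagonal, inf elsewhere
            C[n + i][j] = c_indel if i == j else INF
        # lower-right (epsilon -> epsilon) block stays 0
    return C

C = node_cost_matrix(['C', 'O'], ['C', 'N'])
assert len(C) == 4 and C[0][0] == 0.0 and C[1][1] == 1.0   # substitutions
assert C[0][2] == 3.0 and C[0][3] == float('inf')          # deletions
assert C[2][0] == 3.0 and C[2][1] == float('inf')          # insertions
```

The ∞ entries forbid deleting v_i^(1) on any row but its own, so every assignment on C encodes a valid set of node operations.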

Node operations cost

Node cost induced by ϕ:

Lv(V1^ε, V2^ε, ϕ) = Σ_{i=1}^{|G1|+|G2|} Σ_{j=1}^{|G1|+|G2|} (Xϕ ⊙ C)(i, j)

Vectorized version:

Lv(V1^ε, V2^ε, ϕ) = cᵀ xϕ

(Xϕ: the permutation matrix encoding ϕ; xϕ = vec(Xϕ); c = vec(C); ⊙: elementwise product)

Computation of edge costs

Edge operations cost (I)

Edge cost matrix D:

D(i, j, k, l) = c((i, j) → (k, l))

- (i, j) ∈ E1 mapped to no edge of E2 → deletion operation
- (k, l) ∈ E2 with no pre-image in E1 → insertion operation
- The edge mapping is induced by the node mapping

Edge operations cost (II)

Edge cost:

Qe(V1^ε, V2^ε, ϕ) = Σ_{i,j,k,l=1}^{|G1|+|G2|} Xϕ(i, k) · D(i, j, k, l) · Xϕ(j, l)

Vectorized version:

Qe(V1^ε, V2^ε, ϕ) = xϕᵀ D xϕ
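A quick numerical check of the two vectorized identities (Lv = cᵀxϕ and Qe = xϕᵀDxϕ), on a toy 2-node example with the ε rows/columns omitted for brevity (a simplification, not the full construction):

```python
import numpy as np

C = np.array([[0., 1.],
              [1., 0.]])          # toy node cost matrix (no epsilon part)
rng = np.random.default_rng(0)
D = rng.random((4, 4))
D = 0.5 * (D + D.T)               # symmetric edge cost matrix D(ij, kl)

X = np.eye(2)                     # node mapping phi as a permutation matrix
x, c = X.ravel(), C.ravel()

L_v = (X * C).sum()               # sum_ij (X_phi ⊙ C)(i, j)
Q_e = sum(x[a] * D[a, b] * x[b] for a in range(4) for b in range(4))

assert np.isclose(L_v, c @ x)     # L_v = c^T x_phi
assert np.isclose(Q_e, x @ D @ x) # Q_e = x_phi^T D x_phi
```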

QAP formulation of GED

S(x) = ½ xᵀ D x  +  cᵀ x
       (edge cost)  (node cost)

Equivalently, S(x) = ½ xᵀ Δ x

x* = arg min_{x ∈ Π} S(x)

An intractable problem

No guarantees on Δ:
⇒ non-convex problem
⇒ no known polynomial-time algorithm for min_{x ∈ Π} S(x)

- NP-hard problem

Let's look for an approximation.

Approximation is overestimation

1. Any mapping corresponds to an edit path
2. Any edit path has a cost ≥ GED
3. Approximate mapping ⇔ overestimation of GED

A First Approach

Linear approximation:

x* = arg min_{x ∈ Π} cᵀ x

- [Riesen and Bunke, 2009; Gaüzère et al., 2014; Carletti et al., 2015]
- Linear approximation of the QAP formulation (the quadratic term ½ xᵀ D x is dropped)
- Hungarian algorithm, O(n³)
- But: no structural information

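The linear step is a linear sum assignment problem (LSAP) on the ε-augmented cost matrix. A brute-force stand-in for the Hungarian algorithm (fine for tiny matrices; a real implementation would use an O(n³) solver), on a C–O vs C–N toy pair:

```python
from itertools import permutations

def lsap_bruteforce(C):
    """Minimum-cost row-to-column assignment of a square matrix
    (stand-in for the O(n^3) Hungarian algorithm; tiny n only)."""
    n = len(C)
    best, best_p = float('inf'), None
    for p in permutations(range(n)):
        cost = sum(C[i][p[i]] for i in range(n))
        if cost < best:
            best, best_p = cost, p
    return best, best_p

INF = float('inf')
# epsilon-augmented node cost matrix for G1 = {C, O}, G2 = {C, N}
C = [[0, 1, 3, INF],     # v1_1: substitutions | deletion
     [1, 1, INF, 3],     # v1_2
     [3, INF, 0, 0],     # eps row: insertion of v2_1
     [INF, 3, 0, 0]]     # eps row: insertion of v2_2

cost, assignment = lsap_bruteforce(C)
assert cost == 1                    # map C -> C, substitute O -> N
assert assignment[0] == 0 and assignment[1] == 1
```

The resulting assignment induces an edit path, so its cost (plus the induced edge operations) yields an upper bound on the GED.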

Augmented Cost Matrix

Adding some structural information:
- C_ij: cost of mapping the neighbourhood of v_i onto the neighbourhood of v′_j
  - Direct neighbourhood
  - Random walks
  - Subgraphs
- Complexity ↔ accuracy trade-off


A first QAP approach

x* = arg min_{x ∈ Π} ½ xᵀ D x + cᵀ x

Gradient descent approach:
- Find a local minimum of S(x)
- Relax the problem to the continuous domain: S̃

- Some solvers exist
- But no consideration of the discrete nature of the solution

Another strategy

Integer-Projected Fixed Point [Leordeanu et al., 2009]

Frank-Wolfe-like algorithm. Iterate until convergence:

1. Discrete resolution of the linear problem given by the gradient of the QAP → x_t
2. Line search between x_{t−1} and the solution found in step 1
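These two steps can be sketched as follows. This is an illustrative toy (not the paper's implementation): a brute-force LAP stands in for the discrete solver, and the line search is exact on the quadratic model:

```python
import numpy as np
from itertools import permutations

def lap(cost):
    """Brute-force linear assignment (stand-in for Hungarian), tiny n only."""
    n = cost.shape[0]
    best, best_p = np.inf, None
    for p in permutations(range(n)):
        s = sum(cost[i, p[i]] for i in range(n))
        if s < best:
            best, best_p = s, p
    P = np.zeros((n, n))
    P[range(n), best_p] = 1.0
    return P

def ipfp(c, D, n, iters=50, tol=1e-9):
    """Minimize S(x) = c.x + 0.5 x.D.x over assignments (x = vec of n x n)."""
    x = lap(c.reshape(n, n)).ravel()          # init from the linear part
    for _ in range(iters):
        grad = c + D @ x                      # gradient of S (D symmetric)
        b = lap(grad.reshape(n, n)).ravel()   # step 1: LAP on the gradient
        d = b - x
        gd = grad @ d
        if gd >= -tol:                        # no descent direction left
            break
        quad = d @ D @ d                      # step 2: exact line search
        alpha = 1.0 if quad <= 0 else min(1.0, -gd / quad)
        x = x + alpha * d
    return lap(-x.reshape(n, n))              # project back onto permutations

c = np.array([0., 5., 5., 0.])                # favors the identity mapping
D = np.zeros((4, 4))
P = ipfp(c, D, 2)
assert np.allclose(P, np.eye(2))
```

The final line is the projection step discussed on the next slide: the continuous solution must be mapped back to a permutation, with no control on the induced loss.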

Operating IPFP

At convergence:
- The optimal continuous solution x is stable
- A projection step is needed to embed x into Π

Uncontrolled loss: no guarantee that S(x) ≈ S(x′)

Importance of initialization:
- Converges to a local minimum
- Initialization is important [Carletti et al., 2015]

GNCCP approach [Liu and Qiao, 2014]

From a convex to a concave objective function:

S(x, ζ) = ζ(xᵀx) + (1 − |ζ|) S(x)

- ζ = 1: convex objective function
- ζ = −1: concave objective function

GNCCP approach [Liu and Qiao, 2014]

GNCCP algorithm:

x = 0
for ζ = 1 → −1:
  1. x ← arg min_{x ∈ Π′} S(x, ζ)
  2. ζ ← ζ − 0.1

- Iterates ζ over a modified IPFP objective function
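The annealing loop above can be sketched compactly. An illustrative toy (ζ stepped by 0.1 as on the slide, a Frank-Wolfe inner solver with a brute-force LAP; starting from the barycenter of the polytope rather than x = 0, an assumption for simplicity):

```python
import numpy as np
from itertools import permutations

def lap(cost):
    """Brute-force linear assignment (stand-in for Hungarian), tiny n only."""
    n = cost.shape[0]
    best, best_p = np.inf, None
    for p in permutations(range(n)):
        s = sum(cost[i, p[i]] for i in range(n))
        if s < best:
            best, best_p = s, p
    P = np.zeros((n, n))
    P[range(n), best_p] = 1.0
    return P

def gnccp(c, D, n, step=0.1, inner=20, tol=1e-9):
    """Anneal zeta from 1 (convex) to -1 (concave) over
    S_zeta(x) = zeta*(x.x) + (1-|zeta|)*(c.x + 0.5*x.D.x),
    solving each stage with a Frank-Wolfe (IPFP-style) inner loop."""
    x = np.full(n * n, 1.0 / n)        # barycenter of the assignment polytope
    zeta = 1.0
    while zeta >= -1.0:
        w = 1.0 - abs(zeta)
        for _ in range(inner):
            grad = 2 * zeta * x + w * (c + D @ x)
            b = lap(grad.reshape(n, n)).ravel()
            d = b - x
            gd = grad @ d
            if gd >= -tol:
                break
            quad = 2 * zeta * (d @ d) + w * (d @ D @ d)
            alpha = 1.0 if quad <= 0 else min(1.0, -gd / quad)
            x = x + alpha * d
        zeta -= step
    return x.reshape(n, n)             # concave end drives x to a vertex

c = np.array([0., 5., 5., 0.])
D = np.zeros((4, 4))
X = gnccp(c, D, 2)
```

No initialization or final projection is needed: once ζ < 0 the quadratic term along any direction is concave, so the line search jumps to vertices and the iterate ends on a permutation matrix.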

From ζ = 1 to ζ = 0

[Figure: sequence of plots over [−1, 1] showing the objective S(x, ζ) deform from the convex relaxation (ζ = 1) towards S(x) (ζ = 0)]

From ζ = 0 to ζ = −1

[Figure: sequence of plots over [−1, 1] showing S(x, ζ) deform from S(x) (ζ = 0) to a concave function (ζ = −1)]

GNCCP vs IPFP

Pros:
- No more need for an initialization
- Converges towards a mapping matrix

Cons:
- Complexity: iterates over IPFP

GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]

[Figure: sample molecular graphs from the four datasets: Alkane, Acyclic, MAO, PAH]

Protocol

Arbitrary costs:
- All substitutions: 1
- All insertions/deletions: 3

Relative error:
- Accuracy measure
- Overestimation: the lowest approximation is the best one (d_opt)
- Relative error: (d(Gi, Gj) − d_opt(Gi, Gj)) / d_opt(Gi, Gj)
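The measure is direct to compute; here d_opt denotes the smallest (best) approximation obtained across the compared methods:

```python
def relative_error(d, d_opt):
    """Relative overestimation of an approximate GED d w.r.t. the best
    known approximation d_opt (0 means d matches the best method)."""
    return (d - d_opt) / d_opt

# e.g. a method returning 12 when the best method returns 10 -> 20% error
assert abs(relative_error(12.0, 10.0) - 0.2) < 1e-12
```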

Relative errors

[Figure: bar chart of the % of relative error per dataset (Alkane, Acyclic, MAO, PAH; y-axis 0–250%) for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw and GNCCP]

log Time vs Score Deviation

[Figure: score deviation (0–1) versus log10 of computation time in seconds, one panel per dataset (Alkane, Acyclic, MAO, PAH), for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw and GNCCP]

Analysis

Trade-off:
- Accuracy depends on complexity
- Choose your method according to your priority

More complete analysis:
- ICPR GED contest: https://gdc2016.greyc.fr
- Other methods + other datasets

GED Limitations

Mathematical properties:
- GED is a distance
- But not a Euclidean one
- Impossible to derive a trivial kernel: use with caution in SVMs

Complexity:
- Still hard to compute on larger graphs
- Accuracy hard to evaluate (lack of ground truth)

Outlooks

Algorithms
- Matrix optimization
- Edge-based mapping
- Optimization algorithms

Outlooks

Applications
- Explore the graph space through GED
  - Median graph (joint work with Paul)
- Use of GED for classification
- How to set costs according to a task? → metric learning
  - New PhD with Sébastien and Pierre
- Behavior of different methods

Conclusion

- GED is still an open problem
- Approximation algorithms exist
- ...but there is still room for improvement
- Focus on applications

References

Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An exact graph edit distance algorithm for solving pattern recognition problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gaüzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.

Gaüzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate graph edit distance guided by bipartite matching of bags of walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.

Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP – Graduated NonConvexity and Concavity Procedure. IEEE Trans. Pattern Anal. Mach. Intell., 36(6):1258–1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950–959.

Page 11: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Formal Definition

Edit path cost γ

c(γ) =sumeisinγ

c(e)

Graph edit distance

I Edit Distance ged(G1G2) = min

γisinP

sumc(γ)

I Optimal edit path

γlowast isin arg minγisinP

sumc(γ)

9 42

How to compute

Graph Edit Distance

Tree search

11 42

Tree search methods

Alowast

I Dijkstra-based algorithm

I Need an heuristic h(p)

Always find a solution

But may take a loooooong time

Exponential number of edit paths

Depth first search based algorithm[Abu-Aisheh et al 2015]

I Based on heuristic

I Limitation on the number of open paths

I Any time algorithm can return an approximation before termination

I Parallelizable

12 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

D

N

N

D

CC

C

CC

C

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

N

N

CC

CC

C

C

C

O DN

D

N

13 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

C

O DN

D

N

D

N

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

13 42

Computing graph edit distance

is equivalent to

finding an optimal assignment

between nodes

14 42

Cost

Mapping ϕ V1 cup εrarr V2 cup ε

S(Vε1Vε2 ϕ) = Qe(Vε1Vε2 ϕ)︸ ︷︷ ︸Edgersquos Cost

+ Lv(Vε1Vε2 ϕ)︸ ︷︷ ︸Nodersquos Cost

15 42

Cost Matrix C

Lv (V ε1 V

ε2 ϕ) =

sumvisinV1

c(v ϕ(v))

︸ ︷︷ ︸node substitutions

+sum

visinV1V1

c(v ε)

︸ ︷︷ ︸node removals

+sum

visinV2V2

c(ε v)

︸ ︷︷ ︸node insertions

c(v(1)1 rarr v

(2)1 ) middot middot middot c(v

(1)1 rarr v

(2)m ) c(v

(1)1 rarr ε) infin middot middot middot infin

infin

c(v(1)i rarr v

(2)j )

c(v

(1)i rarr ε)

c(v(1)n rarr v

(2)1 ) c(v

(1)n rarr v

(2)m ) infin c(v

(1)n rarr ε)

c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0

infin

c(εrarr v(2)j )

infin c(εrarr v(2)m ) 0 0

16 42

Node operations cost

Node cost induced by ϕ

Lv (V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi=1

|G1|+|G2|sumj=1

(Xϕ C)(i j)

Vectorized version

Lv (V ε1 V

ε2 ϕ) = cgtxϕ

17 42

Computation of edge costs

Edges operations cost I

Edge cost matrix D

D(i j k l) = c((i j)rarr (k l))

I (i j) isin E1 rarr deletion operation

I (k l) isin E2 rarr insertion operation

I Edgersquos mapping is induced by nodes mapping

19 42

Edges operations cost II

Edgersquos cost

Qe(V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi j k l=1

Xϕ(i k)D(i j k l)Xϕ(j l)

Vectorized version

Qe(V ε1 V

ε2 ϕ) = xgtϕDxϕ

20 42

QAP formulation of GED

S(x) =1

2xgtDx︸ ︷︷ ︸

Edgersquos cost

+ cgtx︸︷︷︸Nodersquos cost

S(x) =1

2xgt∆x

xlowast = arg minxisinΠ

S(x)

21 42

An intractable problem

No garanties on ∆

rArr Non convex problem

rArr No polynomial solution of minxisinΠ

S(x)

I NP-Hard problem

Letrsquos look for an approximation

22 42

Approximation is overestimation

1 Any mapping corresponds to an edit path

2 Any edit path has a cost ge GED

3 Approximate mappinghArr Overestimation of GED

23 42

A First Approach

Linear approximation

xlowast = minxisinΠ

1

2xgtDx + cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 12: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

How to compute

Graph Edit Distance

Tree search

11 42

Tree search methods

Alowast

I Dijkstra-based algorithm

I Need an heuristic h(p)

Always find a solution

But may take a loooooong time

Exponential number of edit paths

Depth first search based algorithm[Abu-Aisheh et al 2015]

I Based on heuristic

I Limitation on the number of open paths

I Any time algorithm can return an approximation before termination

I Parallelizable

12 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

D

N

N

D

CC

C

CC

C

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

N

N

CC

CC

C

C

C

O DN

D

N

13 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

C

O DN

D

N

D

N

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

13 42

Computing graph edit distance

is equivalent to

finding an optimal assignment

between nodes

14 42

Cost

Mapping ϕ V1 cup εrarr V2 cup ε

S(Vε1Vε2 ϕ) = Qe(Vε1Vε2 ϕ)︸ ︷︷ ︸Edgersquos Cost

+ Lv(Vε1Vε2 ϕ)︸ ︷︷ ︸Nodersquos Cost

15 42

Cost Matrix C

Lv (V ε1 V

ε2 ϕ) =

sumvisinV1

c(v ϕ(v))

︸ ︷︷ ︸node substitutions

+sum

visinV1V1

c(v ε)

︸ ︷︷ ︸node removals

+sum

visinV2V2

c(ε v)

︸ ︷︷ ︸node insertions

c(v(1)1 rarr v

(2)1 ) middot middot middot c(v

(1)1 rarr v

(2)m ) c(v

(1)1 rarr ε) infin middot middot middot infin

infin

c(v(1)i rarr v

(2)j )

c(v

(1)i rarr ε)

c(v(1)n rarr v

(2)1 ) c(v

(1)n rarr v

(2)m ) infin c(v

(1)n rarr ε)

c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0

infin

c(εrarr v(2)j )

infin c(εrarr v(2)m ) 0 0

16 42

Node operations cost

Node cost induced by ϕ

Lv (V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi=1

|G1|+|G2|sumj=1

(Xϕ C)(i j)

Vectorized version

Lv (V ε1 V

ε2 ϕ) = cgtxϕ

17 42

Computation of edge costs

Edges operations cost I

Edge cost matrix D

D(i j k l) = c((i j)rarr (k l))

I (i j) isin E1 rarr deletion operation

I (k l) isin E2 rarr insertion operation

I Edgersquos mapping is induced by nodes mapping

19 42

Edges operations cost II

Edgersquos cost

Qe(V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi j k l=1

Xϕ(i k)D(i j k l)Xϕ(j l)

Vectorized version

Qe(V ε1 V

ε2 ϕ) = xgtϕDxϕ

20 42

QAP formulation of GED

S(x) =1

2xgtDx︸ ︷︷ ︸

Edgersquos cost

+ cgtx︸︷︷︸Nodersquos cost

S(x) =1

2xgt∆x

xlowast = arg minxisinΠ

S(x)

21 42

An intractable problem

No garanties on ∆

rArr Non convex problem

rArr No polynomial solution of minxisinΠ

S(x)

I NP-Hard problem

Letrsquos look for an approximation

22 42

Approximation is overestimation

1 Any mapping corresponds to an edit path

2 Any edit path has a cost ge GED

3 Approximate mappinghArr Overestimation of GED

23 42

A First Approach

Linear approximation

xlowast = minxisinΠ

1

2xgtDx + cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis


Page 13: Graph Edit Distance (pagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf)

Tree search

11/42

Tree search methods

A*

- Dijkstra-based algorithm
- Needs a heuristic h(p)

Always finds a solution
But may take a loooooong time: exponential number of edit paths

Depth-first-search-based algorithm [Abu-Aisheh et al., 2015]

- Based on a heuristic
- Limits the number of open paths
- Anytime algorithm: can return an approximation before termination
- Parallelizable

12/42

Correspondence between edit paths and mappings [Bougleux et al., 2015]

[Figure: two molecular graphs (C, N, O nodes) with their node-to-node mapping; auxiliary nodes mark deletions (D) and insertions.]

13/42

Computing graph edit distance is equivalent to finding an optimal assignment between nodes.

14/42

Cost

Mapping $\varphi : V_1 \cup \{\varepsilon\} \to V_2 \cup \{\varepsilon\}$

$$S(\mathcal{V}^\varepsilon_1, \mathcal{V}^\varepsilon_2, \varphi) = \underbrace{Q_e(\mathcal{V}^\varepsilon_1, \mathcal{V}^\varepsilon_2, \varphi)}_{\text{edge cost}} + \underbrace{L_v(\mathcal{V}^\varepsilon_1, \mathcal{V}^\varepsilon_2, \varphi)}_{\text{node cost}}$$

15/42

Cost Matrix C

$$L_v(V^\varepsilon_1, V^\varepsilon_2, \varphi) = \underbrace{\sum_{v \in \hat{V}_1} c(v \to \varphi(v))}_{\text{node substitutions}} + \underbrace{\sum_{v \in V_1 \setminus \hat{V}_1} c(v \to \varepsilon)}_{\text{node removals}} + \underbrace{\sum_{v \in V_2 \setminus \hat{V}_2} c(\varepsilon \to v)}_{\text{node insertions}}$$

$$C = \begin{pmatrix}
c(v^{(1)}_1 \to v^{(2)}_1) & \cdots & c(v^{(1)}_1 \to v^{(2)}_m) & c(v^{(1)}_1 \to \varepsilon) & \infty & \cdots & \infty \\
\vdots & c(v^{(1)}_i \to v^{(2)}_j) & \vdots & \infty & \ddots & c(v^{(1)}_i \to \varepsilon) & \infty \\
c(v^{(1)}_n \to v^{(2)}_1) & \cdots & c(v^{(1)}_n \to v^{(2)}_m) & \infty & \cdots & \infty & c(v^{(1)}_n \to \varepsilon) \\
c(\varepsilon \to v^{(2)}_1) & \infty & \cdots & \infty & 0 & \cdots & 0 \\
\infty & c(\varepsilon \to v^{(2)}_j) & \infty & \vdots & \ddots & \vdots \\
\infty & \cdots & c(\varepsilon \to v^{(2)}_m) & 0 & \cdots & 0
\end{pmatrix}$$

The top-left n × m block holds substitution costs, the top-right n × n block holds node-removal costs on its diagonal (∞ elsewhere), the bottom-left m × m block holds node-insertion costs on its diagonal, and the bottom-right block is zero.

16/42

Node operations cost

Node cost induced by $\varphi$:

$$L_v(V^\varepsilon_1, V^\varepsilon_2, \varphi) = \sum_{i=1}^{|G_1|+|G_2|} \sum_{j=1}^{|G_1|+|G_2|} (X_\varphi \circ C)(i, j)$$

Vectorized version:

$$L_v(V^\varepsilon_1, V^\varepsilon_2, \varphi) = c^\top x_\varphi$$

17/42
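The two forms of the node cost above can be checked numerically. A minimal sketch in NumPy, with purely illustrative cost values and a large finite constant standing in for the forbidden (infinite) entries:

```python
import numpy as np

# Toy setup: |G1| = 2 nodes, |G2| = 1 node, so C and X_phi are 3x3.
BIG = 1e9  # finite stand-in for the infinite entries of C
C = np.array([
    [1.0, 3.0, BIG],   # c(v1 -> u1), c(v1 -> eps), forbidden
    [2.0, BIG, 3.0],   # c(v2 -> u1), forbidden, c(v2 -> eps)
    [3.0, 0.0, 0.0],   # c(eps -> u1), zero padding
])
# Permutation matrix X_phi encoding v1 -> u1 and v2 -> eps
X = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])

Lv_matrix = np.sum(X * C)              # sum_ij (X_phi ∘ C)(i, j)
Lv_vector = C.flatten() @ X.flatten()  # c^T x_phi, row-major flattening

print(Lv_matrix, Lv_vector)  # 4.0 4.0
```

Both forms give the mapping cost c(v1→u1) + c(v2→ε) = 1 + 3 = 4.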

Computation of edge costs

Edge operations cost (I)

Edge cost matrix D:

$$D(i, j, k, l) = c((i, j) \to (k, l))$$

- (i, j) ∈ E1 mapped outside E2 → deletion operation
- (k, l) ∈ E2 with no pre-image in E1 → insertion operation
- The edge mapping is induced by the node mapping

19/42

Edge operations cost (II)

Edge cost:

$$Q_e(V^\varepsilon_1, V^\varepsilon_2, \varphi) = \sum_{i,j,k,l=1}^{|G_1|+|G_2|} X_\varphi(i, k)\, D(i, j, k, l)\, X_\varphi(j, l)$$

Vectorized version:

$$Q_e(V^\varepsilon_1, V^\varepsilon_2, \varphi) = x_\varphi^\top D\, x_\varphi$$

20/42
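The equivalence between the four-index sum and the vectorized quadratic form can be verified on random data. A small sketch (NumPy, illustrative random tensor; the flattening convention is row-major, matching x = vec(X)):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3  # stands for |G1| + |G2| in a toy example

# Random 4-index edge cost tensor D(i, j, k, l) and a random permutation X
D4 = rng.random((n, n, n, n))
X = np.eye(n)[rng.permutation(n)]

# Four-index form: sum_ijkl X(i,k) D(i,j,k,l) X(j,l)
Qe_sum = np.einsum('ik,ijkl,jl->', X, D4, X)

# Vectorized form: x^T D x with x = vec(X) and D reshaped to (n^2, n^2),
# rows indexed by (i, k) and columns by (j, l)
x = X.flatten()
D2 = D4.transpose(0, 2, 1, 3).reshape(n * n, n * n)
Qe_vec = x @ D2 @ x

assert np.isclose(Qe_sum, Qe_vec)
```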

QAP formulation of GED

$$S(x) = \underbrace{\tfrac{1}{2}\, x^\top D\, x}_{\text{edge cost}} + \underbrace{c^\top x}_{\text{node cost}} \qquad\Longleftrightarrow\qquad S(x) = \tfrac{1}{2}\, x^\top \Delta\, x$$

$$x^\ast = \arg\min_{x \in \Pi} S(x)$$

21/42

An intractable problem

No guarantees on Δ:

⇒ non-convex problem
⇒ no polynomial-time algorithm for $\min_{x \in \Pi} S(x)$

- NP-hard problem

Let's look for an approximation.

22/42

Approximation is overestimation

1. Any mapping corresponds to an edit path
2. Any edit path has a cost ≥ GED
3. An approximate mapping ⇔ an overestimation of GED

23/42

A First Approach

Linear approximation:

$$x^\ast = \arg\min_{x \in \Pi} c^\top x$$

- [Riesen and Bunke, 2009; Gauzère et al., 2014; Carletti et al., 2015]
- Linear approximation of the QAP formulation
- Hungarian algorithm (O(n³))
- No structural information

24/42
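The linear approximation above reduces to a linear sum assignment problem (LSAP) on the extended cost matrix of slide 16. A minimal sketch using SciPy's Hungarian-algorithm solver (the function name and toy costs are illustrative, not the authors' implementation):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lsap_ged_upper_bound(C_sub, del_costs, ins_costs):
    """Upper-bound GED via the linear (LSAP) approximation.

    C_sub: (n, m) node substitution costs; del_costs: (n,) removal costs;
    ins_costs: (m,) insertion costs. Builds the (n+m) x (n+m) extended
    cost matrix and solves it with the Hungarian algorithm.
    """
    n, m = C_sub.shape
    BIG = 1e9  # finite stand-in for the forbidden (infinite) entries
    C = np.full((n + m, n + m), BIG)
    C[:n, :m] = C_sub                                    # substitutions
    C[:n, m:][np.arange(n), np.arange(n)] = del_costs    # deletions (diagonal)
    C[n:, :m][np.arange(m), np.arange(m)] = ins_costs    # insertions (diagonal)
    C[n:, m:] = 0.0                                      # eps -> eps is free
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].sum()

# Toy example: 2 nodes vs 1 node, substitution costs 1 and 2, del/ins cost 3
cost = lsap_ged_upper_bound(np.array([[1.0], [2.0]]),
                            np.array([3.0, 3.0]), np.array([3.0]))
print(cost)  # optimal assignment: v1 -> u1 (1) + v2 -> eps (3) = 4.0
```

Since each LSAP solution is a valid node mapping, the returned value is an edit-path cost and therefore an upper bound on the exact GED.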

Augmented Cost Matrix

Adding some structural information:

- $C_{ij}$: cost of mapping the neighbourhood of $v_i$ to the neighbourhood of $v'_j$
  - Direct neighbourhood
  - Random walks
  - Subgraphs
- Complexity ↔ accuracy trade-off

25/42

A first QAP approach

$$x^\ast = \arg\min_{x \in \Pi} \tfrac{1}{2}\, x^\top D\, x + c^\top x$$

Gradient descent approach:

- Let's find a local minimum of S(x)
- Relax the problem to the continuous domain $\tilde{S}$

Some solvers exist, but they do not account for the discrete nature of the solution.

26/42

Another strategy

Integer-Projected Fixed Point (IPFP) [Leordeanu et al., 2009]

Frank-Wolfe-like algorithm. Iterate until convergence:

1. Discrete resolution of the linearized QAP (gradient step) → $x_t$
2. Line search between $x_{t-1}$ and the solution found in step 1

27/42
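The two-step iteration can be sketched as follows. This is an illustrative IPFP-style loop, not the reference implementation: the discrete step is solved as an LSAP with SciPy's Hungarian solver, and the line search uses the closed form for a quadratic objective.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ipfp(D, c, X0, n_iter=50, tol=1e-9):
    """IPFP-style minimization of S(x) = 0.5 x^T D x + c^T x.

    D: (N^2, N^2) edge cost matrix, c: (N^2,) node costs, X0: (N, N) start.
    Returns a doubly stochastic matrix (often a permutation) approximating x*.
    """
    N = X0.shape[0]
    x = X0.flatten().astype(float)
    for _ in range(n_iter):
        grad = D @ x + c                       # gradient of S at x
        # Step 1: minimize grad^T b over permutation matrices (LSAP)
        rows, cols = linear_sum_assignment(grad.reshape(N, N))
        b = np.zeros_like(x)
        b[rows * N + cols] = 1.0
        d = b - x                              # search direction
        # Step 2: exact line search on S(x + t d) for t in [0, 1]
        quad = d @ D @ d
        lin = d @ (D @ x + c)
        if quad > 0:
            t = np.clip(-lin / quad, 0.0, 1.0)
        else:
            t = 1.0 if (lin + 0.5 * quad) < 0 else 0.0
        if abs(t) * np.linalg.norm(d) < tol:   # converged: no progress
            break
        x = x + t * d
    return x.reshape(N, N)
```

By construction each iteration never increases S, so the final score is at most that of the initialization, which is why the starting point matters (next slide).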

Operating IPFP

At convergence:

- The optimal continuous solution $\tilde{x}$ is stable
- A projection step is needed to map $\tilde{x}$ back onto Π

Uncontrolled loss: no guarantee that $S(\tilde{x}) \approx S(x')$

Importance of initialization:

- Local minimum: initialization matters [Carletti et al., 2015]

28/42

GNCCP approach [Liu and Qiao, 2014]

From a convex to a concave objective function:

$$\tilde{S}(x, \zeta) = \zeta\, (x^\top x) + (1 - |\zeta|)\, S(x)$$

ζ = 1: convex objective function; ζ = −1: concave objective function.

29/42

GNCCP algorithm:

x = 0
For ζ = 1 → −1:
  1. $x \leftarrow \arg\min_{x \in \Pi'} \tilde{S}(x, \zeta)$
  2. ζ ← ζ − 0.1

- Iterates ζ over a modified IPFP objective function

30/42
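The outer annealing loop can be sketched as below. This is an illustrative skeleton under stated assumptions: `solve_relaxed` is a hypothetical caller-supplied IPFP/Frank-Wolfe-style solver for the smoothed objective, warm-started from the previous solution.

```python
import numpy as np

def gnccp(D, c, solve_relaxed, step=0.1):
    """GNCCP-style outer loop (sketch): anneal zeta from 1 (convex) to -1
    (concave), re-solving the smoothed objective S~(x, zeta) at each step.

    solve_relaxed(obj, grad, x0) is a hypothetical solver interface for
    the relaxed problem; D, c define S(x) = 0.5 x^T D x + c^T x.
    """
    x = np.zeros(c.shape[0])
    zeta = 1.0
    while zeta >= -1.0:
        S = lambda v: 0.5 * v @ D @ v + c @ v
        obj = lambda v: zeta * (v @ v) + (1 - abs(zeta)) * S(v)
        grad = lambda v: 2 * zeta * v + (1 - abs(zeta)) * (D @ v + c)
        x = solve_relaxed(obj, grad, x)  # warm-started from previous zeta
        zeta -= step
    return x
```

As ζ passes through 0 the smoothed objective coincides with S, and the concave phase (ζ < 0) drives the solution toward a vertex of the polytope, i.e. a discrete mapping.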

From ζ = 1 to ζ = 0

[Figure: animation frames showing the objective landscape morphing from the convex surrogate toward the original S(x) as ζ decreases from 1 to 0.]

31/42

From ζ = 0 to ζ = −1

[Figure: animation frames showing the landscape becoming concave as ζ decreases from 0 to −1, pushing the solution toward a vertex of Π, i.e. a discrete mapping.]

32/42

GNCCP vs IPFP

Pros:
- No more need for initialization
- Converges towards a mapping matrix

Cons:
- Complexity: iterates over IPFP

33/42

GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]

[Figure: example molecular graphs from each dataset: Alkane, Acyclic, MAO, PAH.]

34/42

Protocol

Arbitrary costs:
- All substitutions: 1
- All insertions/deletions: 3

Relative error:
- Accuracy measure
- Overestimation: the lowest approximation is the best one ($d_{opt}$)
- Relative error: $\dfrac{d(G_i, G_j) - d_{opt}(G_i, G_j)}{d_{opt}(G_i, G_j)}$

35/42
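The relative-error measure is straightforward to compute; a short sketch with purely illustrative distance values:

```python
import numpy as np

def relative_error(d_approx, d_opt):
    """Element-wise relative overestimation of approximate GED values
    with respect to the best (lowest) known approximation d_opt."""
    d_approx = np.asarray(d_approx, dtype=float)
    d_opt = np.asarray(d_opt, dtype=float)
    return (d_approx - d_opt) / d_opt

# Toy values: a method overestimating the best-known distances by 10% and 50%
err = relative_error([11.0, 9.0], [10.0, 6.0])
print(err)  # [0.1 0.5]
```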

Relative errors

[Figure: bar chart of the % of relative error per dataset (Alkane, Acyclic, MAO, PAH; y-axis 0 to 250%) for six methods: LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, GNCCP.]

36/42

log Time vs Score Deviation

[Figure: four panels (Alkane, Acyclic, MAO, PAH) plotting score deviation (0 to 1) against log10 of computation time in seconds for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, and GNCCP.]

37/42

Analysis

Trade-off:
- Accuracy is dependent on complexity
- Choose your method according to your priority

More complete analysis:
- ICPR GED contest: https://gdc2016.greyc.fr
- Other methods + other datasets

38/42

GED Limitations

Mathematical properties:
- GED is a distance
- But not a Euclidean one
  - Impossible to derive a trivial kernel
  - Use with caution in SVMs

Complexity:
- Still hard to compute on larger graphs
- Accuracy hard to evaluate (lack of ground truth)

39/42

Outlooks

Algorithms:
- Matrix optimization
- Edge-based mapping
- Optimization algorithms

40/42

Outlooks

Applications:
- Explore graph space through GED
  - Median graph: joint work with Paul
- Use of GED for classification
- How to set costs according to a task
  - → Metric learning
  - New PhD with Sebastien and Pierre
- Behavior of different methods

41/42

Conclusion

- GED is still an open problem
- Approximation algorithms exist
- But there is still room for improvement
- Focus on applications

42/42

References:

Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gauzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gauzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168-177. Springer.

Gauzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate Graph Edit Distance Guided by Bipartite Matching of Bags of Walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73-82. Springer.

Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114-1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP: Graduated NonConvexity and Concavity Procedure. IEEE Trans. Pattern Anal. Mach. Intell., 36(6):1258-1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950-959.

Page 14: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Tree search methods

Alowast

I Dijkstra-based algorithm

I Need an heuristic h(p)

Always find a solution

But may take a loooooong time

Exponential number of edit paths

Depth first search based algorithm[Abu-Aisheh et al 2015]

I Based on heuristic

I Limitation on the number of open paths

I Any time algorithm can return an approximation before termination

I Parallelizable

12 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

D

N

N

D

CC

C

CC

C

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

N

N

CC

CC

C

C

C

O DN

D

N

13 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

C

O DN

D

N

D

N

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

13 42

Computing graph edit distance

is equivalent to

finding an optimal assignment

between nodes

14 42

Cost

Mapping ϕ V1 cup εrarr V2 cup ε

S(Vε1Vε2 ϕ) = Qe(Vε1Vε2 ϕ)︸ ︷︷ ︸Edgersquos Cost

+ Lv(Vε1Vε2 ϕ)︸ ︷︷ ︸Nodersquos Cost

15 42

Cost Matrix C

Lv (V ε1 V

ε2 ϕ) =

sumvisinV1

c(v ϕ(v))

︸ ︷︷ ︸node substitutions

+sum

visinV1V1

c(v ε)

︸ ︷︷ ︸node removals

+sum

visinV2V2

c(ε v)

︸ ︷︷ ︸node insertions

c(v(1)1 rarr v

(2)1 ) middot middot middot c(v

(1)1 rarr v

(2)m ) c(v

(1)1 rarr ε) infin middot middot middot infin

infin

c(v(1)i rarr v

(2)j )

c(v

(1)i rarr ε)

c(v(1)n rarr v

(2)1 ) c(v

(1)n rarr v

(2)m ) infin c(v

(1)n rarr ε)

c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0

infin

c(εrarr v(2)j )

infin c(εrarr v(2)m ) 0 0

16 42

Node operations cost

Node cost induced by ϕ

Lv (V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi=1

|G1|+|G2|sumj=1

(Xϕ C)(i j)

Vectorized version

Lv (V ε1 V

ε2 ϕ) = cgtxϕ

17 42

Computation of edge costs

Edges operations cost I

Edge cost matrix D

D(i j k l) = c((i j)rarr (k l))

I (i j) isin E1 rarr deletion operation

I (k l) isin E2 rarr insertion operation

I Edgersquos mapping is induced by nodes mapping

19 42

Edges operations cost II

Edgersquos cost

Qe(V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi j k l=1

Xϕ(i k)D(i j k l)Xϕ(j l)

Vectorized version

Qe(V ε1 V

ε2 ϕ) = xgtϕDxϕ

20 42

QAP formulation of GED

S(x) =1

2xgtDx︸ ︷︷ ︸

Edgersquos cost

+ cgtx︸︷︷︸Nodersquos cost

S(x) =1

2xgt∆x

xlowast = arg minxisinΠ

S(x)

21 42

An intractable problem

No garanties on ∆

rArr Non convex problem

rArr No polynomial solution of minxisinΠ

S(x)

I NP-Hard problem

Letrsquos look for an approximation

22 42

Approximation is overestimation

1 Any mapping corresponds to an edit path

2 Any edit path has a cost ge GED

3 Approximate mappinghArr Overestimation of GED

23 42

A First Approach

Linear approximation

xlowast = minxisinΠ

1

2xgtDx + cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 15: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Correspondance between edit paths and mappings[Bougleux et al 2015]

D

N

N

D

CC

C

CC

C

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

N

N

CC

CC

C

C

C

O DN

D

N

13 42

Correspondance between edit paths and mappings[Bougleux et al 2015]

C

O DN

D

N

D

N

N

D

CC

C

CC

C

O

C N

N

CC

CC

C

C

13 42

Computing graph edit distance

is equivalent to

finding an optimal assignment

between nodes

14 42

Cost

Mapping ϕ V1 cup εrarr V2 cup ε

S(Vε1Vε2 ϕ) = Qe(Vε1Vε2 ϕ)︸ ︷︷ ︸Edgersquos Cost

+ Lv(Vε1Vε2 ϕ)︸ ︷︷ ︸Nodersquos Cost

15 42

Cost Matrix C

Lv (V ε1 V

ε2 ϕ) =

sumvisinV1

c(v ϕ(v))

︸ ︷︷ ︸node substitutions

+sum

visinV1V1

c(v ε)

︸ ︷︷ ︸node removals

+sum

visinV2V2

c(ε v)

︸ ︷︷ ︸node insertions

c(v(1)1 rarr v

(2)1 ) middot middot middot c(v

(1)1 rarr v

(2)m ) c(v

(1)1 rarr ε) infin middot middot middot infin

infin

c(v(1)i rarr v

(2)j )

c(v

(1)i rarr ε)

c(v(1)n rarr v

(2)1 ) c(v

(1)n rarr v

(2)m ) infin c(v

(1)n rarr ε)

c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0

infin

c(εrarr v(2)j )

infin c(εrarr v(2)m ) 0 0

16 42

Node operations cost

Node cost induced by ϕ

Lv (V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi=1

|G1|+|G2|sumj=1

(Xϕ C)(i j)

Vectorized version

Lv (V ε1 V

ε2 ϕ) = cgtxϕ

17 42

Computation of edge costs

Edges operations cost I

Edge cost matrix D

D(i j k l) = c((i j)rarr (k l))

I (i j) isin E1 rarr deletion operation

I (k l) isin E2 rarr insertion operation

I Edgersquos mapping is induced by nodes mapping

19 42

Edges operations cost II

Edgersquos cost

Qe(V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi j k l=1

Xϕ(i k)D(i j k l)Xϕ(j l)

Vectorized version

Qe(V ε1 V

ε2 ϕ) = xgtϕDxϕ

20 42

QAP formulation of GED

S(x) =1

2xgtDx︸ ︷︷ ︸

Edgersquos cost

+ cgtx︸︷︷︸Nodersquos cost

S(x) =1

2xgt∆x

xlowast = arg minxisinΠ

S(x)

21 42

An intractable problem

No garanties on ∆

rArr Non convex problem

rArr No polynomial solution of minxisinΠ

S(x)

I NP-Hard problem

Letrsquos look for an approximation

22 42

Approximation is overestimation

1 Any mapping corresponds to an edit path

2 Any edit path has a cost ge GED

3 Approximate mappinghArr Overestimation of GED

23 42

A First Approach

Linear approximation

xlowast = minxisinΠ

1

2xgtDx + cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer


Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114-1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP - Graduated NonConvexity and Concavity Procedure. IEEE Trans. Pattern Anal. Mach. Intell., 36(6):1258-1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950-959.


Correspondence between edit paths and mappings [Bougleux et al., 2015]

[Figure: two molecular graphs (C, N, O nodes); an edit path between them shown as a node mapping, with deleted and inserted nodes marked.]

13/42

Computing graph edit distance is equivalent to finding an optimal assignment between nodes.

14/42

Cost

Mapping $\varphi : V_1 \cup \{\varepsilon\} \to V_2 \cup \{\varepsilon\}$

$$S(V_1^\varepsilon, V_2^\varepsilon, \varphi) = \underbrace{Q_e(V_1^\varepsilon, V_2^\varepsilon, \varphi)}_{\text{edge cost}} + \underbrace{L_v(V_1^\varepsilon, V_2^\varepsilon, \varphi)}_{\text{node cost}}$$

15/42

Cost Matrix C

$$L_v(V_1^\varepsilon, V_2^\varepsilon, \varphi) = \underbrace{\sum_{v \in \hat{V}_1} c(v \to \varphi(v))}_{\text{node substitutions}} + \underbrace{\sum_{v \in V_1 \setminus \hat{V}_1} c(v \to \varepsilon)}_{\text{node removals}} + \underbrace{\sum_{v \in V_2 \setminus \hat{V}_2} c(\varepsilon \to v)}_{\text{node insertions}}$$

where $\hat{V}_1 \subseteq V_1$ is the set of substituted nodes and $\hat{V}_2 = \varphi(\hat{V}_1)$. The $(n+m) \times (n+m)$ cost matrix is (schematically):

$$C = \begin{pmatrix}
c(v_1^{(1)} \to v_1^{(2)}) & \cdots & c(v_1^{(1)} \to v_m^{(2)}) & c(v_1^{(1)} \to \varepsilon) & \infty & \cdots & \infty \\
\vdots & c(v_i^{(1)} \to v_j^{(2)}) & \vdots & \infty & \ddots & & \vdots \\
c(v_n^{(1)} \to v_1^{(2)}) & \cdots & c(v_n^{(1)} \to v_m^{(2)}) & \infty & \cdots & \infty & c(v_n^{(1)} \to \varepsilon) \\
c(\varepsilon \to v_1^{(2)}) & \infty & \cdots & \infty & 0 & \cdots & 0 \\
\infty & c(\varepsilon \to v_j^{(2)}) & \infty & \vdots & & \ddots & \vdots \\
\infty & \cdots & c(\varepsilon \to v_m^{(2)}) & 0 & \cdots & & 0
\end{pmatrix}$$

16/42
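The block structure of C can be sketched directly. Below is a minimal Python sketch (the helper name `build_cost_matrix` is hypothetical; the uniform costs 1 for substitution and 3 for insertion/deletion are borrowed from the experimental protocol later in the talk, with cost 0 assumed for identical labels):

```python
INF = float("inf")

def build_cost_matrix(labels1, labels2, c_sub=1.0, c_del=3.0, c_ins=3.0):
    """(n+m) x (n+m) node cost matrix with the block layout from the slide:
    top-left: substitutions; top-right: deletions on the diagonal, inf elsewhere;
    bottom-left: insertions on the diagonal, inf elsewhere; bottom-right: zeros."""
    n, m = len(labels1), len(labels2)
    size = n + m
    C = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(m):
            C[i][j] = 0.0 if labels1[i] == labels2[j] else c_sub
        for j in range(m, size):
            C[i][j] = c_del if j - m == i else INF
    for i in range(n, size):
        for j in range(m):
            C[i][j] = c_ins if i - n == j else INF
        # bottom-right block stays 0 (epsilon -> epsilon)
    return C

C = build_cost_matrix(list("CCO"), list("CN"))  # 3-node vs 2-node graph -> 5x5
```

The epsilon-to-epsilon zero block is what makes the matrix square, so a standard assignment solver can be applied to it unchanged.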

Node operations cost

Node cost induced by $\varphi$:

$$L_v(V_1^\varepsilon, V_2^\varepsilon, \varphi) = \sum_{i=1}^{|G_1|+|G_2|} \sum_{j=1}^{|G_1|+|G_2|} (X_\varphi \odot C)(i,j)$$

Vectorized version:

$$L_v(V_1^\varepsilon, V_2^\varepsilon, \varphi) = c^\top x_\varphi$$

17/42
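The equivalence between the matrix form (sum over $X_\varphi \odot C$) and the vectorized form $c^\top x_\varphi$ can be checked on a toy example; the 3x3 cost matrix below is made up purely for illustration:

```python
# Toy check that sum(X_phi . C) equals c^T x_phi (row-major vectorization).
C = [[0.0, 1.0, 3.0],
     [1.0, 0.0, 3.0],
     [3.0, 3.0, 0.0]]
perm = [1, 0, 2]  # mapping phi: row i -> column perm[i]
n = len(C)

# Permutation matrix X_phi of the mapping
X = [[1.0 if j == perm[i] else 0.0 for j in range(n)] for i in range(n)]

# Matrix form: elementwise product of the assignment matrix with C, then sum
Lv_matrix = sum(X[i][j] * C[i][j] for i in range(n) for j in range(n))

# Vectorized form: flatten C and X row-major, then take the dot product
c = [C[i][j] for i in range(n) for j in range(n)]
x = [X[i][j] for i in range(n) for j in range(n)]
Lv_vector = sum(ci * xi for ci, xi in zip(c, x))
```

Both forms pick out exactly the entries of C selected by the mapping, so they agree by construction.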

Computation of edge costs

Edge operations cost I

Edge cost matrix D:

$$D(i,j,k,l) = c((i,j) \to (k,l))$$

I $(i,j) \in E_1$ mapped to no edge of $E_2$ → deletion operation

I $(k,l) \in E_2$ with no preimage in $E_1$ → insertion operation

I The edge mapping is induced by the node mapping

19/42

Edge operations cost II

Edge cost:

$$Q_e(V_1^\varepsilon, V_2^\varepsilon, \varphi) = \sum_{i,j,k,l=1}^{|G_1|+|G_2|} X_\varphi(i,k)\, D(i,j,k,l)\, X_\varphi(j,l)$$

Vectorized version:

$$Q_e(V_1^\varepsilon, V_2^\varepsilon, \varphi) = x_\varphi^\top D x_\varphi$$

20/42

QAP formulation of GED

$$S(x) = \underbrace{\tfrac{1}{2}\, x^\top D x}_{\text{edge cost}} + \underbrace{c^\top x}_{\text{node cost}}$$

$$S(x) = \tfrac{1}{2}\, x^\top \Delta x$$

$$x^* = \arg\min_{x \in \Pi} S(x)$$

21/42
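The second form holds because x is binary, so $x_i^2 = x_i$ and the linear term $c^\top x$ can be absorbed into the quadratic with $\Delta = D + 2\,\mathrm{diag}(c)$. A sketch checking this numerically under made-up sizes and random costs:

```python
import itertools
import random

random.seed(0)
N = 4  # length of the flattened assignment vector (toy size)

# Random symmetric edge-cost matrix D (zero diagonal) and node-cost vector c
D = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(i + 1, N):
        D[i][j] = D[j][i] = random.random()
c = [random.random() for _ in range(N)]

def S(x):
    """QAP form: 1/2 x^T D x + c^T x."""
    quad = sum(x[i] * D[i][j] * x[j] for i in range(N) for j in range(N))
    return 0.5 * quad + sum(ci * xi for ci, xi in zip(c, x))

def S_delta(x):
    """Single quadratic form with Delta = D + 2 diag(c); valid for binary x."""
    quad = sum(x[i] * (D[i][j] + (2.0 * c[i] if i == j else 0.0)) * x[j]
               for i in range(N) for j in range(N))
    return 0.5 * quad

# Compare the two forms on every binary vector of length N
max_gap = max(abs(S(x) - S_delta(x))
              for x in itertools.product([0.0, 1.0], repeat=N))
```

For fractional x (as in the continuous relaxations below) the two forms differ, which is one reason the relaxed objective can behave differently from the discrete one.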

An intractable problem

No guarantees on Δ

⇒ Non-convex problem

⇒ No polynomial-time solution of $\min_{x \in \Pi} S(x)$

I NP-hard problem

Let's look for an approximation

22/42

Approximation is overestimation

1. Any mapping corresponds to an edit path

2. Any edit path has a cost ≥ GED

3. Approximate mapping ⇔ overestimation of GED

23/42

A First Approach

Linear approximation:

$$x^* = \arg\min_{x \in \Pi} c^\top x$$

I [Riesen and Bunke, 2009; Gauzere et al., 2014; Carletti et al., 2015]

I Linear approximation of the QAP formulation

I Hungarian algorithm ($O(n^3)$)

No structural information

24/42

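On toy sizes the Hungarian step can be replaced by a brute-force minimization over permutations, which makes the linear approximation easy to illustrate. The 5x5 cost matrix below (3 nodes vs 2 nodes plus epsilon rows/columns) and the helper name are hypothetical:

```python
from itertools import permutations

INF = float("inf")
# Node cost matrix with the slide's block layout (hypothetical label costs)
C = [
    [0.0, 1.0, 3.0, INF, INF],
    [0.0, 1.0, INF, 3.0, INF],
    [1.0, 1.0, INF, INF, 3.0],
    [3.0, INF, 0.0, 0.0, 0.0],
    [INF, 3.0, 0.0, 0.0, 0.0],
]

def lsap_bruteforce(C):
    """Minimize c^T x over permutation matrices: a stand-in for the O(n^3)
    Hungarian algorithm, feasible only at this toy size."""
    n = len(C)
    best_cost, best_perm = INF, None
    for perm in permutations(range(n)):
        cost = sum(C[i][perm[i]] for i in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm

cost, perm = lsap_bruteforce(C)  # two substitutions plus one deletion here
```

The resulting node mapping induces an edit path whose cost upper-bounds the GED, per the overestimation argument above.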

Augmented Cost Matrix

Adding some structural information

I $C_{ij}$: cost of mapping the neighbourhood of $v_i$ to the neighbourhood of $v'_j$

I Direct neighbourhood

I Random walks

I Subgraphs

I Complexity ↔ accuracy

[Figure: neighbourhoods of two mapped nodes in molecular graphs.]

25/42


A first QAP approach

$$x^* = \arg\min_{x \in \Pi} \tfrac{1}{2}\, x^\top D x + c^\top x$$

Gradient descent approach

I Let's find a local minimum of $S(x)$

I Relax the problem to a continuous domain $\tilde{S}$

Some solvers exist

No consideration of the discrete nature of the solution

26/42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al., 2009]

Frank-Wolfe-like algorithm

Iterate until convergence:

1. Discrete resolution of the linearized (gradient) QAP → $x_t$

2. Line search between $x_{t-1}$ and the solution found in step 1

27/42
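The two steps above can be sketched compactly on a toy 3x3 assignment, with the discrete step done by brute force over permutation matrices instead of a Hungarian solve. All sizes and the random Δ are illustrative, not the actual IPFP implementation:

```python
from itertools import permutations
import random

random.seed(1)
n = 3                      # 3x3 toy assignment; x is flattened to length 9
N = n * n
# Hypothetical symmetric matrix Delta of the QAP objective S(x) = 1/2 x^T Delta x
Delta = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(i, N):
        Delta[i][j] = Delta[j][i] = random.uniform(-1.0, 1.0)

perm_vecs = []             # vertices of the assignment polytope
for p in permutations(range(n)):
    v = [0.0] * N
    for i, j in enumerate(p):
        v[i * n + j] = 1.0
    perm_vecs.append(v)

def S(x):
    return 0.5 * sum(x[i] * Delta[i][j] * x[j] for i in range(N) for j in range(N))

def grad(x):
    return [sum(Delta[i][j] * x[j] for j in range(N)) for i in range(N)]

x = perm_vecs[0][:]        # initialization matters (local minima)
for _ in range(20):
    g = grad(x)
    # 1. discrete step: minimize the linearized objective over permutations
    b = min(perm_vecs, key=lambda v: sum(gi * vi for gi, vi in zip(g, v)))
    d = [bi - xi for bi, xi in zip(b, x)]
    gd = sum(gi * di for gi, di in zip(g, d))
    if gd >= -1e-12:
        break              # no descent direction left: converged
    # 2. exact line search for the quadratic along x + alpha * d, alpha in [0, 1]
    dDd = sum(d[i] * Delta[i][j] * d[j] for i in range(N) for j in range(N))
    alpha = 1.0 if dDd <= 0 else min(1.0, -gd / dDd)
    x = [xi + alpha * di for xi, di in zip(x, d)]
```

At convergence x may still be fractional, which is exactly why a projection step onto Π is needed, as the next slide discusses.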

Operating IPFP

At convergence:

I The optimal continuous solution $\tilde{x}$ is stable

I A projection step is needed to map $\tilde{x}$ onto Π

Uncontrolled loss:

No guarantee that $S(\tilde{x}) \approx S(x')$

Importance of initialization:

I Local minimum

I Initialization is important [Carletti et al., 2015]

28/42

GNCCP approach [Liu and Qiao, 2014]

From a convex to a concave objective function:

$$S(x, \zeta) = \zeta\, (x^\top x) + (1 - |\zeta|)\, S(x)$$

ζ = 1: convex objective function

ζ = −1: concave objective function

29/42

GNCCP algorithm:

x = 0

For ζ = 1 → −1:

1. $x \leftarrow \arg\min_{x \in \Pi'} S(x, \zeta)$

2. ζ ← ζ − 0.1

I Iterates ζ over a modified IPFP objective function

30/42
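A small sketch of the objective family $S(x, \zeta)$ and the ζ schedule, under made-up sizes and a random Δ. The inner arg min at each ζ is the modified IPFP solve mentioned on the slide and is not reproduced here; this only illustrates how the objective interpolates between the convex term, the original objective, and the concave term:

```python
import random

random.seed(2)
N = 4  # toy vector size; Delta stands in for the QAP matrix
Delta = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(i, N):
        Delta[i][j] = Delta[j][i] = random.uniform(-1.0, 1.0)

def S(x):
    return 0.5 * sum(x[i] * Delta[i][j] * x[j] for i in range(N) for j in range(N))

def S_zeta(x, zeta):
    # zeta * x^T x + (1 - |zeta|) * S(x): convex -> original -> concave
    return zeta * sum(xi * xi for xi in x) + (1.0 - abs(zeta)) * S(x)

x = [random.random() for _ in range(N)]
sq = sum(xi * xi for xi in x)

# The zeta schedule of the algorithm: 1.0, 0.9, ..., 0.0, ..., -1.0
schedule = []
zeta = 1.0
while zeta >= -1.0 - 1e-9:
    schedule.append(round(zeta, 1))
    zeta -= 0.1
```

At ζ = 1 the objective is the pure convex term x^T x, at ζ = 0 it is the original S(x), and at ζ = −1 it is −x^T x, whose minima over the relaxed polytope push the solution towards a vertex, i.e. a discrete mapping.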

From ζ = 1 to ζ = 0

[Figure: animation frames showing the relaxed objective deforming from the convex function (ζ = 1) to the original objective (ζ = 0).]

31/42

From ζ = 0 to ζ = −1

[Figure: animation frames showing the objective deforming from the original function (ζ = 0) to the concave function (ζ = −1).]

32/42

GNCCP vs IPFP

Pros:

No need for an initialization

Converges towards a mapping matrix

Cons:

Complexity: iterates over IPFP

33/42

GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]

[Figure: example molecular graphs from each dataset.]

Alkane, Acyclic, MAO, PAH

34/42

Protocol

Arbitrary costs:

I All substitutions: 1

I All insertions/deletions: 3

Relative error:

I Accuracy measure

I Overestimation: the lowest approximation is the best one ($d_{opt}$)

I Relative error: $\dfrac{d(G_i, G_j) - d_{opt}(G_i, G_j)}{d_{opt}(G_i, G_j)}$

35/42
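The protocol's relative-error measure is straightforward to compute; a small sketch with made-up distances for one graph pair (method names and values are illustrative only):

```python
def relative_error(d_approx, d_opt):
    """Relative overestimation of an approximate GED with respect to the best
    (lowest) approximation d_opt, as defined in the protocol."""
    return (d_approx - d_opt) / d_opt

# Hypothetical approximate distances for one graph pair under three methods
approximations = {"LSAP": 12.0, "IPFP": 9.0, "GNCCP": 8.0}
d_opt = min(approximations.values())  # lowest approximation stands in for the optimum
errors = {name: relative_error(d, d_opt) for name, d in approximations.items()}
```

Since every approximation overestimates the true GED, the error is always non-negative and the best method on a pair scores exactly 0.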

Relative errors

[Figure: bar chart of % relative error (0-250) per dataset (Alkane, Acyclic, MAO, PAH); legend: LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, GNCCP.]

36/42




Page 19: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Cost Matrix C

Lv (V ε1 V

ε2 ϕ) =

sumvisinV1

c(v ϕ(v))

︸ ︷︷ ︸node substitutions

+sum

visinV1V1

c(v ε)

︸ ︷︷ ︸node removals

+sum

visinV2V2

c(ε v)

︸ ︷︷ ︸node insertions

c(v(1)1 rarr v

(2)1 ) middot middot middot c(v

(1)1 rarr v

(2)m ) c(v

(1)1 rarr ε) infin middot middot middot infin

infin

c(v(1)i rarr v

(2)j )

c(v

(1)i rarr ε)

c(v(1)n rarr v

(2)1 ) c(v

(1)n rarr v

(2)m ) infin c(v

(1)n rarr ε)

c(εrarr v(2)1 ) infin middot middot middot infin 0 middot middot middot 0

infin

c(εrarr v(2)j )

infin c(εrarr v(2)m ) 0 0

16 42

Node operations cost

Node cost induced by $\varphi$:

$$L_v(V_1^\varepsilon, V_2^\varepsilon, \varphi) = \sum_{i=1}^{|G_1|+|G_2|} \sum_{j=1}^{|G_1|+|G_2|} (X_\varphi \circ C)(i,j)$$

Vectorized version:

$$L_v(V_1^\varepsilon, V_2^\varepsilon, \varphi) = c^\top x_\varphi$$

17/42
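A quick numerical check of the equivalence between the matrix form and the vectorized form (toy random data, sizes chosen for illustration only):

```python
import numpy as np

# Check that sum_ij (X_phi o C)(i, j) equals c^T x_phi when x_phi is
# X_phi flattened row-major.
rng = np.random.default_rng(0)
N = 4
C = rng.random((N, N))                 # node cost matrix
X = np.eye(N)[rng.permutation(N)]      # a node mapping as a permutation matrix
matrix_form = (X * C).sum()            # sum_ij (X_phi o C)(i, j)
vector_form = C.ravel() @ X.ravel()    # c^T x_phi
```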

Computation of edge costs

Edges operations cost I

Edge cost matrix D:

$$D(i,j,k,l) = c((i,j) \to (k,l))$$

- (i, j) ∈ E1 → deletion operation
- (k, l) ∈ E2 → insertion operation
- The edge mapping is induced by the node mapping

19/42

Edges operations cost II

Edge cost:

$$Q_e(V_1^\varepsilon, V_2^\varepsilon, \varphi) = \sum_{i,j,k,l=1}^{|G_1|+|G_2|} X_\varphi(i,k)\, D(i,j,k,l)\, X_\varphi(j,l)$$

Vectorized version:

$$Q_e(V_1^\varepsilon, V_2^\varepsilon, \varphi) = x_\varphi^\top D x_\varphi$$

20/42
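The same equivalence can be checked numerically: the 4-index sum equals the vectorized quadratic form once D is rearranged with rows indexed by (i, k) and columns by (j, l) (random toy data, layout is my own assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 3
D4 = rng.random((N, N, N, N))          # D(i, j, k, l)
X = np.eye(N)[rng.permutation(N)]      # node mapping
x = X.ravel()
tensor_form = np.einsum('ik,ijkl,jl->', X, D4, X)
# Vectorized D: row (i, k), column (j, l)
D_mat = D4.transpose(0, 2, 1, 3).reshape(N * N, N * N)
vector_form = x @ D_mat @ x
```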

QAP formulation of GED

$$S(x) = \underbrace{\tfrac{1}{2} x^\top D x}_{\text{edge cost}} + \underbrace{c^\top x}_{\text{node cost}}$$

$$S(x) = \tfrac{1}{2} x^\top \Delta x$$

$$x^\ast = \arg\min_{x \in \Pi} S(x)$$

21/42

An intractable problem

No guarantees on $\Delta$:

⇒ non-convex problem
⇒ no polynomial-time algorithm for $\min_{x \in \Pi} S(x)$

- NP-hard problem

Let's look for an approximation.

22/42

Approximation is overestimation

1. Any mapping corresponds to an edit path.
2. Any edit path has a cost ≥ GED.
3. Approximate mapping ⇔ overestimation of GED.

23/42

A First Approach

Linear approximation:

$$x^\ast = \arg\min_{x \in \Pi} c^\top x$$

- [Riesen and Bunke, 2009; Gauzere et al., 2014; Carletti et al., 2015]
- Linear approximation of the QAP formulation
- Hungarian algorithm ($O(n^3)$)

No structural information.

24/42
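The LSAP step can be sketched with scipy's Hungarian solver, applied to the node cost matrix C built on the earlier slide (a sketch; `linear_sum_assignment` rejects infinite entries, so they are replaced by a large finite value here):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lsap_upper_bound(C):
    """Solve min_{x in Pi} c^T x by the Hungarian algorithm (O(n^3)) on the
    node cost matrix C and return the cost of the induced edit path: an
    overestimation of the GED (edge costs are ignored at this stage)."""
    C_finite = np.where(np.isinf(C), 1e9, C)   # the solver rejects inf
    rows, cols = linear_sum_assignment(C_finite)
    return C_finite[rows, cols].sum()
```

With the arbitrary costs used later (substitution 1, insertion/deletion 3), a cheap substitution is kept, while a costly one is replaced by a deletion plus an insertion.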

Augmented Cost Matrix

Adding some structural information:

- $C_{ij}$: cost of mapping the neighbourhood of $v_i$ to the neighbourhood of $v'_j$
  - Direct neighbourhood
  - Random walks
  - Subgraphs
- Complexity ↔ accuracy

25/42

A first QAP approach

$$x^\ast = \arg\min_{x \in \Pi} \tfrac{1}{2} x^\top D x + c^\top x$$

Gradient descent approach:

- Let's find a local minimum of $S(x)$
- Relax the problem to a continuous domain $\tilde{S}$

Some solvers exist, but they take no account of the discrete nature of the solution.

26/42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al., 2009]

A Frank-Wolfe-like algorithm. Iterate until convergence:

1. Discrete resolution of the linearized QAP (gradient) → $x_t$
2. Line search between $x_{t-1}$ and the solution found in step 1

27/42
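The two steps above can be sketched for the relaxed QAP $S(x) = \tfrac{1}{2}x^\top D x + c^\top x$, using the Hungarian algorithm for the discrete step and an exact line search for the quadratic (a sketch; the row-major layout, the uniform starting point, and the helper name are my own assumptions):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ipfp(D, c, n, x0=None, n_iter=50):
    """Frank-Wolfe-like IPFP sketch: x is an n x n assignment flattened
    row-major, D is (n*n, n*n), S(x) = 0.5 x^T D x + c^T x."""
    x = np.full(n * n, 1.0 / n) if x0 is None else x0.astype(float)
    for _ in range(n_iter):
        g = D @ x + c                              # gradient of S at x
        # 1) discrete step: best permutation for the linearized objective
        r, col = linear_sum_assignment(g.reshape(n, n))
        b = np.zeros_like(x)
        b[r * n + col] = 1.0
        d = b - x
        lin = g @ d
        if lin >= -1e-12:                          # Frank-Wolfe optimality
            break
        # 2) exact line search for the quadratic along [x, b]
        quad = d @ D @ d
        t = 1.0 if quad <= 0 else min(1.0, -lin / quad)
        x = x + t * d
    return x
```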

Operating IPFP

At convergence:

- The optimal continuous solution $\tilde{x}$ is stable
- A projection step is needed to map $\tilde{x}$ back to $\Pi$

Uncontrolled loss: no guarantee that $S(\tilde{x}) \approx S(x')$.

Importance of initialization:

- Local minimum
- Initialization is important [Carletti et al., 2015]

28/42

GNCCP approach [Liu and Qiao, 2014]

From a convex to a concave objective function:

$$S(x, \zeta) = \zeta\,(x^\top x) + (1 - |\zeta|)\,S(x)$$

- $\zeta = 1$: convex objective function
- $\zeta = -1$: concave objective function

GNCCP algorithm:

x = 0
For ζ = 1 → −1:
  1. x ← argmin_{x ∈ Π′} S(x, ζ)
  2. ζ ← ζ − 0.1

- Iterates ζ over a modified IPFP objective function

29-30/42
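The loop above can be sketched by running a Frank-Wolfe inner solver on $S(x, \zeta)$ while $\zeta$ decreases (a sketch under the same layout assumptions as the IPFP snippet; the slides initialize x = 0 on the relaxed set Π′, while for simplicity the barycenter of Π is used here):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def gnccp(D, c, n, step=0.1, inner_iter=50):
    """GNCCP sketch: minimize S(x, zeta) = zeta x^T x + (1 - |zeta|) S(x)
    while zeta goes from 1 (convex) down to -1 (concave), with a
    Frank-Wolfe inner loop over the assignment polytope."""
    x = np.full(n * n, 1.0 / n)        # barycenter start
    zeta = 1.0
    while zeta >= -1.0:
        for _ in range(inner_iter):
            g = 2 * zeta * x + (1 - abs(zeta)) * (D @ x + c)
            r, col = linear_sum_assignment(g.reshape(n, n))
            b = np.zeros_like(x)
            b[r * n + col] = 1.0
            d = b - x
            lin = g @ d
            if lin >= -1e-12:          # inner Frank-Wolfe converged
                break
            quad = 2 * zeta * (d @ d) + (1 - abs(zeta)) * (d @ D @ d)
            t = 1.0 if quad <= 0 else min(1.0, -lin / quad)
            x = x + t * d
        zeta -= step
    return x
```

As ζ turns negative the quadratic term becomes concave, so the line search jumps to vertices of the polytope and the iterate ends on a discrete mapping.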

From ζ = 1 to ζ = 0

[Figure: sequence of plots of the objective landscape as ζ decreases from 1 (convex) to 0 (original objective); both axes range from −1 to 1.]

31/42

From ζ = 0 to ζ = −1

[Figure: sequence of plots of the objective landscape as ζ decreases from 0 to −1 (concave); both axes range from −1 to 1.]

32/42

GNCCP vs IPFP

Pros:

- No more need for initialization
- Converges towards a mapping matrix

Cons:

- Complexity: iterates over IPFP

33/42

GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]

[Figure: example molecular graphs from the four datasets: Alkane, Acyclic, MAO, PAH.]

34/42

Protocol

Arbitrary costs:

- All substitutions: 1
- All insertions/deletions: 3

Relative error:

- Accuracy measure
- Overestimation: the lowest approximation is the best one ($d_{\text{opt}}$)
- Relative error: $\dfrac{d(G_i, G_j) - d_{\text{opt}}(G_i, G_j)}{d_{\text{opt}}(G_i, G_j)}$

35/42
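The accuracy measure above in code (trivial, but stated for completeness; the function name is my own):

```python
def relative_error(d, d_opt):
    """Relative overestimation of an approximation d with respect to the
    best (lowest) approximation d_opt, as defined in the protocol above."""
    return (d - d_opt) / d_opt
```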

Relative errors

[Figure: bar chart of the relative error (%) per dataset (Alkane, Acyclic, MAO, PAH; y-axis 0 to 250%) for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, and GNCCP.]

36/42

log Time vs Score Deviation

[Figure: four scatter plots (Alkane, Acyclic, MAO, PAH) of score deviation (0 to 1) against log10 of time in seconds, for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, and GNCCP.]

37/42

Analysis

Tradeoff:

- Accuracy depends on complexity
- Choose your method according to your priority

More complete analysis:

- ICPR GED contest: https://gdc2016.greyc.fr
- Other methods + other datasets

38/42

GED Limitations

Mathematical properties:

- GED is a distance
- But not a Euclidean one
- Impossible to derive a trivial kernel
- Use with caution in SVMs

Complexity:

- Still hard to compute on larger graphs
- Accuracy hard to evaluate (lack of ground truth)

39/42

Outlooks

Algorithms:

- Matrix optimization
- Edge-based mapping
- Optimization algorithms

40/42

Outlooks

Applications:

- Explore the graph space through GED
  - Median graph: joint work with Paul
- Use of GED for classification
- How to set costs according to a task?
  - → Metric learning
  - New PhD with Sebastien and Pierre
- Behavior of different methods

41/42

Conclusion

- GED is still an open problem
- Approximation algorithms exist...
- ...but there is still room for improvement
- Focus on applications

42/42

References

Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gauzere, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gauzere, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168-177. Springer.

Gauzere, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate Graph Edit Distance Guided by Bipartite Matching of Bags of Walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73-82. Springer.

Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114-1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP - Graduated NonConvexity and Concavity Procedure. Pattern Anal. Mach. Intell., 36(6):1258-1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950-959.

Page 20: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Node operations cost

Node cost induced by ϕ

Lv (V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi=1

|G1|+|G2|sumj=1

(Xϕ C)(i j)

Vectorized version

Lv (V ε1 V

ε2 ϕ) = cgtxϕ

17 42

Computation of edge costs

Edges operations cost I

Edge cost matrix D

D(i j k l) = c((i j)rarr (k l))

I (i j) isin E1 rarr deletion operation

I (k l) isin E2 rarr insertion operation

I Edgersquos mapping is induced by nodes mapping

19 42

Edges operations cost II

Edgersquos cost

Qe(V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi j k l=1

Xϕ(i k)D(i j k l)Xϕ(j l)

Vectorized version

Qe(V ε1 V

ε2 ϕ) = xgtϕDxϕ

20 42

QAP formulation of GED

S(x) =1

2xgtDx︸ ︷︷ ︸

Edgersquos cost

+ cgtx︸︷︷︸Nodersquos cost

S(x) =1

2xgt∆x

xlowast = arg minxisinΠ

S(x)

21 42

An intractable problem

No garanties on ∆

rArr Non convex problem

rArr No polynomial solution of minxisinΠ

S(x)

I NP-Hard problem

Letrsquos look for an approximation

22 42

Approximation is overestimation

1 Any mapping corresponds to an edit path

2 Any edit path has a cost ge GED

3 Approximate mappinghArr Overestimation of GED

23 42

A First Approach

Linear approximation

xlowast = minxisinΠ

1

2xgtDx + cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 21: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Computation of edge costs

Edges operations cost I

Edge cost matrix D

D(i j k l) = c((i j)rarr (k l))

I (i j) isin E1 rarr deletion operation

I (k l) isin E2 rarr insertion operation

I Edgersquos mapping is induced by nodes mapping

19 42

Edges operations cost II

Edgersquos cost

Qe(V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi j k l=1

Xϕ(i k)D(i j k l)Xϕ(j l)

Vectorized version

Qe(V ε1 V

ε2 ϕ) = xgtϕDxϕ

20 42

QAP formulation of GED

S(x) =1

2xgtDx︸ ︷︷ ︸

Edgersquos cost

+ cgtx︸︷︷︸Nodersquos cost

S(x) =1

2xgt∆x

xlowast = arg minxisinΠ

S(x)

21 42

An intractable problem

No garanties on ∆

rArr Non convex problem

rArr No polynomial solution of minxisinΠ

S(x)

I NP-Hard problem

Letrsquos look for an approximation

22 42

Approximation is overestimation

1 Any mapping corresponds to an edit path

2 Any edit path has a cost ge GED

3 Approximate mappinghArr Overestimation of GED

23 42

A First Approach

Linear approximation

xlowast = minxisinΠ

1

2xgtDx + cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 22: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Edges operations cost I

Edge cost matrix D

D(i j k l) = c((i j)rarr (k l))

I (i j) isin E1 rarr deletion operation

I (k l) isin E2 rarr insertion operation

I Edgersquos mapping is induced by nodes mapping

19 42

Edges operations cost II

Edgersquos cost

Qe(V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi j k l=1

Xϕ(i k)D(i j k l)Xϕ(j l)

Vectorized version

Qe(V ε1 V

ε2 ϕ) = xgtϕDxϕ

20 42

QAP formulation of GED

S(x) =1

2xgtDx︸ ︷︷ ︸

Edgersquos cost

+ cgtx︸︷︷︸Nodersquos cost

S(x) =1

2xgt∆x

xlowast = arg minxisinΠ

S(x)

21 42

An intractable problem

No garanties on ∆

rArr Non convex problem

rArr No polynomial solution of minxisinΠ

S(x)

I NP-Hard problem

Letrsquos look for an approximation

22 42

Approximation is overestimation

1 Any mapping corresponds to an edit path

2 Any edit path has a cost ge GED

3 Approximate mappinghArr Overestimation of GED

23 42

A First Approach

Linear approximation

xlowast = minxisinΠ

1

2xgtDx + cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

[Figure: as ζ decreases from 1 to 0, the smoothed objective S(·, ζ) interpolates from the convex term xᵀx towards the original objective S(x); plot residue removed]

31/42

From ζ = 0 to ζ = −1

[Figure: as ζ decreases from 0 to −1, the objective interpolates from S(x) towards the concave term, driving the minimiser towards a vertex of the polytope; plot residue removed]

32/42

GNCCP vs IPFP

Pros:

- No more need for an initialization
- Converges towards a mapping matrix

Cons:

- Complexity: iterates over IPFP

33/42

GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]

[Figure: sample molecular graphs from each dataset; drawing residue removed]

Alkane, Acyclic, MAO, PAH

34/42

Protocol

Arbitrary costs:

- All substitutions: 1
- All insertions/deletions: 3

Relative error:

- Accuracy measure
- Overestimation: the lowest approximation is the best one (d_opt)
- Relative error: (d(G_i, G_j) − d_opt(G_i, G_j)) / d_opt(G_i, G_j)

35/42
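Since every approximation overestimates the GED, the best available approximation d_opt serves as the reference; the measure from the slide can be written directly (hypothetical helper mirroring the formula):

```python
def relative_error(d_approx, d_opt):
    """Overestimation of an approximate GED d_approx relative to the lowest
    (best) approximation d_opt among the compared methods.
    0.0 means the method matched the best approximation."""
    return (d_approx - d_opt) / d_opt
```

For example, a method returning an edit distance of 4 where the best method returns 2 has a relative error of 1.0 (a 100% overestimation).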

Relative errors

[Bar chart: % of relative error per dataset (Alkane, Acyclic, MAO, PAH; y-axis 0 to 250) for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, and GNCCP; plot residue removed]

36/42

log Time vs Score Deviation

[Four scatter plots, one per dataset (Alkane, Acyclic, MAO, PAH): score deviation (0 to 1) against log10 of time in seconds for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, and GNCCP; plot residue removed]

37/42

Analysis

Trade-off:

- Accuracy depends on complexity
- Choose your method according to your priority

More complete analysis:

- ICPR GED contest: https://gdc2016.greyc.fr
- Other methods + other datasets

38/42

GED Limitations

Mathematical properties:

- GED is a distance
- But not a Euclidean one

→ Impossible to derive a trivial kernel; use with caution in SVMs.

Complexity:

- Still hard to compute on larger graphs
- Accuracy hard to evaluate (lack of ground truth)

39/42

Outlooks

Algorithms:

- Matrix optimization
- Edge-based mapping
- Optimization algorithms

40/42

Outlooks

Applications:

- Explore the graph space through GED: median graph (joint work with Paul)
- Use of GED for classification
- How to set costs according to a task → metric learning (new PhD with Sébastien and Pierre)
- Behavior of different methods

41/42

Conclusion

- GED is still an open problem
- Approximation algorithms exist
- But there is still room for improvement
- Focus on applications

42/42

Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gaüzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168-177. Springer.

Gaüzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate Graph Edit Distance Guided by Bipartite Matching of Bags of Walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73-82. Springer.

Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114-1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP: Graduated NonConvexity and Concavity Procedure. Pattern Anal. Mach. Intell., 36(6):1258-1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950-959.

42/42

Page 23: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Edges operations cost II

Edgersquos cost

Qe(V ε1 V

ε2 ϕ) =

|G1|+|G2|sumi j k l=1

Xϕ(i k)D(i j k l)Xϕ(j l)

Vectorized version

Qe(V ε1 V

ε2 ϕ) = xgtϕDxϕ

20 42

QAP formulation of GED

S(x) =1

2xgtDx︸ ︷︷ ︸

Edgersquos cost

+ cgtx︸︷︷︸Nodersquos cost

S(x) =1

2xgt∆x

xlowast = arg minxisinΠ

S(x)

21 42

An intractable problem

No garanties on ∆

rArr Non convex problem

rArr No polynomial solution of minxisinΠ

S(x)

I NP-Hard problem

Letrsquos look for an approximation

22 42

Approximation is overestimation

1 Any mapping corresponds to an edit path

2 Any edit path has a cost ge GED

3 Approximate mappinghArr Overestimation of GED

23 42

A First Approach

Linear approximation

xlowast = minxisinΠ

1

2xgtDx + cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 24: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

QAP formulation of GED

S(x) =1

2xgtDx︸ ︷︷ ︸

Edgersquos cost

+ cgtx︸︷︷︸Nodersquos cost

S(x) =1

2xgt∆x

xlowast = arg minxisinΠ

S(x)

21 42

An intractable problem

No garanties on ∆

rArr Non convex problem

rArr No polynomial solution of minxisinΠ

S(x)

I NP-Hard problem

Letrsquos look for an approximation

22 42

Approximation is overestimation

1 Any mapping corresponds to an edit path

2 Any edit path has a cost ge GED

3 Approximate mappinghArr Overestimation of GED

23 42

A First Approach

Linear approximation

xlowast = minxisinΠ

1

2xgtDx + cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 25: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

An intractable problem

No garanties on ∆

rArr Non convex problem

rArr No polynomial solution of minxisinΠ

S(x)

I NP-Hard problem

Letrsquos look for an approximation

22 42

Approximation is overestimation

1 Any mapping corresponds to an edit path

2 Any edit path has a cost ge GED

3 Approximate mappinghArr Overestimation of GED

23 42

A First Approach

Linear approximation

xlowast = minxisinΠ

1

2xgtDx + cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 26: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Approximation is overestimation

1. Any mapping corresponds to an edit path
2. Any edit path has a cost ≥ GED
3. Approximate mapping ⇔ overestimation of GED

A First Approach

Linear approximation:

x* = argmin_{x ∈ Π} cᵀx

- [Riesen and Bunke, 2009; Gaüzère et al., 2014; Carletti et al., 2015]
- Linear approximation of the QAP formulation
- Hungarian algorithm (O(n³))

No structural information.
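The LSAP step above can be sketched concretely. The following is a minimal, node-labels-only illustration in the spirit of [Riesen and Bunke, 2009], using SciPy's Hungarian solver on the standard (n+m)×(n+m) cost matrix (substitution block, deletion/insertion diagonals, zero ε-to-ε block); the function name and the uniform costs are illustrative assumptions, not the exact construction of the cited papers.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lsap_ged_upper_bound(labels1, labels2, c_sub=1.0, c_ins_del=3.0):
    """Bipartite (LSAP) approximation of GED on node labels only.

    Builds the (n+m)x(n+m) cost matrix: a substitution block,
    deletion/insertion costs on diagonals, and a zero block for
    epsilon-to-epsilon. The optimal assignment cost is an upper
    bound on the (node-label) edit distance.
    """
    n, m = len(labels1), len(labels2)
    INF = 1e9
    C = np.full((n + m, n + m), INF)
    for i in range(n):
        for j in range(m):
            C[i, j] = 0.0 if labels1[i] == labels2[j] else c_sub
    for i in range(n):              # node deletions on the diagonal
        C[i, m + i] = c_ins_del
    for j in range(m):              # node insertions on the diagonal
        C[n + j, j] = c_ins_del
    C[n:, m:] = 0.0                 # epsilon-to-epsilon costs nothing
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].sum()
```

With the protocol's arbitrary costs (substitution 1, insertion/deletion 3), mapping ['C', 'C', 'O'] onto ['C', 'O'] matches both labels and deletes one C, for a bound of 3.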

Augmented Cost Matrix

Adding some structural information:

- C_ij: cost of mapping the neighborhood of v_i to the neighborhood of v′_j
  - Direct neighborhood
  - Random walks
  - Subgraphs
- Complexity ↔ accuracy
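For the first variant (direct neighborhoods), one augmented entry C_ij could combine the node substitution cost with an optimal assignment between the two nodes' neighbor labels. The function below is a hypothetical sketch under that assumption, not the exact construction used in the cited papers.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def augmented_cost(labels1, adj1, labels2, adj2, i, j,
                   c_sub=1.0, c_ins_del=3.0):
    """Sketch of an augmented cost C_ij: substitution cost of
    (v_i, v'_j) plus an optimal LSAP between the labels of their
    direct neighborhoods (illustrative variant)."""
    cost = 0.0 if labels1[i] == labels2[j] else c_sub
    ni = [k for k in range(len(labels1)) if adj1[i][k]]   # neighbors of v_i
    nj = [k for k in range(len(labels2)) if adj2[j][k]]   # neighbors of v'_j
    n, m = len(ni), len(nj)
    C = np.full((n + m, n + m), 1e9)
    for a, u in enumerate(ni):
        for b, v in enumerate(nj):
            C[a, b] = 0.0 if labels1[u] == labels2[v] else c_sub
    for a in range(n):          # neighbor deletions
        C[a, m + a] = c_ins_del
    for b in range(m):          # neighbor insertions
        C[n + b, b] = c_ins_del
    C[n:, m:] = 0.0             # epsilon-to-epsilon
    rows, cols = linear_sum_assignment(C)
    return cost + C[rows, cols].sum()
```

Richer structures (random walks, subgraphs) would replace the neighbor-label LSAP with a more expensive comparison, which is exactly the complexity ↔ accuracy trade-off noted above.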

A first QAP approach

x* = argmin_{x ∈ Π} ½ xᵀDx + cᵀx

Gradient descent approach:

- Let's find a local minimum of S(x)
- Relax the problem to a continuous domain

Some solvers exist.

No consideration of the discrete nature of the solution.

Another strategy

Integer-Projected Fixed Point [Leordeanu et al., 2009]

A Frank-Wolfe-like algorithm. Iterate until convergence:

1. Discrete resolution of the linear gradient of the QAP → x_t
2. Line search between x_{t−1} and the solution found in step 1
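The two steps can be sketched as a small Frank-Wolfe-style loop on the relaxed QAP: solve an LSAP on the gradient for the discrete step, then do an exact line search (the objective is quadratic). This is a minimal illustration assuming a symmetric quadratic matrix D and a vectorized n×n assignment x, not the exact implementation of [Leordeanu et al., 2009].

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ipfp(D, c, n, max_iter=50, tol=1e-9):
    """IPFP-style sketch for min 1/2 x^T D x + c^T x over
    (vectorized) n-by-n assignment matrices, D symmetric."""
    x = np.full(n * n, 1.0 / n)            # barycenter initialization
    def S(v):
        return 0.5 * v @ D @ v + c @ v
    for _ in range(max_iter):
        g = D @ x + c                      # gradient of the quadratic objective
        rows, cols = linear_sum_assignment(g.reshape(n, n))  # discrete step
        b = np.zeros(n * n)
        b[rows * n + cols] = 1.0
        d = b - x
        q = d @ D @ d
        # exact line search on the segment [x, b] (quadratic in alpha)
        alpha = 1.0 if q <= 0 else min(1.0, max(0.0, -(g @ d) / q))
        x_new = x + alpha * d
        if abs(S(x_new) - S(x)) < tol:
            x = x_new
            break
        x = x_new
    # final projection of the continuous solution onto permutations
    rows, cols = linear_sum_assignment(-x.reshape(n, n))
    P = np.zeros((n, n))
    P[rows, cols] = 1.0
    return P, S(P.ravel())
```

The final projection is exactly the step discussed on the next slide: it can lose objective quality in an uncontrolled way.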

Operating IPFP

At convergence:

- The optimal continuous solution x̃ is stable
- A projection step is needed to map x̃ onto Π

Uncontrolled loss: no guarantee that S(x̃) ≈ S(x′).

Local minimum: initialization is important [Carletti et al., 2015].

GNCCP approach [Liu and Qiao, 2014]

From a convex to a concave objective function:

S(x, ζ) = ζ(xᵀx) + (1 − |ζ|)S(x)

- ζ = 1: convex objective function
- ζ = −1: concave objective function

GNCCP algorithm

x = 0
For ζ = 1 → −1:
  1. x ← argmin_{x ∈ Π′} S(x, ζ)
  2. ζ ← ζ − 0.1

- Iterates ζ over a modified IPFP objective function
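The interpolated objective and the ζ sweep can be written down directly. A minimal sketch follows; the inner minimization over Π′ (an IPFP-style solver) is left abstract and omitted here.

```python
import numpy as np

def gnccp_objective(x, zeta, S):
    """S(x, zeta) = zeta * x^T x + (1 - |zeta|) * S(x):
    convex relaxation at zeta = 1, original objective at zeta = 0,
    concave relaxation at zeta = -1."""
    return zeta * (x @ x) + (1.0 - abs(zeta)) * S(x)

def gnccp_schedule(step=0.1):
    """Zeta values swept by the outer loop, from 1 down to -1."""
    return [round(1.0 - k * step, 10) for k in range(int(round(2.0 / step)) + 1)]
```

At ζ = −1 the concave term dominates, and minimizers of a concave function over the relaxed polytope sit at its vertices, i.e. at discrete mappings, which is why the sweep ends on a permutation without a lossy projection step.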

From ζ = 1 to ζ = 0

[Sequence of plots over [−1, 1] × [−1, 1]: the objective S(x, ζ) morphing from the convex relaxation (ζ = 1) to the original objective (ζ = 0).]

From ζ = 0 to ζ = −1

[Sequence of plots over [−1, 1] × [−1, 1]: the objective S(x, ζ) morphing from the original objective (ζ = 0) to the concave relaxation (ζ = −1).]

GNCCP vs IPFP

Pros:

- No more need for initialization
- Converges towards a mapping matrix

Cons:

- Complexity: iterates over IPFP

GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]

[Example molecular graphs from each dataset: Alkane, Acyclic (chains with C and S atoms), MAO and PAH (ring structures with C, N and O atoms).]

Protocol

Arbitrary costs:

- All substitutions: 1
- All insertions/deletions: 3

Relative error:

- Accuracy measure
- Overestimation: the lowest approximation is the best one (d_opt)
- Relative error: (d(G_i, G_j) − d_opt(G_i, G_j)) / d_opt(G_i, G_j)
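The measure is simple to compute once d_opt (the lowest approximation returned by any of the compared methods for a given pair) is known; a one-line sketch:

```python
def relative_error(d_approx, d_opt):
    """Relative overestimation of an approximate GED with respect to
    the best (lowest) approximation d_opt found for the same pair."""
    return (d_approx - d_opt) / d_opt
```

An approximation twice the best one thus scores 1.0 (100% relative error), matching the scale of the bar chart on the next slide.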

Relative errors

[Bar chart: % of relative error per dataset (Alkane, Acyclic, MAO, PAH; y-axis 0–250%) for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, and GNCCP.]

log Time vs Score Deviation

[Four scatter plots, one per dataset (Alkane, Acyclic, MAO, PAH): score deviation (0 to 1) vs log10 of time in seconds, for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, and GNCCP.]

Analysis

Tradeoff:

- Accuracy depends on complexity
- Choose your method according to your priority

More complete analysis:

- ICPR GED contest: https://gdc2016.greyc.fr
- Other methods + other datasets

GED Limitations

Mathematical properties:

- GED is a distance
- But not a Euclidean one

Impossible to derive a trivial kernel; use with caution in SVMs.

Complexity:

- Still hard to compute on larger graphs
- Accuracy hard to evaluate (lack of ground truth)

Outlooks

Algorithms:

- Matrix optimization
- Edge-based mapping
- Optimization algorithms

Outlooks

Applications:

- Explore graph space through GED
  - Median graph: joint work with Paul
- Use of GED for classification
- How to set costs according to a task
  - → Metric learning
  - New PhD with Sebastien and Pierre
- Behavior of different methods

Conclusion

- GED is still an open problem
- Approximation algorithms exist
- but there is still room for improvement
- Focus on applications

References

Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gaüzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.

Gaüzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate Graph Edit Distance Guided by Bipartite Matching of Bags of Walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.

Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP – Graduated NonConvexity and Concavity Procedure. IEEE Trans. Pattern Anal. Mach. Intell., 36(6):1258–1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950–959.

Page 27: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

A First Approach

Linear approximation

xlowast = minxisinΠ

1

2xgtDx + cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 28: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 29: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

A First Approach

Linear approximation

xlowast = arg minxisinΠ

cgtx

I [Riesen and Bunke 2009 Gauzere et al 2014 Carletti et al 2015]

I Linear approximation of QAP formulation

I Hungarian algorithm (O(n3))

No structural information

24 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

References

Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An exact graph edit distance algorithm for solving pattern recognition problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gaüzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.

Gaüzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate graph edit distance guided by bipartite matching of bags of walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.

Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP — Graduated NonConvexity and Concavity Procedure. IEEE Trans. Pattern Anal. Mach. Intell., 36(6):1258–1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950–959.

Page 30: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 31: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 32: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Augmented Cost Matrix

Adding some stuctural information

I Cij cost of mapping neighboorhood of vi to neighboorhood of v primejI Direct neighbourhoodI Random walksI Subgraphs

I Complexity harr accuracy

N

N

N

N

25 42

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 33: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

A first QAP approach

xlowast = minxisinΠ

1

2xgtDx + cgtx

Gradient descent approach

I Letrsquos find a local minimum of S(x)

I Relax problem to continuous domain S

Some solvers exist

No consideration of discrete nature of solution

26 42

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 34: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Another strategy

Integer-Projected Fixed Point [Leordeanu et al 2009]

Franck-Wolfe like algorithm

Iterate until convergence

1 Discrete resolution of linear gradient of QAP rarr xt

2 Line search between xtminus1 and the solution found in step 1

27 42

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = −1

[Figure: sequence of plots of the objective landscape as ζ decreases from 0 to −1 (concave); both axes range from −1 to 1]

32/42

GNCCP vs IPFP

Pros

No more need for initialization

Converges towards a mapping matrix

Cons

Complexity: iterates over IPFP

33/42

GREYC chemistry datasets [https://iapr-tc15.greyc.fr/links.html]

[Figure: example molecular graphs from the four datasets — Alkane, Acyclic, MAO, PAH]

34/42

Protocol

Arbitrary costs

I All substitutions: 1

I All insertions/deletions: 3

Relative error

I Accuracy measure

I Overestimation: the lowest approximation is the best one (d_opt)

I Relative error: (d(G_i, G_j) − d_opt(G_i, G_j)) / d_opt(G_i, G_j)

35/42
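The relative-error measure used in the protocol is just a normalized overestimation; a one-liner makes the convention explicit (the function name is ours):

```python
def relative_error(d_approx, d_opt):
    """Relative overestimation of an approximate GED d_approx
    against the best known (lowest) approximation d_opt.
    E.g. relative_error(12.0, 8.0) is 0.5, i.e. 50% overestimation."""
    return (d_approx - d_opt) / d_opt
```

Since all compared methods overestimate the exact GED, the error is nonnegative and 0 for the best approximation itself.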

Relative errors

[Figure: bar chart of % of relative error (0 to 250) per dataset (Alkane, Acyclic, MAO, PAH) for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, GNCCP]

36/42

log Time vs Score Deviation

[Figure: four panels (Alkane, Acyclic, MAO, PAH) plotting score deviation (0 to 1) against log10 of time in seconds for LSAP Riesen, LSAP rw, LSAP K-graphs, IPFP random, IPFP rw, GNCCP]

37/42

Analysis

Tradeoff

I Accuracy depends on complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest: https://gdc2016.greyc.fr

I Other methods + other datasets

38/42

GED Limitations

Mathematical properties

I GED is a distance

I But not a Euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of ground truth)

39/42

Outlooks

Algorithms

I Matrix optimization

I Edge-based mapping

I Optimization algorithms

40/42

Outlooks

Applications

I Explore graph space through GED

I Median graph: joint work with Paul

I Use of GED for classification

I How to set costs according to a task → metric learning; new PhD with Sébastien and Pierre

I Behavior of different methods

41/42

Conclusion

I GED is still an open problem

I Approximation algorithms exist

I but there is still room for improvement

I Focus on applications

42/42

Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gaüzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.

Gaüzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate Graph Edit Distance Guided by Bipartite Matching of Bags of Walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.

Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP – Graduated NonConvexity and Concavity Procedure. Pattern Anal. Mach. Intell., 36(6):1258–1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Comp., 27:950–959.

Page 35: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

Operating IPFP

At convergence

I Optimal continuous solution x is stable

I Need of a projection step to embed x to Π

Uncontrolled loss

No garanties that S(x) asymp S(xprime)

Importance of initialization

I Local minimum

Initialization is important [Carletti et al 2015]

28 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 36: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xTx) + (1minus |ζ|)S(x)

ζ = 1 Convex objective function

ζ = minus1 Concave objective function

29 42

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 37: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

GNCCP approach [Liu and Qiao 2014]

From convex to concave objective function

S(x ζ) = ζ(xgtx) + (1minus |ζ|)S(x)

GNCCP algorithm

x = 0For ζ = 1rarr minus1

1 xlarr arg minxisinΠprime

S(x ζ)

2 ζ larr ζ minus 01

I Iterates ζ over a modified IPFP objective function

30 42

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 38: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

From ζ = 1 to ζ = 0

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

minus1 minus05 0 05 1minus1

minus05

0

05

1

31 42

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not an euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of gt)

39 42

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space trough GEDI Median graph joint work with Paul

I Use of ged for classification

I How to set costs according to a task I rarr Metric learningI New Phd with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exists

I but it stills room for improvements

I Focus on applications

42 42

Abu-Aisheh Z Raveaux R Ramel J-Y and Martineau P(2015)An Exact Graph Edit Distance Algorithm for Solving PatternRecognition ProblemsIn 4th International Conference on Pattern Recognition Applicationsand Methods

Bougleux S Brun L Carletti V Foggia P Gauzere B andVento M (2015)A quadratic assignment formulation of the graph edit distanceTechnical report Normandie Univ NormaSTIC FR 3638 France

Carletti V Gauzere B Brun L and Vento M (2015)Approximate graph edit distance computation combining bipartitematching and exact neighborhood substructure distanceIn GbRPR volume 9069 pages 168ndash177 Springer

Gauzere B Bougleux S Riesen K and Brun L (2014)Approximate Graph Edit Distance Guided by Bipartite Matching ofBags of WalksIn Structural Syntactic and Statistical Pattern Recognition volume8621 pages 73ndash82 Springer

42 42

Leordeanu M Hebert M and Sukthankar R (2009)An integer projected fixed point method for graph matching andmap inferenceIn Advances in neural information processing systems pages1114ndash1122

Liu Z-Y and Qiao H (2014)GNCCPndashGraduated NonConvexity and Concavity ProcedurePattern Anal Mach Intell 36(6) 1258ndash1267

Riesen K and Bunke H (2009)Approximate graph edit distance computation by means of bipartitegraph matchingImage and Vision Comp 27 950ndash959

42 42

Page 39: Graph Edit Distancepagesperso.litislab.fr/~bgauzere/slides_litisiades.pdf · jG 1Xj+jG 2j i=1 jG 1Xj+jG 2j j=1 (X ’ C)(i;j) Vectorized version L v(V" 1;V " 2;’) = c >x ’ 17/42.

From ζ = 0 to ζ = minus1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

minus1 minus05 0 05 1

minus1

minus05

0

05

1

32 42

GNCCP vs IPFP

Pros

No more need of initialization

Converge towards a mappingmatrix

Cons

Complexity Iterate over IPFP

33 42

GREYC chemistry datasets[httpsiapr-tc15greycfrlinkshtml]

C

C

C

CC C

CS

S

Alkane Acyclic

NN

C

O

C

C

C

MAO PAH

34 42

Protocol

Arbitrary costs

I All substitutions 1

I All insertionsdeletions 3

Relative error

I Accuracy measure

I Overestimation The lowest approximation is the best one (dopt)

I Relative error d(Gi Gj)minus dopt(Gi Gj)

dopt(Gi Gj)

35 42

Relative errors

Alkane Acyclic MAO PAH0

50

100

150

200

250

Datasets

o

f re

lative e

rror

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

36 42

log Time vs Score Deviation

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 10

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Alkane

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

Acyclic

A

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

MAO

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

minus7 minus6 minus5 minus4 minus3 minus2 minus1 0 1 20

01

02

03

04

05

06

07

08

09

1

log10 of time in seconds

Score

devia

tion

PAH

LSAP Riesen

LSAP rw

LSAP Kminusgraphs

IPFP random

IPFP rw

GNCCP

37 42

Analysis

Tradeoff

I Accuracy is dependant to complexity

I Choose your method according to your priority

More complete analysis

I ICPR GED contest httpsgdc2016greycfr

I Others methods + Others datasets

38 42

GED Limitations

Mathematical properties

I GED is a distance

I But not a Euclidean one

Impossible to derive a trivial kernel

Use with caution in SVMs

Complexity

I Still hard to compute on larger graphs

I Accuracy hard to evaluate (lack of ground truth)

39 42
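A quick way to see why a non-Euclidean distance is dangerous for kernels: a Euclidean embedding exists only if the classical-MDS Gram matrix built from squared distances is positive semidefinite. The matrix below is a hypothetical tree metric (a star with unit edges) standing in for GED values; its Gram matrix has a negative eigenvalue, so no inner-product (kernel) representation reproduces it exactly:

```python
import numpy as np

# Distance matrix of a star K_{1,3} with unit edge lengths:
# a valid metric, but not a Euclidean one.
D = np.array([[0, 1, 1, 1],
              [1, 0, 2, 2],
              [1, 2, 0, 2],
              [1, 2, 2, 0]], dtype=float)

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
B = -0.5 * J @ (D ** 2) @ J              # classical-MDS Gram matrix

eigvals = np.linalg.eigvalsh(B)
print(eigvals.min())                      # negative -> no Euclidean embedding
```

This is why Gram matrices derived from GED can be indefinite inside an SVM; common workarounds (eigenvalue clipping, distance-substitution kernels) trade exactness for positive definiteness.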

Outlooks

Algorithms

I Matrix optimization

I Edge based mapping

I Optimization algorithms

40 42

Outlooks

Applications

I Explore graph space through GED
I Median graph (joint work with Paul)

I Use of GED for classification

I How to set costs according to a task
I → Metric learning
I New PhD with Sebastien and Pierre

I Behavior of different methods

41 42

Conclusion

I GED is still an open problem

I Approximation algorithms exist

I but there is still room for improvements

I Focus on applications

42 42

Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y., and Martineau, P. (2015). An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems. In 4th International Conference on Pattern Recognition Applications and Methods.

Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., and Vento, M. (2015). A quadratic assignment formulation of the graph edit distance. Technical report, Normandie Univ, NormaSTIC FR 3638, France.

Carletti, V., Gaüzère, B., Brun, L., and Vento, M. (2015). Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In GbRPR, volume 9069, pages 168–177. Springer.

Gaüzère, B., Bougleux, S., Riesen, K., and Brun, L. (2014). Approximate Graph Edit Distance Guided by Bipartite Matching of Bags of Walks. In Structural, Syntactic, and Statistical Pattern Recognition, volume 8621, pages 73–82. Springer.

Leordeanu, M., Hebert, M., and Sukthankar, R. (2009). An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122.

Liu, Z.-Y. and Qiao, H. (2014). GNCCP – Graduated NonConvexity and Concavity Procedure. Pattern Anal. Mach. Intell., 36(6):1258–1267.

Riesen, K. and Bunke, H. (2009). Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27:950–959.
