A quantum Jensen–Shannon graph kernel for unattributed graphs


Lu Bai a, Luca Rossi b, Andrea Torsello c, Edwin R. Hancock a

a Department of Computer Science, The University of York, York YO10 5DD, UK
b School of Computer Science, University of Birmingham, UK
c Department of Environmental Science, Informatics and Statistics, Ca' Foscari University of Venice, Italy

Article history: Received 8 September 2013; received in revised form 20 March 2014; accepted 21 March 2014.

Keywords: Graph kernels; Continuous-time quantum walk; Quantum state; Quantum Jensen–Shannon divergence

Abstract

In this paper, we use the quantum Jensen–Shannon divergence as a means of measuring the information theoretic dissimilarity of graphs and thus develop a novel graph kernel. In quantum mechanics, the quantum Jensen–Shannon divergence can be used to measure the dissimilarity of quantum systems specified in terms of their density matrices. We commence by computing the density matrix associated with a continuous-time quantum walk over each graph being compared. In particular, we adopt the closed form solution of the density matrix introduced in Rossi et al. (2013) [27,28] to reduce the computational complexity and to avoid the cumbersome task of simulating the quantum walk evolution explicitly. Next, we compare the mixed states represented by the density matrices using the quantum Jensen–Shannon divergence. With the quantum states for a pair of graphs described by their density matrices to hand, the quantum graph kernel between the pair of graphs is defined using the quantum Jensen–Shannon divergence between the graph density matrices. We evaluate the performance of our kernel on several standard graph datasets from both bioinformatics and computer vision. The experimental results demonstrate the effectiveness of the proposed quantum graph kernel.

© 2014 Elsevier Ltd. All rights reserved.

    1. Introduction

Structural representations have been used for over 30 years in pattern recognition due to their representational power. However, the increased descriptiveness comes at the cost of a greater difficulty in applying standard techniques to them, as these usually require data to reside in a vector space. The famous kernel trick [1] allows the focus to be shifted from the vectorial representation of data, which now becomes implicit, to a similarity representation. This allows standard learning techniques to be applied to data for which no obvious vectorial representation exists. For this reason, in recent years pattern recognition has witnessed an increasing interest in structural learning using graph kernels.

    1.1. Literature review

1.1.1. Graph kernels

One of the most influential works on structural kernels was the generic R-convolution kernel proposed by Haussler [2]. Here graph kernels are computed by comparing the similarity of each of the decompositions of the two graphs. Depending on how the graphs are decomposed, we obtain different kernels. Generally speaking, most R-convolution kernels count the number of isomorphic substructures in the two graphs. Kashima et al. [3] compute the kernel by decomposing the graph into random walks, while Borgwardt et al. [4] have proposed a kernel based on shortest paths. Here, the similarity is determined by counting the numbers of pairs of shortest paths of the same length in a pair of graphs. Shervashidze et al. [5] have developed a subtree kernel on subtrees of limited size. They compute the number of subtrees shared between two graphs using the Weisfeiler–Lehman graph invariant. Aziz et al. [6] have defined a backtrackless kernel on cycles of limited length. They compute the kernel value by counting the numbers of pairs of cycles of the same length in a pair of graphs. Costa and Grave [7] have defined a so-called neighborhood subgraph pairwise distance kernel by counting the number of pairs of isomorphic neighborhood subgraphs. Recently, Kriege et al. [8] counted the number of isomorphisms between pairs of subgraphs, while Neumann et al. [9] have introduced the concept of propagation kernels to handle partially labeled graphs through the use of continuous-valued vertex attributes.

One drawback of these kernels is that they neglect the locational information of the substructures within a graph. In other words, the similarity does not depend on the relationships between substructures. As a consequence, these kernels cannot establish reliable structural correspondences between the substructures. This limits the precision of the resulting similarity measure.


To overcome this problem, Fröhlich et al. [10] introduced alternative optimal assignment kernels. Here each pair of structures is aligned before comparison. However, the introduction of the alignment step results in a kernel that is not positive definite in general [11]. The problem results from the fact that alignments are not in general transitive. In other words, if σ is the vertex alignment between graph A and graph B, and σ′ is the alignment between graph B and graph C, in general we cannot guarantee that the alignment between graph A and graph C is σ′ ∘ σ. On the other hand, when the alignments are transitive, there is a common simultaneous alignment of all the graphs. Under this alignment, the optimal assignment kernel is simply the sum over all the vertex/edge kernels, and this is positive definite since it is the sum of separate positive definite kernels. While, lacking positive definiteness, the optimal assignment kernels cannot be guaranteed to represent an implicit embedding into a Hilbert space, they have nonetheless proven to be very effective in classifying structures. Another example of alignment-based kernels is the edit-distance-based kernel introduced by Neuhaus and Bunke [12]. Here the alignments obtained from graph edit distance are used to guide random walks on the structures being compared.

An attractive alternative way to measure the similarity of a pair of graphs is to use the mutual information and compute the classical Jensen–Shannon divergence [13]. In information theory, the Jensen–Shannon divergence is a dissimilarity measure between probability distributions. It is symmetric, always well defined and bounded [13]. Bai and Hancock [14] have used the divergence to define a Jensen–Shannon kernel for graphs. Here, the kernel between a pair of graphs is computed using a nonextensive entropy measure in terms of the classical Jensen–Shannon divergence between probability distributions over the graphs. For a graph, the elements of the probability distribution are computed from the degrees of the corresponding vertices. The entropy associated with the probability distribution of an individual graph can thus be computed directly, without the need to decompose the graph into substructures. Hence, unlike the aforementioned graph kernels, the Jensen–Shannon kernel between a pair of graphs avoids the computationally burdensome task of determining the similarities between all pairs of substructures. Unfortunately, the required composite entropy for the Jensen–Shannon kernel is computed from a product graph formed from the pair of graphs, and reflects no correspondence information between pairs of vertices. As a result, the Jensen–Shannon graph kernel lacks correspondence information between the probability distributions over the graphs, and thus cannot precisely reflect the similarity between graphs.

There has recently been an increasing interest in quantum computing because of the potential speed-ups over classical algorithms. Examples include Grover's polynomially faster search algorithm [15] and Shor's exponentially faster factorization algorithm [16]. Furthermore, quantum algorithms also offer us a richer structure than their classical counterparts, since they use qubits rather than bits as the basic representational unit [32].

1.1.2. Quantum computation

Quantum systems differ from their classical counterparts, since they add the possibility of state entanglement to the classical statistical mixture of classical systems, which results in an exponential increase of the dimensionality of the state space which is at the basis of the quantum speedup. Pure quantum states are represented as entries in a complex Hilbert space, while potentially mixed quantum states are represented through the density matrix. Mixed states can then be compared by examining their density matrices. One convenient way to do this is to use the quantum Jensen–Shannon divergence, first introduced by Majtey et al. [13,18]. Unlike the classical Jensen–Shannon divergence, which is defined as a similarity measure between probability distributions, the quantum Jensen–Shannon divergence is defined as a distance measure between mixed quantum states described by density matrices. Moreover, it can be used to measure both the degree of mixing and the entanglement [13,18].

In this paper, we are interested in computing a kernel between pairs of graphs using the quantum Jensen–Shannon divergence between two mixed states representing the evolution of continuous-time quantum walks on the graphs. The continuous-time quantum walk was introduced as the natural quantum analogue of the classical random walk by Farhi and Gutmann [19], and has been widely used in the design of novel quantum algorithms. Ambainis et al. [20,21] were among the first to generalize random walks on finite graphs to the quantum world. Furthermore, they have explored the application of quantum walks on graphs to a number of problems [22]. Childs et al. [23,24] have explored the difference between quantum and classical random walks, and then exploited the increased power of the quantum walk as a general model of computation.

Similar to the classical random walk on a graph, the state space of the continuous-time quantum walk is the set of vertices of the graph. However, unlike the classical random walk, where the state vector is real-valued and the evolution is governed by a doubly stochastic matrix, the state vector of the continuous-time quantum walk is complex-valued and its evolution is governed by a time-varying unitary matrix. The continuous-time quantum walk possesses a number of interesting properties which are not exhibited by the classical random walk. For instance, the continuous-time quantum walk is reversible and non-ergodic, and does not have a limiting distribution. As a result, the continuous-time quantum walk offers us an elegant way to design quantum algorithms on graphs. For further details on quantum computation and quantum algorithms, we refer the reader to the textbook in [32].

There have recently been several attempts to define quantum kernels using the continuous-time quantum walk. For instance, Bai et al. [26] have introduced a novel graph kernel where the similarity between two graphs is defined in terms of the similarity between two quantum walks evolving on the two graphs. The basic idea here is to associate with each graph a mixed quantum state representing the time evolution of a quantum walk. The kernel between the walks is then defined as the divergence between the corresponding density operators. However, this quantum divergence measure requires the computation of an additional mixed state where the system has equal probability of being in each of the two original quantum states. Unless this quantum kernel takes into account the correspondences between the vertices of the two graphs, it can be shown that it is not permutation invariant. Rossi et al. [27,28] have attempted to overcome this problem by allowing the two quantum walks to evolve on the union of the two graphs. This exploits the relation between the interferences of the continuous-time quantum walks and the symmetries in the structures being compared. Since both mixed states are defined on the same structure, this quantum kernel addresses the shortcoming of permutation variance. However, the resulting approach is less intuitive, thus making it particularly hard to prove the positive semidefiniteness of the kernel. Moreover, the union graph is established by roughly connecting all vertex pairs of the two graphs, and thus also lacks vertex correspondence information. As a result, this kernel cannot reflect the precise similarity between the two graphs.

    1.2. Contributions

The aim of this paper is to overcome the shortcomings of the existing graph kernels by defining a novel quantum kernel.


To this end, we develop the work in [26] further. We intend to solve the problems of permutation variance and missing vertex correspondence by introducing an additional alignment step, and thus compute an aligned mixed density matrix. The goal of the alignment is to provide vertex correspondences and a lower bound on the divergence over permutations of the vertices/quantum states. However, we approximate the alignment that minimizes the divergence with the one that minimizes the Frobenius norm between the density operators. This latter problem is solved using Umeyama's graph matching method [29]. Since the aligned density matrix is computed by taking into account the vertex correspondences between the two graphs, the density matrix reflects the locational correspondences between the quantum walks over the vertices of the two graphs. As a result, our new quantum kernel can not only overcome the shortcomings of missing vertex correspondence and permutation variance that arise with the existing kernels from the classical/quantum Jensen–Shannon divergence, but also overcome the shortcoming of neglecting locational correspondences between substructures that arises in the existing R-convolution kernels. Furthermore, in contrast to what is done in previous assignment-based kernels, we do not align the structures directly. Rather, we align the density matrices, which are a special case of a Laplacian family signature [30], well known to provide a more robust representation for correspondence estimation. Finally, we adopt the closed form solution of the density matrix introduced in [27,28] to reduce the computational complexity and to avoid the cumbersome task of explicitly simulating the time evolution of the quantum walk. We evaluate the performance of our kernel on several standard graph datasets from bioinformatics and computer vision. The experimental results demonstrate the effectiveness of the proposed graph kernel, which provides both a higher classification accuracy than the unaligned kernel and a performance comparable or superior to other state-of-the-art graph kernels.

    1.3. Paper outline

In Section 2, we introduce the quantum mechanical formalism that will be used to develop the ideas proposed in the paper. We describe how to compute a density matrix for the mixed quantum state of the continuous-time quantum walk on a graph. With the mixed state density matrix to hand, we compute the von Neumann entropy of the continuous-time quantum walk on the graph. In Section 3, we compute a graph kernel using the quantum Jensen–Shannon divergence between the density matrices for a pair of graphs. In Section 4, we provide experimental evaluations which demonstrate the effectiveness of our quantum kernel. In Section 5, we conclude the paper and suggest future research directions.

    2. Quantum mechanical background

In this section, we introduce the quantum mechanical formalism to be used in this work. We commence by reviewing the concept of a continuous-time quantum walk on a graph. We then describe how to establish a density matrix for the mixed quantum state of the graph through a quantum walk. With the density matrix of the mixed quantum state to hand, we show how to compute the von Neumann entropy of a graph.

    2.1. The continuous-time quantum walk

The continuous-time quantum walk is the natural quantum analogue of the classical random walk [19,31]. Classical random walks can be used to model a diffusion process on a graph, and have proven to be a useful tool in the analysis of graph structure. In general, diffusion is modeled as a Markovian process defined over the vertices of the graph, where the transitions take place on the edges. More formally, let $G(V,E)$ be an undirected graph, where $V$ is a set of $n$ vertices and $E \subseteq V \times V$ is a set of edges. The state vector for the walk at time $t$ is a probability distribution over $V$, i.e., a vector $\vec{p}(t) \in \mathbb{R}^n$ whose $u$th entry gives the probability that the walk is at vertex $u$ at time $t$. Let $A$ denote the symmetric adjacency matrix of the undirected graph $G$ and let $D$ be the diagonal degree matrix with elements $d_u = \sum_{v=1}^{n} A(u,v)$, where $d_u$ is the degree of the vertex $u$. Then, the continuous-time random walk on $G$ will evolve according to the equation

$$\vec{p}(t) = e^{-Lt}\,\vec{p}(0), \qquad (1)$$

where $L = D - A$ is the graph Laplacian.

As in the classical case, the continuous-time quantum walk is defined as a dynamical process over the vertices of the graph. However, in contrast to the classical case, where the state vector is constrained to lie in a probability space, in the quantum case the state of the system is defined through a vector of complex amplitudes over the vertex set $V$. The squared norms of the amplitudes sum to unity over the vertices of the graph, with no restriction on their sign or complex phase. This in turn allows both destructive and constructive interference to take place between the complex amplitudes. Moreover, in the quantum case, the evolution of the walk is governed by a complex valued unitary matrix, whereas the dynamics of the classical random walk is governed by a stochastic matrix. Hence the evolution of the quantum walk is reversible, which implies that quantum walks are non-ergodic and do not possess a limiting distribution. As a result, the behaviour of classical and quantum walks differs significantly. In particular, the quantum walk possesses a number of interesting properties not exhibited in the classical case. One notable consequence of the interference properties of quantum walks is that when a quantum walker backtracks on an edge it does so with the opposite phase. This gives rise to destructive interference and reduces the problem of tottering. In fact, the faster hitting time observed in quantum walks on the line is a consequence of the destructive interference of backtracking paths [31].

The Dirac notation, also known as bra-ket notation, is a standard notation used for describing quantum states. We call pure state a quantum state that can be described using a single ket vector $|a\rangle$. In the context of this paper, a ket vector is a complex valued column vector of unit Euclidean length in an $n$-dimensional Hilbert space. Its conjugate transpose is a bra (row) vector, denoted as $\langle a|$. As a result, the inner product between two states $|a\rangle$ and $|b\rangle$ is written as $\langle a|b\rangle$, while their outer product is $|a\rangle\langle b|$. Using the Dirac notation, we denote the basis state corresponding to the walk being at vertex $u \in V$ as $|u\rangle$. In other words, $|u\rangle$ is the vector which is equal to 1 in the position corresponding to the vertex $u$ and is zero everywhere else. A general state of the walk is then expressed as a complex linear combination of these orthonormal basis states, such that the state vector of the walk at time $t$ is defined as

$$|\psi(t)\rangle = \sum_{u \in V} \alpha_u(t)\,|u\rangle, \qquad (2)$$

where the amplitude $\alpha_u(t) \in \mathbb{C}$ is subject to the constraint $\sum_{u \in V} \alpha_u(t)\,\alpha_u^*(t) = 1$ for all $u \in V$, $t \in \mathbb{R}^+$, where $\alpha_u^*(t)$ is the complex conjugate of $\alpha_u(t)$.

At each instant of time, the probability of the walk being at a particular vertex of the graph $G(V,E)$ is given by the squared norm of the corresponding amplitude of $|\psi(t)\rangle$. More formally, let $X(t)$ be a random variable giving the location of the walk at time $t$.


The probability of the walk being at vertex $u \in V$ at time $t$ is then given by

$$\Pr(X(t) = u) = \alpha_u(t)\,\alpha_u^*(t). \qquad (3)$$

More generally, a quantum walker represented by $|\psi(t)\rangle$ can be observed in any state $|\phi\rangle \in \mathbb{C}^n$, and not just in the states $|u\rangle$, $u \in V$, corresponding to the graph vertices. Given an orthonormal basis $|\phi_1\rangle, \ldots, |\phi_n\rangle$ of $\mathbb{C}^n$, the quantum walker $|\psi(t)\rangle$ is observed in state $|\phi_i\rangle$ with probability $\langle \phi_i | \psi(t)\rangle \langle \psi(t) | \phi_i\rangle$. After being observed in state $|\phi_i\rangle$, the new state of the quantum walker is $|\phi_i\rangle$. Further, given the nature of quantum observation, only orthonormal states can be distinguished by a single observation.

In quantum mechanics, the Schrödinger equation is a partial differential equation that describes how quantum systems evolve with time. In the non-relativistic case it assumes the form

$$i\,\frac{\partial\,|\psi\rangle}{\partial t} = \mathcal{H}\,|\psi\rangle, \qquad (4)$$

where the Hamiltonian operator $\mathcal{H}$ accounts for the total energy of the system. In the case of a particle moving in an empty space with zero potential energy, the Hamiltonian reduces to the Laplacian operator $\nabla^2$. Here, we take the Hamiltonian of the system to be the graph Laplacian matrix $L$. However, any Hermitian operator encoding the structure of the graph can be chosen as an alternative. The evolution of the walk at time $t$ is then given by the Schrödinger equation

$$\frac{\partial\,|\psi(t)\rangle}{\partial t} = -iL\,|\psi(t)\rangle, \qquad (5)$$

where the Laplacian $L$ plays the role of the system Hamiltonian. Given an initial state $|\psi(0)\rangle$, the Schrödinger equation has an exponential solution which can be used to determine the state of the walk at a time $t$, giving

$$|\psi(t)\rangle = e^{-iLt}\,|\psi(0)\rangle, \qquad (6)$$

where $U(t) = e^{-iLt}$ is a unitary matrix governing the evolution of the walk. To simulate the evolution of the quantum walk, it is customary to rewrite Eq. (6) in terms of the spectral decomposition $L = \Phi \Lambda \Phi^T$ of the Laplacian matrix $L$, where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{|V|})$ is a diagonal matrix with the ordered eigenvalues as elements ($\lambda_1 \le \lambda_2 \le \cdots \le \lambda_{|V|}$) and $\Phi = (\phi_1 \mid \phi_2 \mid \cdots \mid \phi_{|V|})$ is a matrix with the corresponding ordered orthonormal eigenvectors as columns. Hence, Eq. (6) can be rewritten as

$$|\psi(t)\rangle = \Phi\, e^{-i\Lambda t}\, \Phi^T\, |\psi(0)\rangle. \qquad (7)$$

In this work, we let $\alpha_u(0)$ be proportional to $d_u$, and using the constraints applied to the amplitudes we get

$$\alpha_u(0) = \frac{d_u}{\sqrt{\sum_{v \in V} d_v^2}}. \qquad (8)$$

Note that there is no particular indication as to how to choose the initial state. However, choosing a uniform distribution over the vertices of $G$ would result in the system remaining stationary, since the initial state vector would be an eigenvector of the Laplacian of $G$ and thus a stationary state of the walk.
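As an aside, the evolution of Eqs. (6)–(8) is straightforward to simulate numerically. The sketch below is our own illustration rather than code from the paper; the helper name `ctqw_state` and the example graph are assumptions made for the demonstration.

```python
import numpy as np

def ctqw_state(A, t):
    """State |psi(t)> of the continuous-time quantum walk on the graph
    with adjacency matrix A, using Eqs. (6)-(8) with Hamiltonian L = D - A."""
    d = A.sum(axis=1)                       # vertex degrees
    L = np.diag(d) - A                      # graph Laplacian
    lam, Phi = np.linalg.eigh(L)            # spectral decomposition L = Phi Lam Phi^T
    psi0 = d / np.sqrt((d ** 2).sum())      # initial amplitudes, Eq. (8)
    # Eq. (7): |psi(t)> = Phi exp(-i Lam t) Phi^T |psi(0)>
    return Phi @ (np.exp(-1j * lam * t) * (Phi.T @ psi0))

# Example on a 4-cycle: vertex probabilities at t = 1, via Eq. (3)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
prob = np.abs(ctqw_state(A, 1.0)) ** 2
print(prob, prob.sum())                     # the probabilities sum to 1
```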

    2.2. A density matrix from the mixed state

While a pure state can be naturally described using a single ket vector, in general a quantum system can be in a mixed state, i.e., a statistical ensemble of pure quantum states $|\psi_i\rangle$, each with probability $p_i$. The density operator (or density matrix) of such a system is defined as

$$\rho = \sum_i p_i\, |\psi_i\rangle \langle \psi_i|. \qquad (9)$$

Density operators are positive unit-trace matrices directly linked with the observables of the (mixed) quantum system. Let $O$ be an observable, i.e., a Hermitian operator acting on the quantum states and providing a measurement. Without loss of generality we have $O = \sum_{i=1}^{m} v_i P_i$, where $P_i$ is the orthogonal projector onto the $i$th observation basis, and $v_i$ is the measurement value when the quantum state is observed to be in this observation basis.

The expectation value of the measurement over a mixed state can be calculated from the density matrix $\rho$:

$$\langle O \rangle = \mathrm{tr}(\rho O), \qquad (10)$$

where $\mathrm{tr}$ is the trace operator. Similarly, the observation probability on the $i$th observation basis can be expressed in terms of the density matrix as

$$\Pr(X = v_i) = \mathrm{tr}(\rho P_i). \qquad (11)$$

Finally, after the measurement, the corresponding density operator will be

$$\rho' = \sum_{i=1}^{m} P_i \rho P_i. \qquad (12)$$

For a graph $G(V,E)$, let $|\psi(t)\rangle$ denote the state corresponding to a continuous-time quantum walk that has evolved from time $t = 0$ to time $t = T$. We define the time-averaged density matrix $\rho_G^T$ for $G(V,E)$ as

$$\rho_G^T = \frac{1}{T} \int_0^T |\psi(t)\rangle \langle \psi(t)|\, dt, \qquad (13)$$

which can be expressed in terms of the initial state $|\psi(0)\rangle$ as

$$\rho_G^T = \frac{1}{T} \int_0^T e^{-iLt}\, |\psi(0)\rangle \langle \psi(0)|\, e^{iLt}\, dt. \qquad (14)$$

In other words, $\rho_G^T$ describes a system which has an equal probability of being in any of the pure states defined by the evolution of the quantum walk from $t = 0$ to $t = T$. Using the spectral decomposition of the Laplacian we can rewrite the previous equation as

$$\rho_G^T = \frac{1}{T} \int_0^T \Phi e^{-i\Lambda t} \Phi^T\, |\psi(0)\rangle \langle \psi(0)|\, \Phi e^{i\Lambda t} \Phi^T\, dt. \qquad (15)$$

Let $\phi_{xy}$ denote the $(x,y)$th element of the matrix $\Phi$ of eigenvectors of the Laplacian. The $(r,c)$th element of $\rho_G^T$ can be computed as

$$\rho_G^T(r,c) = \frac{1}{T} \int_0^T \left( \sum_{k=1}^{n} \sum_{l=1}^{n} \phi_{rk}\, e^{-i\lambda_k t}\, \phi_{lk}\, \psi_l(0) \right) \left( \sum_{x=1}^{n} \sum_{y=1}^{n} \psi_x(0)\, \phi_{xy}\, e^{i\lambda_y t}\, \phi_{cy} \right) dt. \qquad (16)$$

Let $\bar{\psi}_k = \sum_l \phi_{lk}\, \psi_l(0)$ and $\bar{\psi}_y = \sum_x \phi_{xy}\, \psi_x(0)$, where $\psi_l(0)$ denotes the $l$th element of $|\psi(0)\rangle$. Then

$$\rho_G^T(r,c) = \frac{1}{T} \int_0^T \left( \sum_{k=1}^{n} \phi_{rk}\, e^{-i\lambda_k t}\, \bar{\psi}_k \right) \left( \sum_{y=1}^{n} \phi_{cy}\, e^{i\lambda_y t}\, \bar{\psi}_y \right) dt, \qquad (17)$$

which can be rewritten as

$$\rho_G^T(r,c) = \sum_{k=1}^{n} \sum_{y=1}^{n} \phi_{rk}\, \phi_{cy}\, \bar{\psi}_k\, \bar{\psi}_y\, \frac{1}{T} \int_0^T e^{i(\lambda_y - \lambda_k)t}\, dt. \qquad (18)$$

If we let $T \to \infty$, Eq. (18) further simplifies to

$$\rho_G^{\infty}(r,c) = \sum_{\lambda \in \tilde{\Lambda}} \sum_{x \in B_\lambda} \sum_{y \in B_\lambda} \phi_{rx}\, \phi_{cy}\, \bar{\psi}_x\, \bar{\psi}_y, \qquad (19)$$

where $\tilde{\Lambda}$ is the set of distinct eigenvalues of the Laplacian matrix $L$ and $B_\lambda$ is a basis of the eigenspace associated with $\lambda$. As a consequence, computing the time-averaged density matrix relies on computing the eigendecomposition of the Laplacian of $G(V,E)$, and hence has time complexity $O(|V|^3)$. Note that in the remainder of this paper we will simplify our notation by referring to the time-averaged density matrix as the density matrix associated with a graph.
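The simplification from Eq. (18) to Eq. (19) rests on the time average of a complex exponential, which survives only on coinciding eigenvalues:

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T e^{i(\lambda_y - \lambda_k)t}\, dt = \begin{cases} 1 & \text{if } \lambda_y = \lambda_k, \\ 0 & \text{otherwise,} \end{cases}$$

since for $\lambda_y \ne \lambda_k$ the integral equals $\bigl(e^{i(\lambda_y - \lambda_k)T} - 1\bigr)/\bigl(i(\lambda_y - \lambda_k)T\bigr)$, whose modulus is bounded by $2/(|\lambda_y - \lambda_k|\,T) \to 0$.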


Moreover, we will assume that $\rho_G^T$ is computed for $T \to \infty$, unless otherwise stated, and we will denote it as $\rho_G$.
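Continuing the numerical sketch begun in Section 2.1, Eq. (19) can be evaluated by summing one rank-one term per eigenspace. The helper below is hypothetical; in particular, grouping eigenvalues by rounding is an assumption we make to detect repeated eigenvalues in floating point arithmetic.

```python
import numpy as np

def density_matrix(A, decimals=9):
    """Large-time average density matrix rho_G of Eq. (19)."""
    d = A.sum(axis=1)
    L = np.diag(d) - A
    lam, Phi = np.linalg.eigh(L)
    psi0 = d / np.sqrt((d ** 2).sum())       # initial amplitudes, Eq. (8)
    psibar = Phi.T @ psi0                    # projections of |psi(0)> onto the eigenvectors
    rho = np.zeros((len(d), len(d)))
    lam = np.round(lam, decimals)            # group numerically equal eigenvalues
    for lam_val in np.unique(lam):
        idx = np.where(lam == lam_val)[0]    # basis B_lambda of this eigenspace
        v = Phi[:, idx] @ psibar[idx]        # projection of |psi(0)> onto the eigenspace
        rho += np.outer(v, v)                # rank-one contribution of Eq. (19)
    return rho

rho = density_matrix(np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float))
print(np.trace(rho))                         # unit trace, as required of a density matrix
```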

    2.3. The von Neumann entropy of a graph

In quantum mechanics, the von Neumann entropy [32] $H_N$ of a mixture is defined in terms of the trace and logarithm of the density operator $\rho$:

$$H_N = -\mathrm{tr}(\rho \log \rho) = -\sum_i \lambda_i \ln \lambda_i, \qquad (20)$$

where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $\rho$. Note that if the quantum system is a pure state $|\psi_i\rangle$ with probability $p_i = 1$, then the von Neumann entropy $H_N = -\mathrm{tr}(\rho \log \rho)$ is zero. On the other hand, a mixed state generally has a non-zero von Neumann entropy associated with its density operator.

Here we propose to associate to each graph the von Neumann entropy of the density matrix computed as in Eq. (19). Consider a graph $G(V,E)$; its von Neumann entropy is

$$H_N(G) = -\mathrm{tr}(\rho_G \log \rho_G) = -\sum_{j=1}^{|V|} \lambda_j^G \log \lambda_j^G, \qquad (21)$$

where $\lambda_1^G, \ldots, \lambda_j^G, \ldots, \lambda_{|V|}^G$ are the eigenvalues of $\rho_G$.
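Eq. (21) translates directly into code; a minimal sketch follows (the eigenvalue cutoff `eps` is our assumption, used to apply the convention $0 \log 0 = 0$):

```python
import numpy as np

def von_neumann_entropy(rho, eps=1e-12):
    """Von Neumann entropy H_N(G) of Eq. (21) from the eigenvalues of rho."""
    xi = np.linalg.eigvalsh(rho)
    xi = xi[xi > eps]                 # discard (numerically) zero eigenvalues
    return float(-(xi * np.log(xi)).sum())
```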

2.4. Quantum Jensen–Shannon divergence

The classical Jensen–Shannon divergence is a non-extensive mutual information measure defined between probability distributions over potentially structured data. It is related to the Shannon entropy of the two distributions [33]. Consider two (discrete) probability distributions $P = (p_1, \ldots, p_a, \ldots, p_A)$ and $Q = (q_1, \ldots, q_b, \ldots, q_B)$; the classical Jensen–Shannon divergence between $P$ and $Q$ is defined as

$$D_{JS}(P, Q) = H_S\!\left(\frac{P + Q}{2}\right) - \frac{1}{2} H_S(P) - \frac{1}{2} H_S(Q), \qquad (22)$$

where $H_S(P) = -\sum_{a=1}^{A} p_a \log p_a$ is the Shannon entropy of distribution $P$. The classical Jensen–Shannon divergence is always well defined, symmetric, negative definite and bounded, i.e., $0 \le D_{JS} \le 1$.

The quantum Jensen–Shannon divergence has recently been developed as a generalization of the classical Jensen–Shannon divergence to quantum states by Lamberti et al. [13]. Given two density operators $\rho$ and $\sigma$, the quantum Jensen–Shannon divergence between them is defined as

$$D_{QJS}(\rho, \sigma) = H_N\!\left(\frac{\rho + \sigma}{2}\right) - \frac{1}{2} H_N(\rho) - \frac{1}{2} H_N(\sigma). \qquad (23)$$

The quantum Jensen–Shannon divergence is always well defined, symmetric, negative definite and bounded, i.e., $0 \le D_{QJS} \le 1$ [13].

Additionally, the quantum Jensen–Shannon divergence verifies a number of properties which are required for a good distinguishability measure between quantum states [13,18]. This is a problem of key importance in quantum computation, and it requires the definition of a suitable distance measure. Wootters [34] was the first to introduce the concept of statistical distance between pure quantum states. His work is based on extending a distance over the space of probability distributions to the Hilbert space of pure quantum states. Similarly, the relative entropy [35] can be seen as a generalization of the information theoretic Kullback–Leibler divergence. However, the relative entropy is unbounded, it is not symmetric and it does not satisfy the triangle inequality.

The square root of the QJSD, on the other hand, is bounded, it is a distance and, as proved by Lamberti et al. [13], it satisfies the triangle inequality. This is formally proved for pure states, while for mixed states there is only empirical evidence. Some alternatives to the QJSD are described in the literature, most notably the Bures distance [36]. Both the Bures distance and the QJSD require the full density matrices to be computed. However, the QJSD turns out to be faster to compute. On the one hand, the Bures distance involves taking the square root of matrices, usually computed through matrix diagonalization, which has time complexity $O(n^3)$, where $n$ is the number of vertices in the graph. On the other hand, the QJSD simply requires computing the eigenvalues of the density matrices, which can be done in $O(n^2)$.
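In code, Eq. (23) is a three-line computation on top of the entropy helper sketched above (again our illustration, not the authors' implementation):

```python
def qjsd(rho, sigma):
    """Quantum Jensen-Shannon divergence of Eq. (23)."""
    mix = 0.5 * (rho + sigma)         # the equal-probability mixture of the two states
    return (von_neumann_entropy(mix)
            - 0.5 * von_neumann_entropy(rho)
            - 0.5 * von_neumann_entropy(sigma))
```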

3. A quantum Jensen–Shannon graph kernel

In this section, we propose a novel graph kernel based on the quantum Jensen–Shannon divergence between continuous-time quantum walks on different graphs. Suppose that the graphs under consideration are represented by a set $\{G_1, \ldots, G_a, \ldots, G_b, \ldots, G_N\}$. We consider the continuous-time quantum walks on a pair of graphs $G_a(V_a, E_a)$ and $G_b(V_b, E_b)$, whose mixed state density matrices $\rho_a$ and $\rho_b$ can be computed using Eq. (18) or Eq. (19). With the density matrices $\rho_a$ and $\rho_b$ to hand, we compute the quantum Jensen–Shannon divergence $D_{QJS}(\rho_a, \rho_b)$ using Eq. (23). However, the result depends on the relative order of the vertices in the two graphs, and we need a way to describe the mixed quantum state given by the walk on both graphs. To this end, we consider two strategies, which we refer to as the unaligned density matrix and the aligned density matrix. In the first case the density matrices are added maintaining the vertex order of the data. This results in the unaligned kernel

$$k_{QJSU}(G_a, G_b) = \exp\!\left(-\mu\, D_{QJS}(\rho_a, \rho_b)\right), \qquad (24)$$

where $\mu$ is a decay factor in the interval $0 < \mu \le 1$ and

$$D_{QJS}(\rho_a, \rho_b) = H_N\!\left(\frac{\rho_a + \rho_b}{2}\right) - \frac{1}{2} H_N(\rho_a) - \frac{1}{2} H_N(\rho_b) \qquad (25)$$

is the quantum Jensen–Shannon divergence between the mixed state density matrices $\rho_a$ and $\rho_b$ of $G_a$ and $G_b$, and $H_N(\cdot)$ is the von Neumann entropy of a graph density matrix defined in Eq. (21). Here $\mu$ is used to ensure that large divergence values do not dominate the kernel value, and in this work we use $\mu = 1$. The proposed kernel can be viewed as a similarity measure between the quantum states associated with a pair of graphs. Unfortunately, since the unaligned density matrix ignores the vertex correspondence information for a pair of graphs, the resulting kernel is not invariant to permutations of the vertex order. Instead it relies only on the global properties and long-range interactions between vertices to differentiate between structures.

In the second case the mixing is performed after the density matrices are optimally aligned, thus computing a lower bound of the divergence over the set of state permutations. This results in the aligned kernel

$$k_{QJSA}(G_a, G_b) = \max_{Q \in \Pi} \exp\!\left(-\mu\, D_{QJS}(\rho_a, Q \rho_b Q^T)\right) = \exp\!\left(-\mu \min_{Q \in \Pi} D_{QJS}(\rho_a, Q \rho_b Q^T)\right), \qquad (26)$$

where $\Pi$ denotes the set of permutation matrices of appropriate size. Here the alignment step adds correspondence information, allowing for a more fine-grained discrimination between structures.

In both cases the density matrix of smaller size is padded to the size of the larger one before computing the divergence. Note that this is equivalent to padding the adjacency matrix of the smaller graph to be the same size as the larger one, since both these operations simply increase the dimension of the kernel of the density matrix.
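Putting Eqs. (24) and (25) together with the padding step, a minimal sketch of the unaligned kernel reads as follows (`pad` and `k_qjsu` are hypothetical helper names; `density_matrix` and `qjsd` are the sketches from Section 2):

```python
import numpy as np

def pad(rho, n):
    """Zero-pad a density matrix to n x n; this only enlarges its null space."""
    out = np.zeros((n, n))
    out[:rho.shape[0], :rho.shape[1]] = rho
    return out

def k_qjsu(Aa, Ab, mu=1.0):
    """Unaligned kernel of Eq. (24) between two adjacency matrices."""
    ra, rb = density_matrix(Aa), density_matrix(Ab)
    n = max(ra.shape[0], rb.shape[0])
    return float(np.exp(-mu * qjsd(pad(ra, n), pad(rb, n))))
```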


3.1. Alignment of the density matrix

The definition of the aligned kernel requires the computation of the permutation that minimizes the divergence between the graphs. This is equivalent to minimizing the von Neumann entropy of the mixed state $(\rho_a + \rho_b)/2$. However, this is a computationally hard task, and so we perform a two-step approximation to the problem. First, we approximate the solution by minimizing the Frobenius ($L_2$) distance between the matrices rather than the von Neumann entropy. Second, we find an approximate solution to the $L_2$ problem using Umeyama's method [29].

The rationale behind this approximate method is given by the fact that we already have the eigenvectors and eigenvalues of the Hamiltonian to hand, and with these we can efficiently compute the eigendecomposition of the density matrices [37] (which is required in order to apply Umeyama's spectral approach). More precisely, the authors of [37] show that

$$L \rho_a = \left(\sum_{\lambda \in \tilde{\Lambda}} \lambda P_\lambda\right)\left(\sum_{\lambda \in \tilde{\Lambda}} P_\lambda\, \rho(0)\, P_\lambda\right) = \sum_{\lambda \in \tilde{\Lambda}} \lambda\, P_\lambda\, \rho(0)\, P_\lambda = \left(\sum_{\lambda \in \tilde{\Lambda}} P_\lambda\, \rho(0)\, P_\lambda\right)\left(\sum_{\lambda \in \tilde{\Lambda}} \lambda P_\lambda\right) = \rho_a L, \qquad (27)$$

where $\rho(0)$ denotes the density matrix for $G$ at time $t = 0$ and $P_\lambda = \sum_{k=1}^{\mu(\lambda)} \phi_{\lambda,k}\, \phi_{\lambda,k}^T$ is the projection operator onto the subspace spanned by the eigenvectors $\phi_{\lambda,k}$ associated with the eigenvalue $\lambda$ of the Hamiltonian. In other words, given the spectral decomposition of the Laplacian matrix $L = \Phi \Lambda \Phi^T$, if we express the density matrix in the eigenvector basis given by $\Phi$, then it assumes a block diagonal form, where each block corresponds to an eigenspace of $L$ associated with a single eigenvalue. As a consequence, if $L$ has all eigenvalues distinct, then $\rho_a$ will have the same spectrum as $L$. Otherwise, we need to independently compute the eigendecomposition of each diagonal block, resulting in a complexity $O(\sum_{\lambda \in \tilde{\Lambda}} \mu(\lambda)^2)$, where $\mu(\lambda)$ denotes the multiplicity of the eigenvalue $\lambda$.
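The following sketch illustrates the two-step approximation behind the aligned kernel of Eq. (26). The Hungarian projection of Umeyama's spectral heuristic shown here is one common variant, and all helper names are ours; it should be read as an outline of the procedure under these assumptions rather than the exact implementation used in the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def umeyama_align(ra, rb):
    """Approximate argmin_Q ||ra - Q rb Q^T||_F via Umeyama's spectral heuristic [29]."""
    _, Ua = np.linalg.eigh(ra)
    _, Ub = np.linalg.eigh(rb)
    # Assign vertices by correlating the magnitudes of the eigenvectors
    cost = -np.abs(Ua) @ np.abs(Ub).T
    rows, cols = linear_sum_assignment(cost)
    Q = np.zeros_like(ra)
    Q[rows, cols] = 1.0        # Q rho_b Q^T re-expresses rho_b in rho_a's vertex order
    return Q

def k_qjsa(Aa, Ab, mu=1.0):
    """Aligned kernel of Eq. (26), reusing the helpers sketched earlier."""
    ra, rb = density_matrix(Aa), density_matrix(Ab)
    n = max(ra.shape[0], rb.shape[0])
    ra, rb = pad(ra, n), pad(rb, n)
    Q = umeyama_align(ra, rb)
    return float(np.exp(-mu * qjsd(ra, Q @ rb @ Q.T)))
```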

3.2. Properties of the quantum Jensen–Shannon kernel for graphs

Theorem 3.1. If the quantum Jensen–Shannon divergence is negative definite, then the unaligned quantum Jensen–Shannon kernel is positive definite.

Proof. First note that, for the computation of the quantum Jensen–Shannon divergence, padding the smaller matrix to the size of the larger one is equivalent to padding both to an even larger size. Thus we can consider the computation of the kernel matrix over $N$ graphs to be performed over density matrices padded to the size of the largest graph. This eliminates the dependence of the density matrix of the first graph on the choice of the second graph.

In [38] the authors show that if $-s(G_a, G_b)$ is a conditionally positive semidefinite kernel, i.e., $s(G_a, G_b)$ is a negative definite kernel, then $k_s = \exp(-\mu\, s(G_a, G_b))$ is positive semidefinite for any $\mu > 0$.

Hence, if the quantum Jensen–Shannon divergence $D_{QJS}(G_a, G_b)$ is negative definite, then $-D_{QJS}(G_a, G_b)$ is conditionally positive semidefinite, and the quantum Jensen–Shannon kernel $k_{QJS}(G_a, G_b) = \exp(-\mu\, D_{QJS}(G_a, G_b))$ is positive definite. □

Note that this proof is conditioned on the negative definiteness of the quantum Jensen–Shannon divergence, as is the case for the classical Jensen–Shannon divergence. Currently this is only a conjecture, proved to be true for pure quantum states, and for which there is strong empirical evidence [39].

In general, for the optimal assignment kernel [10,11] we cannot guarantee positive definiteness for the aligned version of our quantum kernel, due to the lack of transitivity of the alignments. We do have, however, a similar result when using any (possibly non-optimal) set of alignments that satisfies transitivity.

Corollary 3.2. If the quantum Jensen–Shannon divergence is negative definite and the density matrices are aligned by a transitive set of permutations, then the aligned quantum Jensen–Shannon kernel is positive definite.

Proof. If the alignments are transitive, each of the density matrices can be aligned to a single common frame, for example that of the first graph. The aligned kernel on the original unaligned data is then equivalent to the unaligned kernel on the new pre-aligned data. □

Another important property for a kernel is its independence of the ordering of the data. There are two ways in which the kernel can depend on this ordering: first, it might depend on the order in which the different graphs are presented; second, it might depend on the order in which the vertices are presented within each graph.

Since the kernel only depends on second order relations, a sufficient condition for the first type of independence is for the kernel to be deterministic and symmetric. If this is so, then a permutation of the graphs will simply result in the same permutation being applied to the rows and columns of the kernel matrix. For the second type of invariance, we need full invariance of the computed kernel values with respect to vertex permutations.

The unaligned kernel trivially satisfies the first condition, since the quantum Jensen–Shannon divergence is both deterministic and symmetric, while it fails to satisfy the second condition, thus resulting in a kernel that is independent of the order of the graphs, but not of the vertex order within each graph. For the aligned kernel, on the other hand, the addition of the alignment step requires a little more analysis.

Theorem 3.3. The aligned quantum Jensen–Shannon kernel is independent of both the graph order and the vertex order.

Proof. Recall that, given two graphs $G_a$ and $G_b$ with mixed state density matrices $\rho_a$ and $\rho_b$, the aligned kernel is

$$k_{QJSA}(G_a, G_b) = \max_{Q \in \Pi} \exp\!\left(-\mu\, D_{QJS}(\rho_a, Q \rho_b Q^T)\right). \qquad (28)$$

As a result the kernel value is uniquely identified even when there are multiple possible alignments: due to their optimality, these alignments must all result in the same optimal divergence. It is easy to see that the kernel is also symmetric. In fact, since the von Neumann entropy is invariant to similarity transformations, we have

$$H_N\!\left(\frac{\rho_a + Q \rho_b Q^T}{2}\right) = H_N\!\left(Q\, \frac{Q^T \rho_a Q + \rho_b}{2}\, Q^T\right) = H_N\!\left(\frac{Q^T \rho_a Q + \rho_b}{2}\right). \qquad (29)$$

From the above, $D_{QJS}(\rho_a, Q \rho_b Q^T) = D_{QJS}(Q^T \rho_a Q, \rho_b) = D_{QJS}(\rho_b, Q^T \rho_a Q)$. Hence,

$$k_{QJSA}(G_a, G_b) = \max_{Q \in \Pi} \exp\!\left(-\mu\, D_{QJS}(\rho_a, Q \rho_b Q^T)\right) = \max_{Q \in \Pi} \exp\!\left(-\mu\, D_{QJS}(\rho_b, Q \rho_a Q^T)\right) = k_{QJSA}(G_b, G_a). \qquad (30)$$

As a result, the aligned kernel is deterministic and symmetric, even if the optimizer might not be. Thus the kernel is independent of the order of the graphs.

The independence of the vertex ordering again derives directly from the optimality of the value of the divergence. Let $G_b'$ be a vertex-permuted version of $G_b$, such that its mixing matrix is $\rho_b' = T \rho_b T^T$ for a permutation matrix $T \in \Pi$, and assume that $k_{QJSA}(G_a, G_b') \ne k_{QJSA}(G_a, G_b)$. Without loss of generality we can assume $k_{QJSA}(G_a, G_b') > k_{QJSA}(G_a, G_b)$. Let

$$Q^* = \arg\max_{Q \in \Pi} \exp\!\left(-\mu\, D_{QJS}(\rho_a, Q \rho_b' Q^T)\right), \qquad (31)$$


then

$$\exp\!\left(-\mu\, D_{QJS}(\rho_a, Q^* \rho_b' Q^{*T})\right) = \exp\!\left(-\mu\, D_{QJS}(\rho_a, Q^* T \rho_b T^T Q^{*T})\right) > \max_{Q \in \Pi} \exp\!\left(-\mu\, D_{QJS}(\rho_a, Q \rho_b Q^T)\right), \qquad (32)$$

leading to a contradiction, since $Q^* T \in \Pi$. □

Note that these properties depend on the optimality of the alignment with respect to the quantum Jensen–Shannon divergence. We approximate the optimal alignment by the minimization of the Frobenius ($L_2$) norm between the density matrices, which in turn is approximated using Umeyama's algorithm, since the resulting matching problem is still in general NP-hard. However, in the proof, the optimality was used as a means to uniquely identify the value of the divergence. Any other uniquely identified value would still give an order-invariant kernel, even if this is suboptimal.

Eq. (29) does not depend on the choice of the optimal alignment, and so it would still hold under $L_2$ optimality. The main problem is whether two permutations leading to the same (minimal) Frobenius distance between the density matrices can give different divergence values. If permutations $Q_1$ and $Q_2$ differ by the combination of symmetries present in the two graphs, i.e., $Q_1 = S Q_2 R$ with $S \in \mathrm{Aut}(G_a)$ and $R \in \mathrm{Aut}(G_b)$, then clearly they will have the same divergence, since $\rho_a = S \rho_a S^T$ and $\rho_b = R \rho_b R^T$; but this might not characterize all the $L_2$ optimal permutations. Nonetheless, we have the following theorem.

Theorem 3.4. If the $L_2$ optimal permutation is unique modulo isomorphisms of the two graphs, then the $L_2$ approximation is independent of the order of the graphs.

Proof. This derives directly from the uniqueness of the kernel value for all the optimizers. □

There remains another source of error, since Umeyama's algorithm is only an approximation to the Frobenius minimizer. The impact of this is much wider, since the lack of optimality implies an inability to uniquely identify the value of the kernel in an invariant way. However, this is implicit in all alignment-based approaches, due to the NP-hard nature of the graph matching problem.

    3.3. Complexity analysis

For a graph dataset having $N$ graphs, the kernel matrix of the quantum Jensen–Shannon graph kernel can be computed using Algorithm 1.

Algorithm 1. Computing the kernel matrix of the quantum Jensen–Shannon graph kernel on $N$ graphs.

Input: A graph dataset $\mathcal{G} = \{G_1, \ldots, G_a, \ldots, G_b, \ldots, G_N\}$ with $N$ test graphs.
Output: An $N \times N$ kernel matrix $K_{QJS}$.
1. Compute the density matrices of the graphs: for each graph $G_a(V_a, E_a)$, compute the density matrix evolving from time $0$ to time $T$ using the definition in Section 2.2 (Eq. (19)).
2. Compute the mixed density matrix for each pair of graphs: for each pair of graphs $G_a(V_a, E_a)$ and $G_b(V_b, E_b)$ with density matrices $\rho_a$ and $\rho_b$, compute $(\rho_a + \rho_b)/2$ using the unaligned or the aligned procedure.
3. Compute the kernel matrix for the $N$ graphs: for each pair of graphs $G_a(V_a, E_a)$ and $G_b(V_b, E_b)$, compute the $(G_a, G_b)$th element of $K_{QJS}$ as $k_{QJS}(G_a, G_b)$ using Eq. (24) or Eq. (26).

For $N$ graphs, each of which has $n$ vertices, the time complexity of the quantum Jensen–Shannon graph kernel on all pairs of these graphs is given by the three computational steps of Algorithm 1. These are: (a) the computation of the density matrix for each graph, which, based on the definition in Section 2.2, has time complexity $O(N n^3)$; (b) the computation of the aligned or the unaligned density matrix for all pairs of graphs, which has time complexity $O(N^2 n^4)$ and $O(N^2 n)$, respectively. (c) As a result, the time complexities of the quantum kernel using the aligned and unaligned density matrices are $O(N n^3 + N^2 n^4)$ and $O(N n^3 + N^2 n)$, respectively.
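Algorithm 1 then amounts to a double loop over the dataset; a sketch under the same assumptions as the helpers above:

```python
import numpy as np

def kernel_matrix(graphs, aligned=False, mu=1.0):
    """Gram matrix K_QJS of Algorithm 1 over a list of adjacency matrices."""
    N = len(graphs)
    K = np.zeros((N, N))
    k = k_qjsa if aligned else k_qjsu
    for a in range(N):
        for b in range(a, N):          # the kernel is symmetric, so fill both halves
            K[a, b] = K[b, a] = k(graphs[a], graphs[b], mu)
    return K
```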

    4. Experimental evaluation

In this section, we test our graph kernels on several standard graph-based datasets. There are three stages to our experimental evaluation. First, we evaluate the stability of the von Neumann entropies obtained from the density matrices as the time t varies. Second, we empirically compare our new kernel with several alternative state-of-the-art graph kernels. Finally, we evaluate the computational efficiency of our new kernel.

    4.1. Graph datasets

We show the performance of our quantum Jensen–Shannon graph kernel on five standard graph-based datasets from bioinformatics and computer vision: MUTAG, PPIs, PTC(MR), COIL5 and Shock. Some statistics concerning the datasets are given in Table 1.

MUTAG: The MUTAG dataset consists of graphs representing 188 chemical compounds, and aims to predict whether each compound possesses mutagenicity [40]. Since the vertices and edges of each compound are labeled with a real number, we transform these graphs into unweighted graphs.

PPIs: The PPIs dataset consists of protein–protein interaction networks (PPIs) [41]. The graphs describe the interaction relationships between histidine kinases in different species of bacteria. Histidine kinase is a key protein in the development of signal transduction. If two proteins have a direct (physical) or indirect (functional) association, they are connected by an edge. There are 219 PPIs in this dataset, collected from five different kinds of bacteria with the following evolution order (from older to more recent): Aquifex (4 PPIs) and Thermotoga (4 PPIs), from Aquifex aeolicus and Thermotoga maritima respectively; Gram-Positive (52 PPIs), from Staphylococcus aureus; Cyanobacteria (73 PPIs), from Anabaena variabilis; and Proteobacteria (40 PPIs), from Acidovorax avenae. There is an additional class (Acidobacteria, 46 PPIs) which is more controversial in terms of bacterial evolution, since it was discovered more recently. Here we select the Proteobacteria (40 PPIs) and Acidobacteria (46 PPIs) classes as the testing graphs.

PTC: The PTC (Predictive Toxicology Challenge) dataset records the carcinogenicity of several hundred chemical compounds for male rats (MR), female rats (FR), male mice (MM) and female mice (FM) [42]. These graphs are very small (i.e., 20–30 vertices) and sparse (i.e., 25–40 edges). We select the graphs of male rats (MR) for evaluation. There are 344 test graphs in the MR class.

COIL5: We create a dataset referred to as COIL5 from the COIL image database. The COIL database consists of images of 100 3D objects.

Table 1
Information on the graph-based datasets.

Datasets         MUTAG   PPIs     PTC     COIL5    Shock
Max # vertices   28      232      109     241      33
Min # vertices   10      3        2       72       4
Ave # vertices   17.93   109.60   25.60   144.90   13.16
# graphs         188     86       344     360      150
# classes        2       2        2       5        10


In our experiments, we use the images of the first five objects. For each of these objects we employ 72 images captured from different viewpoints. For each image we first extract corner points using the Harris detector, and then establish Delaunay graphs based on the corner points as vertices. Each vertex is used as the seed of a Voronoi region, which expands radially with a constant speed. The linear collision fronts of the regions delineate the image plane into polygons, and the Delaunay graph is the region adjacency graph for the Voronoi polygons. Associated with each object there are thus 72 graphs, one for each of the 72 different object views. Since the graphs of an object seen from different viewpoints can be very different, this dataset is hard for graph classification.

Shock: The Shock dataset consists of graphs from the Shock 2D shape database. Each graph is a skeletal-based representation of the differential structure of the boundary of a 2D shape. There are 150 graphs divided into 10 classes, each containing 15 graphs.

    4.2. Evaluations on the von Neumann entropy with time t

We commence by investigating the von Neumann entropy associated with the graphs as the time t varies. In this experiment, we use the testing graphs in the MUTAG, PPIs, PTC and COIL5 datasets, while we do not perform the evaluation on the Shock dataset. For each graph, we allow a continuous-time quantum walk to evolve with $t = 0, 1, 2, \ldots, 50$. As the walk evolves from time $t = 0$ to time $t = 50$ we compute the corresponding density matrix using Eq. (15). Thus, at each time $t$ we can compute the von Neumann entropy of the corresponding density matrix associated with each graph. The experimental results are shown in Fig. 1. The subfigures of Fig. 1 show the mean von Neumann entropies for the graphs in the MUTAG, PPIs, PTC and COIL5 datasets separately. The x-axis shows the time $t$ from 0 to 50, and the y-axis shows the mean value of the von Neumann entropies for graphs belonging to the same class. Here the different lines represent the entropies for the different classes of graphs. These plots demonstrate that the von Neumann entropy can be used to distinguish the different classes of graphs present in a dataset.

    4.3. Experiments on standard graph datasets from bioinformatics

4.3.1. Experimental setup

We now evaluate the performance of our quantum Jensen–Shannon kernel using both the aligned density matrix (QJSA) and the unaligned density matrix (QJSU). Furthermore, we compare our kernel with several alternative state-of-the-art graph kernels. These kernels include (1) the Weisfeiler–Lehman subtree kernel (WL) [5], (2) the shortest path graph kernel (SPGK) [4], (3) the Jensen–Shannon graph kernel associated with the steady state random walk (JSGK) [14], (4) the backtrackless random walk kernel using the Ihara zeta function based cycles (BRWK) [6], and (5) the random walk graph kernel (RWGK) [3]. For our quantum kernel, we let $t \to \infty$. For the Weisfeiler–Lehman subtree kernel, we set the dimension of the Weisfeiler–Lehman isomorphism to 10. Based on the definition in [5], this means that we compute 10 different Weisfeiler–Lehman subtree kernel matrices (i.e., $k_1, k_2, \ldots, k_{10}$) with different subtree heights $h$ ($h = 1, 2, \ldots, 10$), respectively.

[Fig. 1 shows four plots of the mean von Neumann entropy value against the time t (from 0 to 50), one per dataset, with one curve per class: MUTAG (mutagenic aromatic, mutagenic heteroaromatic), PPIs (Proteobacteria40 PPIs, Acidobacteria46 PPIs), PTC(MR) (first class, second class) and COIL5 (box, onion, ship, tomato and bottle objects).]

Fig. 1. Evaluations on the von Neumann entropy with time t: (a) on the MUTAG dataset, (b) on the PPIs dataset, (c) on the PTC(MR) dataset and (d) on the COIL5 dataset.


Note that the WL and SPGK kernels are able to accommodate attributed graphs. In our experiments, we use the vertex degree as a vertex label for the WL and SPGK kernels.

For each kernel and dataset, we perform 10-fold cross-validation using a C-Support Vector Machine (C-SVM) in order to evaluate the classification accuracies of the different kernels. More specifically, we use the C-SVM implementation of LIBSVM [43]. For each class, we use 90% of the samples for training and the remaining 10% for testing. The parameters of the C-SVMs are optimized separately for each dataset. We repeat the evaluation 10 times and report the average classification accuracy (± standard error) of each kernel in Table 2. Furthermore, we also report the runtime for computing the kernel matrix of each kernel in Table 3, with the runtime measured under Matlab R2011a running on a 2.5 GHz Intel 2-Core processor (i.e., i5-3210m). Note that, for the WL kernel, the classification accuracies are the average accuracies over all 10 matrices, and the runtime refers to that required for computing all 10 matrices (see [5] for details).

4.3.2. Results and discussion
(a) On the MUTAG dataset, the accuracies of all the kernels are similar. The SPGK kernel achieves the highest accuracy. Yet, the accuracies of our QJSA and QJSU quantum kernels are competitive with that of the SPGK kernel and outperform those of the other kernels.

Table 2
Accuracy comparisons (in % ± standard error) on graph datasets abstracted from bioinformatics and computer vision.

Datasets  MUTAG         PPIs          PTC(MR)       COIL5         Shock
QJSU      82.72 ± 0.44  69.50 ± 1.20  56.70 ± 0.49  70.11 ± 0.61  40.60 ± 0.92
QJSA      82.83 ± 0.50  73.37 ± 1.04  57.39 ± 0.46  69.75 ± 0.58  42.80 ± 0.86
WL        82.05 ± 0.57  78.50 ± 1.40  56.05 ± 0.51  33.16 ± 1.01  36.40 ± 1.00
SPGK      83.38 ± 0.81  61.12 ± 1.09  56.55 ± 0.53  69.66 ± 0.52  37.88 ± 0.93
JSGK      83.11 ± 0.80  57.87 ± 1.36  57.29 ± 0.41  69.13 ± 0.79  21.73 ± 0.76
BRWK      77.50 ± 0.75  53.50 ± 1.47  53.97 ± 0.31  14.63 ± 0.21   0.33 ± 0.37
RWGK      80.77 ± 0.72  55.00 ± 0.88  55.91 ± 0.37  20.80 ± 0.47   2.26 ± 1.01

Table 3
Runtime comparisons on graph datasets abstracted from bioinformatics and computer vision (h = hours, ′ = minutes, ″ = seconds).

Datasets  MUTAG    PPIs      PTC      COIL5     Shock
QJSU      20″      59″       10′46″   180′20″   14″
QJSA      10′30″   230′25″   160′40″  8h29′0″   32″
WL        4″       13″       11″      1′05″     3″
SPGK      1″       7″        1″       31″       1″
JSGK      1″       1″        1″       1″        1″
BRWK      2″       140′20″   3″       160′46″   8″
RWGK      46″      1′07″     20′35″   190′40″   23″

Fig. 2. Runtime evaluation: (a) QJSA with graph size n, (b) QJSU with graph size n, (c) QJSA with data size N and (d) QJSU with data size N. In each panel, the vertical axis is the runtime in seconds.


(b) On the PPIs dataset, the WL kernel achieves the highest accuracy. The accuracies of our QJSA and QJSU quantum kernels are lower than that of the WL kernel, but outperform those of the other kernels. (c) On the PTC dataset, the accuracies of all the kernels are similar. Our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel is competitive with or outperforms that of the other kernels. (d) On the COIL5 dataset, the accuracies of all the kernels are similar, with the exception of the WL, RWGK and BRWK kernels. Our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel exceeds that of the other kernels. (e) On the Shock dataset, our QJSA quantum kernel achieves the highest accuracy, and the accuracy of our QJSU quantum kernel exceeds that of the other kernels.

Overall, in terms of classification accuracy, our QJSA and QJSU quantum Jensen–Shannon graph kernels outperform or are competitive with the state-of-the-art kernels. In particular, the classification accuracies of our quantum kernels are significantly better than those of the graph kernels based on the classical Jensen–Shannon divergence, the classical random walk and the backtrackless random walk. This suggests that our kernel, which makes use of continuous-time quantum walks to probe the structure of the graphs, successfully captures the structural similarities between different graphs. Furthermore, we observe that the performance of our QJSA quantum kernel is better than that of our QJSU quantum kernel. The reason for this is that the aligned density matrix computed through Umeyama's matching method reflects the precise correspondence information between pairs of vertices in the graphs, whereas the unaligned density matrix ignores this correspondence information and is not invariant to permutations of the vertex order.
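As an illustration of the alignment step, the following sketch computes a vertex correspondence in the spirit of Umeyama's spectral method [29]. It assumes the two graphs have the same number of vertices, uses SciPy's Hungarian solver on the similarity of the eigenvector magnitudes, and the helper name umeyama_permutation is ours; the resulting permutation is then applied to one of the two density matrices.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def umeyama_permutation(A1, A2):
        # Spectral vertex matching: eigendecompose both adjacency matrices,
        # score vertex pairs by the similarity of their eigenvector magnitudes,
        # and solve the resulting assignment problem.
        _, U1 = np.linalg.eigh(A1)
        _, U2 = np.linalg.eigh(A2)
        S = np.abs(U2) @ np.abs(U1).T            # vertex-to-vertex similarity
        rows, cols = linear_sum_assignment(-S)   # Hungarian: maximise similarity
        P = np.zeros_like(S)
        P[rows, cols] = 1.0
        return P                                 # aligned density: P @ rho1 @ P.T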

In terms of the runtime, all the kernels complete the computation in polynomial time on all the datasets. The computational efficiency of our QJSU kernel is lower than that of the WL, SPGK and JSGK kernels, but it is faster than the BRWK and RWGK kernels (i.e., the kernels using the backtrackless random walk and the classical random walk). The computational efficiency of our QJSA kernel is lower than that of any of the alternative kernels. The reason for this is that the QJSA kernel requires extra computation for the alignment of the density matrices.

    4.4. Computational evaluation

In this subsection, we evaluate the relationship between the computational overhead (i.e., the CPU runtime) of our kernel and both the structural complexity of the graphs and the number of graphs being compared.

4.4.1. Experimental setup
We evaluate the computational efficiency of our quantum kernel on randomly generated graphs with respect to two parameters: (a) the graph size n and (b) the graph dataset size N.

Fig. 3. Log runtime evaluation: (a) log runtime vs 3 log n using the QJSA kernel, (b) log runtime vs 3 log n using the QJSU kernel, (c) log runtime vs 2 log N using the QJSA kernel and (d) log runtime vs 2 log N using the QJSU kernel.


More specifically, we vary n ∈ {10, 20, …, 300} and N ∈ {1, 2, …, 100} separately.

We first generate 30 pairs of graphs with an increasing number of vertices and report the runtime for computing the kernel values for all the pairs of graphs. We also generate 100 graph datasets with an increasing number of test graphs, where each test graph has 50 vertices, and report the runtime for computing the kernel matrix of each dataset. The CPU runtime is reported in Fig. 2. The experiments are run in Matlab R2011a on a 2.5 GHz Intel 2-Core processor (i.e., an i5-3210m).
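The text does not fix a random graph model, so the sketch below assumes Erdős–Rényi graphs for illustration and reuses the qjs_kernel helper from the sketch in Section 4.3.1; the edge probability p = 0.2 is likewise an assumption.

    import time
    import numpy as np

    def random_adjacency(n, p=0.2, seed=None):
        # Erdos-Renyi G(n, p) adjacency matrix (the generation model is assumed).
        rng = np.random.default_rng(seed)
        A = np.triu(rng.random((n, n)) < p, k=1).astype(float)
        return A + A.T

    # Runtime vs graph size n: one kernel value per pair, n = 10, 20, ..., 300
    for n in range(10, 301, 10):
        A1 = random_adjacency(n, seed=2 * n)
        A2 = random_adjacency(n, seed=2 * n + 1)
        t0 = time.perf_counter()
        qjs_kernel(A1, A2)                    # from the sketch in Section 4.3.1
        print(n, time.perf_counter() - t0)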

4.4.2. Experimental results
Fig. 2(a) and (c) shows the results obtained with the QJSA kernel when varying the parameters n and N, respectively, while Fig. 2(b) and (d) shows those obtained with the QJSU kernel. Furthermore, for the parameters n and N, we also plot the log runtime versus 3 log n and 2 log N, respectively. Fig. 3(a) and (c) shows the plots obtained with the QJSA kernel, and Fig. 3(b) and (d) shows those obtained with the QJSU kernel. In Fig. 3 there is an approximately linear relationship between the log runtime and the corresponding log parameters (i.e., 3 log n and 2 log N).

From Figs. 2 and 3 we obtain the following conclusions. When varying the number of vertices n of the graphs, we observe that the runtime for computing the quantum Jensen–Shannon QJSA and QJSU graph kernels scales approximately cubically with n. When varying the graph dataset size N, we observe that the runtime for computing the QJSA and QJSU graph kernels scales quadratically with N. These computational evaluations verify that our quantum Jensen–Shannon graph kernel can be computed in polynomial time.
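The claimed exponents can be checked directly from the measurements: on a log-log scale a power law t ≈ c·n^α appears as a straight line of slope α, so a first-degree fit to (log size, log runtime) should give a slope close to 3 for the graph-size experiment and close to 2 for the dataset-size experiment. A minimal sketch, assuming arrays sizes and runtimes collected as above:

    import numpy as np

    def empirical_exponent(sizes, runtimes):
        # Least-squares slope of log(runtime) vs log(size): a value near 3
        # indicates cubic scaling in n, near 2 quadratic scaling in N.
        slope, _intercept = np.polyfit(np.log(sizes), np.log(runtimes), 1)
        return slope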

Furthermore, we observe that the QJSU kernel is a little more efficient than the QJSA kernel. The reason for this is that the QJSA kernel requires extra computations for evaluating the vertex correspondences.

    5. Conclusion

In this paper, we have developed a novel graph kernel by using the quantum Jensen–Shannon divergence and the continuous-time quantum walk on a graph. Given a graph, we evolved a continuous-time quantum walk on its structure and we showed how to associate a mixed quantum state with the vertices of the graph. From the density matrix of the mixed state we computed the von Neumann entropy. With the von Neumann entropies to hand, the kernel between a pair of graphs was defined as a function of the quantum Jensen–Shannon divergence between the corresponding density matrices. Experiments on several standard datasets demonstrate the effectiveness of the proposed graph kernel.

Our future work includes extending the quantum graph kernel to hypergraphs. Bai et al. [44] have developed a hypergraph kernel using the classical Jensen–Shannon divergence, and Ren et al. [45] have explored the use of discrete-time quantum walks on directed line graphs, which can be used as structural representations for hypergraphs. It would thus be interesting to extend these works by using the quantum Jensen–Shannon divergence to compare the quantum walks on the directed line graphs associated with a pair of hypergraphs.

Conflict of interest statement

    None declared.

    Acknowledgements

Edwin R. Hancock is supported by a Royal Society Wolfson Research Merit Award.

We thank Prof. Karsten Borgwardt and Dr. Nino Shervashidze for providing the Matlab implementation of the various graph kernel methods, and Dr. Geng Li for providing the graph datasets. We also thank Prof. Horst Bunke and Dr. Peng Ren for the constructive discussions and suggestions.

    References

[1] B. Schölkopf, A. Smola, Learning with Kernels, MIT Press, 2002.

[2] D. Haussler, Convolution Kernels on Discrete Structures, Technical Report UCSC-CRL-99-10, Santa Cruz, CA, USA, 1999.

[3] H. Kashima, K. Tsuda, A. Inokuchi, Marginalized kernels between labeled graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2003, pp. 321–328.

[4] K.M. Borgwardt, H.P. Kriegel, Shortest-path kernels on graphs, in: Proceedings of the IEEE International Conference on Data Mining (ICDM), 2005, pp. 74–81.

[5] N. Shervashidze, P. Schweitzer, E.J. van Leeuwen, K. Mehlhorn, K.M. Borgwardt, Weisfeiler–Lehman graph kernels, J. Mach. Learn. Res. 1 (2010) 1–48.

[6] F. Aziz, R.C. Wilson, E.R. Hancock, Backtrackless walks on a graph, IEEE Trans. Neural Netw. Learn. Syst. 24 (2013) 977–989.

[7] F. Costa, K.D. Grave, Fast neighborhood subgraph pairwise distance kernel, in: Proceedings of the International Conference on Machine Learning (ICML), 2010, pp. 255–262.

[8] N. Kriege, P. Mutzel, Subgraph matching kernels for attributed graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2012.

[9] M. Neumann, N. Patricia, R. Garnett, K. Kersting, Efficient graph kernels by randomization, in: Machine Learning and Knowledge Discovery in Databases, Springer, Berlin, Heidelberg, 2012, pp. 378–393.

[10] H. Fröhlich, J.K. Wegner, F. Sieker, A. Zell, Optimal assignment kernels for attributed molecular graphs, in: Proceedings of the International Conference on Machine Learning (ICML), 2005, pp. 225–232.

[11] J.-P. Vert, The Optimal Assignment Kernel is Not Positive Definite, arXiv:0801.4061, 2008.

[12] M. Neuhaus, H. Bunke, Edit distance-based kernel functions for structural pattern classification, Pattern Recognit. 39 (10) (2006) 1852–1863.

[13] P. Lamberti, A. Majtey, A. Borras, M. Casas, A. Plastino, Metric character of the quantum Jensen–Shannon divergence, Phys. Rev. A 77 (2008) 052311.

[14] L. Bai, E.R. Hancock, Graph kernels from the Jensen–Shannon divergence, J. Math. Imaging Vis. 47 (2013) 60–69.

[15] L.K. Grover, A fast quantum mechanical algorithm for database search, in: Proceedings of the ACM Symposium on the Theory of Computing (STOC), 1996, pp. 212–219.

[16] P.W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, SIAM J. Comput. 26 (1997) 1484–1509.

[18] A. Majtey, P. Lamberti, D. Prato, Jensen–Shannon divergence as a measure of distinguishability between mixed quantum states, Phys. Rev. A 72 (2005) 052310.

[19] E. Farhi, S. Gutmann, Quantum computation and decision trees, Phys. Rev. A 58 (1998) 915.

[20] D. Aharonov, A. Ambainis, J. Kempe, U. Vazirani, Quantum walks on graphs, in: Proceedings of the ACM Symposium on the Theory of Computing (STOC), 2001, pp. 50–59.

[21] A. Ambainis, E. Bach, A. Nayak, A. Vishwanath, J. Watrous, One-dimensional quantum walks, in: Proceedings of the ACM Symposium on the Theory of Computing (STOC), 2001, pp. 60–69.

[22] A. Ambainis, Quantum walks and their algorithmic applications, Int. J. Quantum Inf. 1 (2003) 507–518.

[23] A.M. Childs, E. Farhi, S. Gutmann, An example of the difference between quantum and classical random walks, Quantum Inf. Process. 1 (2002) 35–53.

[24] A.M. Childs, Universal computation by quantum walk, Phys. Rev. Lett. 102 (18) (2009) 180501.

[26] L. Bai, E.R. Hancock, A. Torsello, L. Rossi, A quantum Jensen–Shannon graph kernel using the continuous-time quantum walk, in: Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR), 2013, pp. 121–131.

[27] L. Rossi, A. Torsello, E.R. Hancock, A continuous-time quantum walk kernel for unattributed graphs, in: Proceedings of Graph-Based Representations in Pattern Recognition (GbRPR), 2013, pp. 101–110.

[28] L. Rossi, A. Torsello, E.R. Hancock, Attributed graph similarity from the quantum Jensen–Shannon divergence, in: Proceedings of Similarity-Based Pattern Recognition (SIMBAD), 2013, pp. 204–218.

[29] S. Umeyama, An eigendecomposition approach to weighted graph matching problems, IEEE Trans. Pattern Anal. Mach. Intell. 10 (5) (1988) 695–703.

[30] N. Hu, L. Guibas, Spectral Descriptors for Graph Matching, arXiv:1304.1572, 2013.

[31] J. Kempe, Quantum random walks: an introductory overview, Contemp. Phys. 44 (2003) 307–327.

[32] M. Nielsen, I. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 2010.


[33] A.F. Martins, N.A. Smith, E.P. Xing, P.M. Aguiar, M.A. Figueiredo, Nonextensive information theoretic kernels on measures, J. Mach. Learn. Res. 10 (2009) 935–975.

[34] W.K. Wootters, Statistical distance and Hilbert space, Phys. Rev. D 23 (2) (1981) 357.

[35] G. Lindblad, Entropy, information and quantum measurements, Commun. Math. Phys. 33 (4) (1973) 305–322.

[36] D. Bures, An extension of Kakutani's theorem on infinite product measures to the tensor product of semifinite W*-algebras, Trans. Am. Math. Soc. 135 (1969) 199–212.

[37] L. Rossi, A. Torsello, E.R. Hancock, R. Wilson, Characterising graph symmetries through quantum Jensen–Shannon divergence, Phys. Rev. E 88 (3) (2013) 032806.

[38] R. Kondor, J. Lafferty, Diffusion kernels on graphs and other discrete input spaces, in: Proceedings of the International Conference on Machine Learning (ICML), 2002, pp. 315–322.

[39] J. Briët, P. Harremoës, Properties of classical and quantum Jensen–Shannon divergence, Phys. Rev. A 79 (5) (2009) 052311.

[40] A.K. Debnath, R.L. Lopez de Compadre, G. Debnath, A.J. Shusterman, C. Hansch, Structure–activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity, J. Med. Chem. 34 (1991) 786–797.

[41] F. Escolano, E.R. Hancock, M.A. Lozano, Heat diffusion: thermodynamic depth complexity of networks, Phys. Rev. E 85 (2012) 036206.

[42] G. Li, M. Semerci, B. Yener, M.J. Zaki, Effective graph classification based on topological and label attributes, Stat. Anal. Data Mining 5 (2012) 265–283.

[43] C.-C. Chang, C.-J. Lin, LIBSVM: A Library for Support Vector Machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2011.

[44] L. Bai, E.R. Hancock, P. Ren, A Jensen–Shannon kernel for hypergraphs, in: Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition (SSPR&SPR), 2012, pp. 79–88.

[45] P. Ren, T. Aleksić, D. Emms, R.C. Wilson, E.R. Hancock, Quantum walks, Ihara zeta functions and cospectrality in regular graphs, Quantum Inf. Process. 10 (2011) 405–417.

Lu Bai received both the B.Sc. and M.Sc. degrees from the Faculty of Information Technology, Macau University of Science and Technology, Macau SAR, P.R. China. He is currently pursuing the Ph.D. degree at the University of York, York, UK. He is a member of the British Machine Vision Association and Society for Pattern Recognition (BMVA). His current research interests include structural pattern recognition, machine learning, soft computing and approximate reasoning, especially kernel methods and complexity analysis on (hyper)graphs and networks.

Luca Rossi received his B.Sc., M.Sc., and Ph.D. in computer science from Ca' Foscari University of Venice, Italy. He is now a Postdoctoral Research Fellow at the School of Computer Science of the University of Birmingham. His current research interests are in the areas of graph-based pattern recognition, machine learning, network science and computer vision.

Andrea Torsello received his Ph.D. in computer science at the University of York, UK, and is currently an assistant professor at Ca' Foscari University of Venice, Italy. His research interests are in the areas of computer vision and pattern recognition, in particular the interplay between stochastic and structural approaches and game-theoretic models. Recently, he co-edited a special issue of Pattern Recognition on similarity-based pattern recognition. He has published around 40 technical papers in refereed journals and conference proceedings and has served on the program committees of various international conferences and workshops. He was a co-chair of the 2009 edition of GbR, a well-established IAPR workshop on graph-based methods in pattern recognition, held in Venice. Since November 2007 he has held a visiting research position at the Information Technology Faculty of Monash University, Australia.

Edwin R. Hancock received the B.Sc., Ph.D., and D.Sc. degrees from the University of Durham, Durham, UK. He is now a professor of computer vision in the Department of Computer Science, University of York, York, UK. He has published nearly 150 journal articles and 550 conference papers. He was the recipient of a Royal Society Wolfson Research Merit Award in 2009. He has been a member of the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence, Pattern Recognition, Computer Vision and Image Understanding, and Image and Vision Computing. His awards include the Pattern Recognition Society Medal in 1991, outstanding paper awards from the Pattern Recognition journal in 1997, and best paper awards from the Computer Analysis of Images and Patterns Conference in 2001, the Asian Conference on Computer Vision in 2002, the International Conference on Pattern Recognition (ICPR) in 2006, the British Machine Vision Conference (BMVC) in 2007, and the International Conference on Image Analysis and Processing in 2009. He is a Fellow of the International Association for Pattern Recognition, the Institute of Physics, the Institution of Engineering and Technology, and the British Computer Society. He was appointed as the founding Editor-in-Chief of the IET Computer Vision journal in 2006. He was a General Chair for BMVC in 1994 and the Statistical, Syntactical and Structural Pattern Recognition workshop in 2010, Track Chair for ICPR in 2004, and Area Chair at the European Conference on Computer Vision in 2006 and Computer Vision and Pattern Recognition in 2008. He established the Energy Minimization Methods in Computer Vision and Pattern Recognition workshop series in 1997.

