Neural Dynamics on Complex Networks - arXiv

Neural Dynamics on Complex NetworksChengxi Zang

Department of Population Health Sciences, Weill CornellMedicine

[email protected]

Fei WangDepartment of Population Health Sciences, Weill Cornell

[email protected]

ABSTRACTLearning continuous-time dynamics on complex networks is cru-cial for understanding, predicting and controlling complex systemsin science and engineering. However, this task is very challengingdue to the combinatorial complexities in the structures of highdimensional systems, their elusive continuous-time nonlinear dy-namics, and their structural-dynamic dependencies. To addressthese challenges, we propose to combine Ordinary DifferentialEquation Systems (ODEs) and Graph Neural Networks (GNNs) tolearn continuous-time dynamics on complex networks in a data-driven manner. We model differential equation systems by GNNs.Instead of mapping through a discrete number of neural layers inthe forward process, we integrate GNN layers over continuous timenumerically, leading to capturing continuous-time dynamics ongraphs. Our model can be interpreted as a Continuous-time GNNmodel or a Graph Neural ODEs model. Our model can be utilized forcontinuous-time network dynamics prediction, structured sequenceprediction (a regularly-sampled case), and node semi-supervisedclassification tasks (a one-snapshot case) in a unified framework.We validate our model by extensive experiments in the above threescenarios. The promising experimental results demonstrate ourmodel’s capability of jointly capturing the structure and dynamicsof complex systems in a unified framework.

CCS CONCEPTS•Mathematics of computing→Graph algorithms;Ordinarydifferential equations; • Computing methodologies → Neu-ral networks; Network science;

KEYWORDSContinuous-time Graph Neural Networks; Graph Neural OrdinaryDifferential Equations; Continuous-timeGNNs; GraphNeural ODEs;Continuous-time Network Dynamics Prediction; Structured Se-quence Prediction; Graph Semi-supervised Learning; DifferentialDeep Learning on Graphs;ACM Reference Format:Chengxi Zang and Fei Wang. 2020. Neural Dynamics on Complex Networks.In Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discoveryand DataMining (KDD ’20), August 23–27, 2020, Virtual Event, CA, USA.ACM,New York, NY, USA, 11 pages. https://doi.org/10.1145/3394486.3403132

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. Copyrights for components of this work owned by others than theauthor(s) must be honored. Abstracting with credit is permitted. To copy otherwise, orrepublish, to post on servers or to redistribute to lists, requires prior specific permissionand/or a fee. Request permissions from [email protected] ’20, August 23–27, 2020, Virtual Event, CA, USA© 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM.ACM ISBN 978-1-4503-7998-4/20/08. . . $15.00https://doi.org/10.1145/3394486.3403132

1 INTRODUCTIONReal-world complex systems, such as brain [14], ecological systems[13], gene regulation [2], human health [5], and social networks[21, 24, 46–49], etc., are usually modeled as complex networks andtheir evolution are governed by some underlying nonlinear dynam-ics [30]. Revealing these dynamics on complex networks modeledby differential equation systems is crucial for understanding thecomplex systems in nature. Effective analytical tools developed forthis goal can further help us predict and control these complexsystems. Although the theory of (nonlinear) dynamical systemshas been widely studied in different fields including applied math[38],statistical physics [30], engineering [37], ecology [13] and bi-ology [5], these developed models are typically based on a clearknowledge of the network evolution mechanism which are thususually referred to as mechanistic models. Given the complexity ofthe real world, there is still a large number of complex networkswhose underlying dynamics are unknown yet (e.g., they can be toocomplex to be modeled by explicit mathematical functions). At thesame time, massive data are usually generated during the evolutionof these networks. Therefore, modern data-driven approaches arepromising and highly demanding in learning dynamics on complexnetworks.

The development of a successful data-driven approach for mod-eling continuous-time dynamics on complex networks is very chal-lenging: the interaction structures of the nodes in real-world net-works are complex and the number of nodes and edges is large,which is referred to as the high-dimensionality of complex sys-tems; the rules governing the dynamic change of nodes’ states incomplex networks are continuous-time and nonlinear; the struc-tural and dynamic dependencies within the system are difficultto model by explicit mathematical functions. Recently, there hasbeen an emerging trend in the data-driven discovery of ordinarydifferential equations (ODE) or partial differential equations (PDE)to capture the continuous-time dynamics, including sparse regres-sion method [27, 34], residual network [31], feedforward neuralnetwork [33], etc. However, these methods can only handle verysmall ODE or PDE systems which consist of only a few interactionterms. For example, the sparse-regression-based method [27] showsthat its combinatorial complexity grows with the number of agentswhen building candidate interaction library. Effective learning ofcontinuous-time dynamics on large networks which consist of tensof thousands of agents and interactions is still largely unknown.

In this paper, we propose to combine Ordinary Differential Equa-tion Systems (ODEs) and Graph Neural Networks (GNNs) [43] tolearn non-linear, high-dimensional and continuous-time dynamicson graphs 1. We model differential equation systems by GNNs to

1We use graph to avoid the ambiguity with neural network. Otherwise, we use graphand network interchangeably to refer to linked objects.

arX

iv:1

908.

0649

1v2

[cs

.SI]

17

Jun

2020

https://doi.org/10.1145/3394486.3403132

https://doi.org/10.1145/3394486.3403132

capture the instantaneous rate of change of nodes’ states. Insteadof mapping through a discrete number of layers in the forwardprocess of conventional neural network models [20], we integrate[8] GNN layers over continuous time rather than discrete depths,leading to a novel Continuous-time GNN model. In a dynamical sys-tem view, the continuous depth can be interpreted as continuousphysical time, and the outputs of any hidden GNN layer at time tare instantaneous network dynamics at that moment. Thus we canalso interpret our model as a Graph Neural ODEs in analogy to theNeural ODEs [8]. Besides, we further enhance our algorithm bylearning the dynamics in a hidden space learned from the originalspace of nodes’ states. We name our model Neural Dynamics onComplex Networks (NDCN).

Our NDCN model can be used for following three tasks in aunified framework: 1) (Continuous-time network dynamics predic-tion): Can we predict the continuous-time dynamics on complexnetworks at an arbitrary time? 2) (Structured sequence prediction[36]): Can we predict next few steps of structured sequences? 3)(Graph semi-supervised classification [19, 44]): Can we infer thelabels of each node given features for each node and some labels atone snapshot? The experimental results show that our NDCN cansuccessfully finish above three tasks. As for the task 1, our NDCNis first of its kind which learns continuous-time dynamics on largecomplex networks modeled by differential equations systems. Asfor the task 2, our NDCN achieves lower error in predicting futuresteps of the structured sequence with much fewer parameters thanthe temporal graph neural network models [17, 28, 36, 45]. As forthe task 3, our NDCN learns a continuous-time dynamics to spreadfeatures and labels of nodes to predict unknown labels of nodes,showing very competitive performance compared to many graphneural networks [19, 39, 40]. Our framework potentially serves asa unified framework to jointly capture the structure and dynamicsof complex systems in a data-driven manner.

It’s worthwhile to summarize our contributions as follows:• A novel model:We propose a novel model NDCN, whichcombines Ordinary Differential Equation Systems (ODEs)and Graph Neural Networks (GNNs) to learn continuous-time dynamics on graphs. Such amodel can tackle continuous-time network dynamics prediction, structured sequence pre-diction, and node semi-supervised classification on graphsin a unified framework.

• Physical interpretations: Instead of mapping neural net-works through a discrete number of layers, we integratedifferential equation systems modeled by GNN over continu-ous time, which gives physical meaning of continuous-timedynamics on graphs to GNNs. Our NDCN can be interpretedas a Continuous-time GNN model or a Graph Neural ODEs.

• Good performance: Our experimental results show thatour NDCN learns continuous-time dynamics on various com-plex networks accurately, achieves lower error in structuredsequence prediction with much fewer parameters than tem-poral graph neural network models, and outperforms manyGNN models in node semi-supervised classification tasks.

Our codes and datasets are open-sourced at https://github.com/calvin-zcx/ndcn .

2 RELATEDWORKDynamics of complex networks. Real-world complex systemsare usually modeled as complex networks and driven by continuous-time nonlinear dynamics: the dynamics of brain and human micro-bial are examined in [14] and [5] respectively; [13] investigated theresilience dynamics of complex systems. [4] gave a pipeline to con-struct network dynamics. To the best of our knowledge, our NDCNis the first neural network approach which learns continuous-timedynamics on complex networks in a data-driven manner.

Data-driven discovery of differential equations. Recently,some data-driven approaches are proposed to learn the ordinary/-partial differential equations (ODEs or PDEs), including sparseregression [27], residual network [31], feedforward neural network[33], coupled neural networks [32] and so on. [27] tries to learncontinuous-time biological networks dynamics by sparse regressionover a large library of interaction terms. Building the interactionterms are prohibitively for large systems with many interactionsdue to its combinatorial complexity. In all, none of them can learnthe dynamics on complex systems with more than hundreds ofnodes and tens of thousands of interactions.

Neural ODEs and Optimal control. Inspired by residual net-work [16] and ordinary differential equation (ODE) theory [25, 35],seminal work neural ODE model [8] was proposed to re-write resid-ual networks, normalizing flows, and recurrent neural networkin a dynamical system way. See improved Neural ODEs in [10].However, our NDCN model deals with large differential equationssystems. Besides, our model solves different problems, namely learn-ing continuous-time dynamics on graphs. Relationships betweenback-propagation in deep learning and optimal control theory areinvestigated in [6, 15]. We formulate our loss function by leveragingthe concept of running loss and terminal loss in optimal control.We give novel constraints in optimal control which is modeled bygraph neural differential equations systems. Our model solves noveltasks, e.g. learning the dynamics on complex networks and refer toSec.3.1.

Graph neural networks and temporal-GNNs. Graph neuralnetworks (GNNs) [43], e.g., Graph convolution network (GCN) [19],attention-based GNN (AGNN) [39], graph attention networks (GAT)[40], etc., achieved very good performance on node semi-supervisedlearning on graphs. However, existing GNNs usually have integernumber of 1 or 2 layers [22, 43]. Our NDCN gives a dynamicalsystem view to GNNs: the continuous depth can be interpretedas continuous physical time, and the outputs of any hidden layerat time t are instantaneous rate of change of network dynamicsat that moment. By capturing continuous-time network dynamicswith real number of depth/time, our model gives very competitiveand even better results than above GNNs. By combining RNNs orconvolution operators with GNNs, temporal-GNNs [17, 28, 36, 45]try to predict next few steps of the regularly-sampled structuredsequences. However, these models can not be applied to continuous-time dynamics (observed at arbitrary physical times with differenttime intervals). Our NDCN not only predicts the continuous-timenetwork dynamics at an arbitrary time, but also predicts the struc-tured sequences very well with much fewer parameters.

https://github.com/calvin-zcx/ndcn


3 GENERAL FRAMEWORK3.1 Problem DefinitionWe can describe the continuous-time dynamics on a graph by adifferential equation system:

dX (t )dt

= f(X , G,W , t

), (1)

where X (t) ∈ Rn×d represents the state (node feature values) of adynamic system consisting of n linked nodes at time t ∈ [0,∞), andeach node is characterized by d dimensional features.G = (V, E) isthe network structure capturing how nodes are interacted with eachother.W (t) are parameters which control how the system evolvesover time. X (0) = X0 is the initial states of this system at time t = 0.The function f : Rn×d → Rn×d governs the instantaneous rateof change of dynamics on the graph. In addition, nodes can havevarious semantic labels encoded by one-hot encoding Y (X ,Θ) ∈{0, 1}n×k , and Θ represents the parameters of this classificationfunction. The problems we are trying to solve are:

• (Continuous-time network dynamics prediction)Howto predict the continuous-time dynamics dX (t )

dt on agraph at an arbitrary time?Mathematically, given a graphG and nodes’ states of system { ˆX (t1), ˆX (t2), ..., ˆX (tT )|0 ≤t1 < ... < tT } where t1 to tT are arbitrary physical times-tamps, can we learn differential equation systems dX (t )

dt =

f (X ,G,W , t) to predict continuous-time dynamics X (t) atan arbitrary physical time t? The arbitrary physical timesmean that {t1, ..., tT } are irregularly sampled with differentobservational time intervals. When t > tT , we call the pre-diction task extrapolation prediction, while t < tT andt , {t1, ..., tT } for interpolation prediction.

• (Structured sequence prediction). As a special case whenX (t) are sampled regularly with the same time intervals{ ˆX [1], ˆX [2], ..., ˆX [T ]}, the above problem degenerates to astructured sequence learning task with an emphasis on se-quential order instead of arbitrary physical times. The goal isto predict nextm steps’ structured sequenceX [T+1], ...,X [T+m] .

• (Node semi-supervised classification on graphs) Howto predict the unknown nodes’ labels given featuresfor each node and some labels at one snapshot? As aspecial case of above problem with an emphasis on a specificmoment, given a graphG , one-snapshot featuresX and somelabelsMask ⊙Y , can we learn a network dynamics to predictunknown labels (1 −Mask) ⊙ Y by spreading given featuresand labels on the graph?

We try to solve above three tasks on learning dynamics on graphsin a unified framework.

3.2 A Unified Learning FrameworkWe formulate our basic framework as follows:

argminW (t ),Θ(T )

L =∫ T

0R(X , G,W , t

)dt + S

(Y (X (T ), Θ)

)subject to

dX (t )dt

= f(X , G,W , t

), X (0) = X0

(2)

where∫ T0 R

(X ,G,W , t

)dt is the "running" loss of the continuous-

time dynamics on graph from t = 0 to t = T , and S(Y (X (T ),Θ)) is

the "terminal" loss at time T . By integrating dXdt = f (X ,G,W , t)

over time t from initial state X0, a.k.a. solving the initial valueproblem [7] for this differential equation system, we can get thecontinuous-time network dynamicsX (t) = X (0)+

∫ T0 f (X ,G,W ,τ )dτ

at arbitrary time moment t > 0.Such a formulation can be seen as an optimal control prob-

lem so that the goal becomes to learn the best control param-eters W (t) for differential equation system dX

dt = f (X ,G,W , t)and the best classification parameters Θ for semantic functionY (X (t),Θ) by solving above optimization problem. Different fromtraditional optimal control framework, we model the differentialequation systems dX

dt = f (X ,G,W , t) by graph neural networks.By integrating dX

dt = f (X ,G,W , t) over continuous time, namely

X (t) = X (0) +∫ t0 f

(X ,G,W ,τ

)dτ , we get our graph neural ODE

model. In a dynamical system view, our graph neural ODE can bea time-varying coefficient dynamical system whenW (t) changesover time; or a constant coefficient dynamical system whenW isconstant over time for parameter sharing. It’s worthwhile to re-call that the deep learning methods with L hidden neural layersf∗ are X [L] = fL ◦ ... ◦ f2 ◦ f1(X [0]), which are iterated maps [38]with an integer number of discrete layers and thus can not learncontinuous-time dynamics X (t) at arbitrary time. In contrast, ourgraph neural ODE model X (t) = X (0) +

∫ t0 f

(X ,G,W ,τ

)dτ can

have continuous layers with a real number t depth correspondingto the continuous-time dynamics on graph G. Thus, we can alsointerpret our graph neural ODE model as a continuous-time GNN.

Moreover, to further increase the express ability of our model,we encode the network signal X (t) from the original space to Xh (t)in a hidden space, and learn the dynamics in such a hidden space.Then our model becomes:

argminW (t ),Θ(T )

L =∫ T

0R(X , G,W , t

)dt + S

(Y (X (T ), Θ)

)subject to Xh (t ) = fe

(X (t ),We

), X (0) = X0

dXh (t )dt

= f(Xh, G,Wh, t

),

X (t ) = fd(Xh (t ),Wd

)(3)

where the first constraint transforms X (t) into hidden space Xh (t)through encoding function fe . The second constraint is the govern-ing dynamics in the hidden space. The third constraint decodes thehidden signal back to the original space with decoding functionfd . The design of fe , f , and fd are flexible to be any deep neuralstructures. We name our graph neural ODE (or continuous-timeGNN) model as Neural Dynamics on Complex Networks (NDCN).

We solve the initial value problem (i.e., integrating the differen-tial equation systems over time numerically) by numerical meth-ods (e.g., 1st -order Euler method, high-order method Dormand-Prince DOPRI5 [9], etc.). The numerical methods can approximatecontinuous-time dynamics X (t) = X (0) +

∫ t0 f

(X ,G,W ,τ

)dτ at

arbitrary time t accurately with guaranteed error. Thus, an equiv-alent formulation of Eq.(3) by explicitly solving the initial value

problem of differential equation system is:

argminW (t ),Θ(T )

L =∫ T

0R(X , G,W , t

)dt + S

(Y (X (T ), Θ)

)subject to Xh (t ) = fe

(X (t ),We

), X (0) = X0

Xh (t ) = Xh (0) +∫ t

0f(Xh, G,Wh, τ

)dτ

X (t ) = fd(Xh (t ),Wd

)(4)

A large time t corresponds to "deep" neural networks with thephysical meaning of a long trajectory of dynamics on graphs. Inorder to learn the learnable parametersW∗, we back-propagatethe gradients of the loss function w.r.t the control parameters ∂L

∂W∗over the numerical integration process backwards in an end-to-endmanner, and solve the optimization problem by stochastic gradientdescent methods (e.g., Adam [18]). We will show concrete examplesof above general framework in the following three sections.

4 LEARNING CONTINUOUS-TIME NETWORKDYNAMICS

In this section, we investigate if our NDCNmodel can learn continuous-time dynamics on graphs for both interpolation prediction andextrapolation prediction.

4.1 A Model Instance

𝑿𝑿 𝒍𝒍

𝑋𝑋 𝑙𝑙 + 1 = 𝑋𝑋(𝑙𝑙) + 𝑓𝑓ℎ(𝑋𝑋(𝑙𝑙))

Diffusion

Weight

ReLU

× 𝜹𝜹 = 𝟏𝟏

+

𝑿𝑿 𝒕𝒕

𝑋𝑋 𝑡𝑡 + 𝛿𝛿 = 𝑋𝑋(𝑡𝑡) + �𝑡𝑡

𝑡𝑡+𝛿𝛿

𝑓𝑓(𝑋𝑋,𝐺𝐺,𝑊𝑊, 𝜏𝜏)𝑑𝑑𝜏𝜏

𝑿𝑿𝒉𝒉 𝒕𝒕

𝑋𝑋ℎ 𝑡𝑡 + 𝛿𝛿 = 𝑋𝑋ℎ(𝑡𝑡) + �𝑡𝑡

𝑡𝑡+𝛿𝛿

𝑓𝑓(𝑋𝑋ℎ,𝐺𝐺,𝑊𝑊, 𝜏𝜏)𝑑𝑑𝜏𝜏

𝑿𝑿 𝒕𝒕

Diffusion

Weight

ReLU

× 𝜹𝜹 → 𝟎𝟎+

+ �𝒕𝒕

𝒕𝒕+𝜹𝜹

Diffusion

Weight

ReLU

Wei

ght

Tanh

Wei

ght

�𝒕𝒕

𝒕𝒕+𝜹𝜹

𝑓𝑓𝑒𝑒

𝑓𝑓𝑑𝑑

a) Residual-GNN b) ODE-GNN c) NDCN

𝒅𝒅𝑿𝑿 𝒕𝒕𝒅𝒅𝒕𝒕

𝒅𝒅𝑿𝑿𝒉𝒉 𝒕𝒕𝒅𝒅𝒕𝒕

GCN Layer

Wei

ght

𝑿𝑿 𝒍𝒍 + 𝟏𝟏 − 𝑿𝑿(𝒍𝒍)

Figure 1: Illustration of an NDCN instance. a) a Resid-ual Graph Neural Network (Residual-GNN), b) an OrdinaryDifferential Equation GNN model (ODE-GNN), and c) Aninstance of our Neural Dynamics on Complex Network(NDCN)model. The integer l represents the discrete lth layerand the real number t represents continuous physical time.

We solve the objective function in (3) with an emphasis on run-ning loss and a fixed graph. Without the loss of generality, we useℓ1-norm loss as the running loss R. More concretely, we adopt twofully connected neural layers with a nonlinear hidden layer as theencoding function fe , a graph convolution neural network (GCN)like structure [19] but with a simplified diffusion operator Φ tomodel the instantaneous rate of change of dynamics on graphs inthe hidden space, and a linear decoding function fd for regressiontasks in the original signal space. Thus, our model is:

argminW∗,b∗

L =∫ T

0|X (t ) − ˆX (t ) | dt

subject to Xh (t ) = tanh(X (t )We + be

)W0 + b0, X0

dXh (t )dt

= ReLU(ΦXh (t )W + b

),

X (t ) = Xh (t )Wd + bd

(5)

where ˆX (t) ∈ Rn×d is the supervised dynamic information avail-able at time stamp t (in the semi-supervised case the missing in-formation can be padded by 0). The |·| denotes ℓ1-norm loss (meanelement-wise absolute value difference) between X (t) and ˆX (t) attime t ∈ [0,T ]. We illustrate the neural structures of our model inFigure 1c.

We adopt a linear diffusion operator Φ = D− 12 (D − A)D− 1

2 ∈Rn×n which is the normalized graph Laplacian where A ∈ Rn×n

is the adjacency matrix of the network and D ∈ Rn×n is the corre-sponding node degree matrix. TheW ∈ Rde×de and b ∈ Rn×de areshared parameters (namely, the weights and bias of a linear connec-tion layer) over time t ∈ [0,T ]. TheWe ∈ Rd×de andW0 ∈ Rde×de

are the matrices in linear layers for encoding, whileWd ∈ Rde×d

are for decoding. The be ,b0,b,bd are the biases at the correspond-ing layer. We lean the parametersWe ,W0,W ,Wd ,be ,b0,b,bd fromempirical data so that we can learn X in a data-driven manner. Wedesign the graph neural differential equation system as dX (t )

dt =

ReLU(ΦX (t)W + b) to learn any unknown network dynamics. Wecan regard dX (t )

dt as a single neural layer at time moment t . TheX (t)at arbitrary time t is achieved by integrating dX (t )

dt over time, i.e.,

X (t) = X (0) +∫ t0 ReLU

(ΦX (τ )W + b

)dτ , leading to a continuous-

time graph neural network.

4.2 ExperimentsNetwork DynamicsWe investigate following three continuous-time network dynamics from physics and biology. Let

−−−→xi (t) ∈ Rd×1

be d dimensional features of node i at time t and thus X (t) =[. . . ,

−−−→xi (t), . . . ]T ∈ Rn×d . We show their differential equation sys-

tems in vector form for clarity and implement them in matrix form:

• The heat diffusion dynamics d−−−→xi (t )dt = −ki, j

∑nj=1Ai, j (

−→xi −−→x j ) governed by Newton’s law of cooling [26], which statesthat the rate of heat change of node i is proportional tothe difference of the temperature between node i and itsneighbors with heat capacity matrix A.

• The mutualistic interaction dynamics among species in ecol-

ogy, governed by equation d−−−→xi (t )dt = bi +

−→xi (1−−→xiki)(−→xici − 1)+∑n

j=1Ai, j−→xi−→x j

di+ei−→xi+hj−→x j(For brevity, the operations between

vectors are element-wise). The mutualistic differential equa-tion systems [13] capture the abundance ®xi (t) of species i ,consisting of incoming migration term bi , logistic growthwith population capacity ki [47] and Allee effect [1] withcold-start threshold ci , and mutualistic interaction term withinteraction network A.

• The gene regulatory dynamics governed byMichaelis-Menten

equation d−−−→xi (t )dt = −bi−→xi f +

∑nj=1Ai, j

−→x j h−→x j h+1

where the firstterm models degradation when f = 1 or dimerization whenf = 2, and the second term captures genetic activation tunedby the Hill coefficient h [2, 13].

Complex Networks.We consider following networks: (a) Gridnetwork, where each node is connected with 8 neighbors (as shownin Fig. 2(a)) ; (b) Random network, generated by Erdós and Rényimodel [11] (as shown in Fig. 2(b)); (c) Power-law network, generatedby Albert-Barabási model [3] (as shown in Fig. 2(c)); (d) Small-world network, generated by Watts-Strogatz model [41] (as shownin Fig. 2(d)); and (e) Community network, generated by randompartition model [12] (as shown in Fig. 2(e)).

Visualization. To visualize dynamics on complex networks overtime is not trivial. We first generate a network with n nodes byaforementioned network models. The nodes are re-ordered accord-ing to the community detection method by Newman [29] and eachnode has a unique label from 1 to n. We layout these nodes on a2-dimensional

√n ×

√n grid and each grid point (r , c) ∈ N2 rep-

resents the ith node where i = r√n + c + 1. Thus, nodes’ states

X (t) ∈ Rn×d at time t when d = 1 can be visualized as a scalar fieldfunction X : N2 → R over the grid. Please refer to Appendix A forthe animations of these dynamics on different complex networksover time.

Baselines. To the best of our knowledge, there are no baselinesfor learning continuous-time dynamics on complex networks, andthus we compare the ablation models of NDCN for this task. By in-vestigating ablation models we show that our NDCN is a minimummodel for this task. We keep the loss function same and constructfollowing baselines:

• The model without encoding fe and fd and thus no hiddenspace: dX (t )

dt = ReLU(ΦX (t)W + b) , namely ordinary dif-ferential equation GNN model (ODE-GNN), which learnsthe dynamics in the original signal space X (t) as shown inFig. 1b;

• The model without graph diffusion operator Φ: dXh (t )dt =

ReLU(Xh (t)W +b), i.e., an neural ODE model [8], which canbe thought as a continuous-time version of forward residualneural network (See Fig. 1a and Fig. 1b for the differencebetween residual network and ODE network).

• The model without control parameters, namely weight layerW : dXh (t )

dt = ReLU(ΦXh (t)) which has no linear connectionlayer between t and t+dt (wheredt → 0) and thus indicatinga determined dynamics to spread signals (See Fig. 1c withouta weight layer).

Experimental setup.We generate underlying networks with400 nodes by network models in Sec.4.2 and the illustrations areshown in Fig. 2,3 and 4. We set the initial valueX (0) the same for allthe experiments and thus different dynamics are only due to theirdifferent dynamic rules and underlying networks (See Appendix A).

We irregularly sample 120 snapshots of the continuous-timedynamics { ˆX (t1), ..., ˆX (t120)|0 ≤ t1 < ... < t120 ≤ T } where thetime intervals between t1, ..., t120 are different. We randomly choose80 snapshots from ˆX (t1) to ˆX (t100) for training, the left 20 snapshotsfrom ˆX (t1) to ˆX (t100) for testing the interpolation prediction task.

We use the 20 snapshots from ˆX (t101) to ˆX (t120) for testing theextrapolation prediction task.

We use Dormand-Prince method [9] to get the ground truthdynamics, and use Euler method in the forward process of ourNDCN (More configurations in Appendix B).We evaluate the resultsby ℓ1 loss and normalized ℓ1 loss (normalized by the mean element-wise value of ˆX (t)), and they lead to the same conclusion (We reportnormalized ℓ1 loss here and see Appendix C for ℓ1 loss). Results arethe mean and standard deviation of the loss over 20 independentruns for 3 dynamic laws on 5 different networks by each method.Table 1: Extrapolation of continuous-time network dynam-ics. Our NDCN predicts different continuous-time networkdynamics accurately. Each result is the normalized ℓ1 errorwith standard deviation (in percentage %) from 20 runs for 3dynamics on 5 networks by each method.

Grid Random Power Law Small World Community

HeatDiffusion

No-Encode 29.9 ± 7.3 27.8 ± 5.1 24.9 ± 5.2 24.8 ± 3.2 30.2 ± 4.4No-Graph 30.5 ± 1.7 5.8 ± 1.3 6.8 ± 0.5 10.7 ± 0.6 24.3 ± 3.0No-Control 73.4 ± 14.4 28.2 ± 4.0 25.2 ± 4.3 30.8 ± 4.7 37.1 ± 3.7NDCN 4.1 ± 1.2 4.3 ± 1.6 4.9 ± 0.5 2.5 ± 0.4 4.8 ± 1.0

MutualisticInteraction


GeneRegulation


Table 2: Interpolation of continuous-time network dynam-ics. Our NDCN predicts different continuous-time networkdynamics accurately. Each result is the normalized ℓ1 errorwith standard deviation (in percentage %) from 20 runs for 3dynamics on 5 networks by each method.


HeatDiffusion




GeneRegulation


Results.We visualize the ground-truth and learned dynamicsin Fig. 2,3 and 4, and please see the animations of these networkdynamics in Appendix A.We find that one dynamic lawmay behavequite different on different networks: heat dynamics may graduallydie out to be stable but follow different dynamic patterns in Fig. 2.Gene dynamics are asymptotically stable on grid in Fig. 4a butunstable on random networks in Fig. 4b or community networksin Fig. 4e. Both gene regulation dynamics in Fig. 4c and biologicalmutualistic dynamics in Fig. 3c show very bursty patterns on power-law networks. However, visually speaking, our NDCN learns allthese different network dynamics very well.

The quantitative results of extrapolation and interpolation pre-diction are summarized in Table 1 and Table 2 respectively. Weobserve that our NDCN captures different dynamics on variouscomplex networks accurately and outperforms all the continuous-time baselines by a large margin, indicating that our NDCN po-tentially serves as a minimum model in learning continuous-timedynamics on complex networks.

Grid Random Power Law

Small World

Real NDCN Real NDCN Real NDCN Real NDCN Real NDCN

(a) (b) (c) (d) (e) Community

0

Tim

e

Figure 2: Heat diffusion on different networks. Our NDCN fits the dynamics on different networks accurately. Each of the fivevertical panels (a)-(e) represents the dynamics on one network over physical time. For each network dynamics, we illustratethe sampled ground truth dynamics (left) and the dynamics generated by our NDCN (right) from top to bottom following thedirection of time. Refer to Appendix A for the animations.


Small World


Community

0

Tim

e

(a) (b) (c) (d) (e)

Figure 3: Biological mutualistic interaction on different networks. Our NDCN fits the dynamics on different networks accurately.Refer to Appendix A for the animations.

5 LEARNING STRUCTURED SEQUENCESBesides, we can easily use our NDCN model for learning regularly-sampled structured sequence of network dynamics and predictfuture steps by using 1st -order Euler method with time step 1 inthe forward integration process.

Experimental setup. We regularly sample 100 snapshots ofthe continuous-time network dynamics discussed in the last sec-tion with same time intervals from 0 to T , and denote these struc-tured sequence as { ˆX [1], ..., ˆX [100]}. We use first 80 snapshotsˆX [1], ..., ˆX [80] for training and the left 20 snapshots ˆX [81], ..., ˆX [100]

for testing extrapolation prediction task. The temporal-GNN mod-els are usually used for next few step prediction and can not beused for the interpolation task (say, to predict X [1.23]) directly.

We use 5 and 10 for hidden dimension of GCN and RNN modelsrespectively. We use 1st -order Euler method with time step 1 inthe forward integration process. Other settings are the same asprevious continuous-time dynamics experiment.

Baselines.We compare our model with the temporal-GNNmod-els which are usually combinations of RNNmodels and GNNmodels[17, 28, 36]. We use GCN [19] as a graph structure extractor and useLSTM/GRU/RNN [23] to learn the temporal relationships betweenordered structured sequences. We keep the loss function same andconstruct following baselines:


Small World


Community

0Ti

me

(a) (b) (c) (d) (e)

Figure 4: Gene regulation dynamics on different networks. Our NDCN fits the dynamics on different networks accurately. Referto Appendix A for the animations.

• LSTM-GNN: the temporal-GNN with LSTM cell X [t + 1] =LSTM(GCN (X [t],G)):

xt = ReLU (ΦX [t ]We + be )it = σ (Wiixt + bii +Whiht−1 + bhi )ft = σ (Wi f xt + bi f +Whf ht−1 + bhf )дt = tanh(Wiдxt + biд +Whдht−1 + bhд )ot = σ (Wioxt + bio +Whoht−1 + bho )ct = ft ∗ ct−1 + it ∗ дtht = ot ∗ tanh(ct )

ˆX [t + 1] =Wd ∗ ht + bd

(6)

• GRU-GNN: the temporal-GNN with GRU cell X [t + 1] =GRU (GCN (X [t],G)).:

xt = ReLU (ΦX [t ]We + be )rt = σ (Wir xt + bir +Whrht−1 + bhr )zt = σ (Wizxt + biz +Whzht−1 + bhz )nt = tanh(Winxt + bin + r ∗ (Whnht−1 + bhn ))ht = (1 − zt ) ∗ nt + zt ∗ ht−1

ˆX [t + 1] =Wd ∗ ht + bd

(7)

• RNN-GNN: the temporal-GNN with RNN cell X [t + 1] =RNN (GCN (X [t],G)):

xt = ReLU (ΦX [t ]We + be )ht = tanh(wihxt + bih +whhht−1 + bhh )

ˆX [t + 1] =Wd ∗ ht + bd(8)

Results. We summarize the results of the extrapolation pre-diction of regularly-sampled structured sequence in Table 3. TheGRU-GNN model works well in mutualistic dynamics on randomnetwork and community network. Our NDCN predicts differentdynamics on these complex networks accurately and outperformsthe baselines in almost all the settings. What’s more, our model pre-dicts the structured sequences in a much more succinct way withmuch fewer parameters. The learnable parameters of RNN-GNN,GRU-GNN, LSTM-GNN are 24530, 64770, and 84890 respectively. Incontrast, our NDCN has only 901 parameters, accounting for 3.7%,1.4% , 1.1% of above three temporal-GNN models respectively.

Table 3: Extrapolation prediction for the regularly-sampledstructured sequence. Our NDCN predicts different struc-tured sequences accurately. Each result is the normalized ℓ1error with standard deviation (in percentage %) from 20 runsfor 3 dynamics on 5 networks by each method.


HeatDiffusion

LSTM-GNN 12.8 ± 2.1 21.6 ± 7.7 12.4 ± 5.1 11.6 ± 2.2 13.5 ± 4.2GRU-GNN 11.2 ± 2.2 9.1 ± 2.3 8.8 ± 1.3 9.3 ± 1.7 7.9 ± 0.8RNN-GNN 18.8 ± 5.9 25.0 ± 5.6 18.9 ± 6.5 21.8 ± 3.8 16.1 ± 0.0NDCN 4.3 ± 0.7 4.7 ± 1.7 5.4 ± 0.4 2.7 ± 0.4 5.3 ± 0.7



GeneRegulation


6 NODE SEMI-SUPERVISED CLASSIFICATIONWe investigate the third question, i.e., how to predict the semanticlabels of each node given semi-supervised information? Variousgraph neural networks (GNN) [43] achieve very good performancein graph semi-supervised classification task [19, 44]. Existing GNNsusually adopt an integer number of 1 or 2 hidden layers [19, 40]. Ourframework follows the perspective of a dynamical system: by mod-eling the continuous-time dynamics to spread nodes’ features andgiven labels on graphs, we predict the unknown labels in analogyto predicting the label dynamics at some time T .

6.1 A Model InstanceFollowing the same framework as in Section 3, we propose a simplemodel with the terminal semantic loss S(Y (T )) modeled by thecross-entropy loss for classification task:

argminWe ,be ,Wd ,bd

L =∫ T

0R(t ) dt −

n∑i=1

c∑k=1

Yi,k logYi,k (T )

subject to Xh (0) = tanh(X (0)We + be

)dXh (t )dt

= ReLU(ΦXh (t )

)Y (T ) = softmax(Xh (T )Wd + bd )

(9)

(a) (b) (c)

Figure 5: Our NDCN model captures continuous-time dynamics. Mean classification accuracy of 100 runs over terminal timewhen given a specific α . Insets are the accuracy over the two-dimensional space of terminal time and α . The best accuracy isachieved at a real-number time/depth.Table 4: Statistics for three real-world citation networkdatasets. N, E, D, C represent number of nodes, edges, fea-tures, classes respectively.

Dataset N E D C Train/Valid/Test

Cora 2, 708 5, 429 1, 433 7 140/500/1, 000Citeseer 3, 327 4, 732 3, 703 6 120/500/1, 000Pubmed 19, 717 44, 338 500 3 60/500/1, 000

where the Y ∈ Rn×c is the supervised information of nodes’ labelsand Y (T ) ∈ Rn×c is the label dynamics of nodes at time T ∈ Rwhose element Yi,k (T ) denotes the probability of the node i =1, . . . ,n with label k = 1, . . . , c . We use differential equation systemdX (t )dt = ReLU(ΦX (t)) together with encoding and decoding layers

to spread the features and given labels of nodes on graph overcontinuous time [0,T ], i.e., Xh (T ) = Xh (0) +

∫ T0 ReLU

(ΦXh (t)

)dt .

Compared with the model in Eq.5, we only have a one-shot su-pervised information Y given nodes’ features X . Thus, we modelthe running loss

∫ T0 R(t) dt as the ℓ2-norm regularizer of the learn-

able parameters∫ T0 R(t) dt = λ(|We |22 + |be |22 + |Wd |22 + |bd |22) to

avoid over-fitting. We adopt the diffusion operator Φ = D− 12 (α I +

(1 − α)A)D− 12 where A is the adjacency matrix, D is the degree

matrix and D = αI + (1 − α)D keeps Φ normalized. The parameterα ∈ [0, 1] tunes nodes’ adherence to their previous information ortheir neighbors’ collective opinion. We use it as a hyper-parameterhere for simplicity and we canmake it as a learnable parameter later.The differential equation system dX

dt = ΦX follows the dynamics of

averaging the neighborhood opinion as d−−−→xi (t )dt = α

(1−α )di+α−−−→xi (t) +∑n

j Ai, j1−α√

(1−α )di+α√(1−α )dj+α

−−−→x j (t) for node i . When α = 0, Φ av-

erages the neighbors as normalized random walk, when α = 1, Φcaptures exponential dynamics without network effects, and whenα = 0.5, Φ averages both neighbors and itself as in [19].

6.2 ExperimentsDatasets andBaselines. Weuse standard benchmark datasets, i.e.,citation network Cora, Citeseer and Pubmed, and follow the samefixed split scheme for train, validation, and test as in [19, 39, 44]. Wesummarize the datasets in Table 4. We compare our NDCN modelwith graph convolution network (GCN) [19], attention-based graphneural network (AGNN) [39], and graph attention networks (GAT)[40] with sophisticated attention parameters.

Experimental setup. For the consistency of comparison withprior work, we follow the same experimental setup as [19, 39, 40].

We train our model based on the training datasets and get the ac-curacy of classification results from the test datasets with 1, 000labels as summarized in Table 4. Following hyper-parameter set-tings apply to all the datasets. We set 16 evenly spaced time ticks in[0,T ] and solve the initial value problem of integrating the differ-ential equation systems numerically by DOPRI5 [9]. We train ourmodel for a maximum of 100 epochs using Adam [18] with learningrate 0.01 and ℓ2-norm regularization 0.024. We grid search the bestterminal time T ∈ [0.5, 1.5] and the α ∈ [0, 1]. We use 256 hiddendimension. We report the mean and standard deviation of resultsfor 100 runs in Table 5. It’s worthwhile to emphasize that in ourmodel there is no running control parameters (i.e. linear connectionlayers in GNNs), no dropout (e.g., dropout rate 0.5 in GCN and 0.6in GAT), no early stop, and no concept of layer/network depth (e.g.,2 layers in GCN and GAT).

Results.We summarize the results in Table 5.We find our NDCNoutperforms many state-of-the-art GNN models. Results for thebaselines are taken from [19, 39, 40, 42]. We report the mean andstandard deviation of our results for 100 runs. We get our reportedresults in Table 5 when terminal time T = 1.2 , α = 0 for the Coradataset, T = 1.0, α = 0.8 for the Citeseer dataset, and T = 1.1,α = 0.4 for the Pubmed dataset. We find best accuracy is achievedat a real-number time/depth.Table 5: Test mean accuracy with standard deviation in per-centage (%) over 100 runs. Our NDCN model gives very com-petitive results compared with many GNN models.

Model Cora Citeseer Pubmed

GCN 81.5 70.3 79.0AGNN 83.1 ± 0.1 71.7 ± 0.1 79.9 ± 0.1GAT 83.0 ± 0.7 72.5 ± 0.7 79.0 ± 0.3NDCN 83.3 ± 0.6 73.1 ± 0.6 79.8 ± 0.4

By capturing the continuous-time network dynamics to diffusefeatures and given labels on graphs, our NDCN gives better classi-fication accuracy at terminal time T ∈ R+. Figure 5 plots the meanaccuracy with error bars over terminal time T in the abovemen-tioned α settings (we further plot the accuracy over terminal timeT and α in the insets and Appendix D). We find for all the threedatasets their accuracy curves follow rise and fall patterns aroundthe best terminal time stamps which are real number. Indeed, whenthe terminal time T is too small or too large, the accuracy degener-ates because the features of nodes are in under-diffusion or over-diffusion states. The prior GNNs can only have an discrete numberof layers which can not capture the continuous-time network dy-namics accurately. In contrast, our NDCN captures continuous-timedynamics on graphs in a more fine-grained manner.

7 CONCLUSIONWe propose to combine differential equation systems and graphneural networks to learn continuous-time dynamics on complexnetworks. Our NDCN gives the meaning of physical time and thecontinuous-time network dynamics to the depth and hidden out-puts of GNNs respectively, predicts continuous-time dynamics oncomplex network and regularly-sampled structured sequence ac-curately, and outperforms many GNN models in the node semi-supervised classification task (a one-snapshot case). Our modelpotentially serves as a unified framework to capture the structureand dynamics of complex systems in a data-driven manner. Forfuture work, we try to apply our model to other applications in-cluding molecular dynamics and urban traffics. Codes and datasetsare open-sourced at https://github.com/calvin-zcx/ndcn.

ACKNOWLEDGEMENTThis work is supported by NSF 1716432, 1750326, ONR N00014-18-1-2585,Amazon Web Service (AWS) Machine Learning for Research Award andGoogle Faculty Research Award.

REFERENCES[1] Warder Clyde Allee, Orlando Park, Alfred Edwards Emerson, Thomas Park,

Karl Patterson Schmidt, et al. 1949. Principles of animal ecology. Technical Report.Saunders Company Philadelphia, Pennsylvania, USA.

[2] Uri Alon. 2006. An introduction to systems biology: design principles of biologicalcircuits. Chapman and Hall/CRC.

[3] Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in randomnetworks. science 286, 5439 (1999), 509–512.

[4] Baruch Barzel, Yang-Yu Liu, and Albert-László Barabási. 2015. Constructingminimal models for complex system dynamics. Nature communications (2015).

[5] Amir Bashan, Travis E Gibson, Jonathan Friedman, Vincent J Carey, Scott TWeiss, Elizabeth L Hohmann, and Yang-Yu Liu. 2016. Universality of humanmicrobial dynamics. Nature 534, 7606 (2016), 259.

[6] Martin Benning, Elena Celledoni, Matthias J Ehrhardt, Brynjulf Owren, andCarola-Bibiane Schönlieb. 2019. Deep learning as optimal control problems:models and numerical methods. arXiv preprint arXiv:1904.05657 (2019).

[7] William E Boyce, Richard C DiPrima, and Douglas B Meade. 1992. Elementarydifferential equations and boundary value problems. Vol. 9. Wiley New York.

[8] Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. 2018.Neural ordinary differential equations. In Advances in Neural Information Pro-cessing Systems. 6571–6583.

[9] John R Dormand. 1996. Numerical methods for differential equations: a computa-tional approach. Vol. 3. CRC Press.

[10] Emilien Dupont, Arnaud Doucet, and Yee Whye Teh. 2019. Augmented neuralodes. arXiv preprint arXiv:1904.01681 (2019).

[11] P Erdos and A Renyi. 1959. On random graphs I. Publ. Math. Debrecen 6 (1959).[12] Santo Fortunato. 2010. Community detection in graphs. Physics reports 486, 3-5

(2010), 75–174.[13] Jianxi Gao, Baruch Barzel, and Albert-László Barabási. 2016. Universal resilience

patterns in complex networks. Nature 530, 7590 (2016), 307.[14] Wulfram Gerstner, Werner M Kistler, Richard Naud, and Liam Paninski. 2014.

Neuronal dynamics: From single neurons to networks and models of cognition.Cambridge University Press.

[15] Jiequn Han, Qianxiao Li, et al. 2018. A mean-field optimal control formulation ofdeep learning. arXiv preprint arXiv:1807.01083 (2018).

[16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residuallearning for image recognition. In Proceedings of the IEEE conference on computervision and pattern recognition. 770–778.

[17] Seyed Mehran Kazemi, Rishab Goel, Kshitij Jain, Ivan Kobyzev, Akshay Sethi,Peter Forsyth, and Pascal Poupart. 2019. Relational Representation Learning forDynamic (Knowledge) Graphs: A Survey. arXiv preprint arXiv:1905.11485 (2019).

[18] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Opti-mization. In ICLR 2015.

[19] Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification withGraph Convolutional Networks. In ICLR 2017.

[20] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. nature521, 7553 (2015), 436.

[21] Haoyang Li, Peng Cui, Chengxi Zang, Tianyang Zhang, Wenwu Zhu, and YishiLin. 2019. Fates of Microscopic Social Ecosystems: Keep Alive or Dead?. InProceedings of the 25th ACM SIGKDD International Conference on KnowledgeDiscovery & Data Mining. 668–676.

[22] Qimai Li, Zhichao Han, and Xiao-Ming Wu. 2018. Deeper insights into graphconvolutional networks for semi-supervised learning. In Thirty-Second AAAIConference on Artificial Intelligence.

[23] Zachary C Lipton, John Berkowitz, and Charles Elkan. 2015. A critical reviewof recurrent neural networks for sequence learning. preprint arXiv:1506.00019(2015).

[24] Yunfei Lu, Linyun Yu, Tianyang Zhang, Chengxi Zang, Peng Cui, ChaomingSong, and Wenwu Zhu. 2018. Collective Human Behavior in Cascading System:Discovery, Modeling and Applications. In 2018 IEEE International Conference onData Mining (ICDM). IEEE, 297–306.

[25] Yiping Lu, Aoxiao Zhong, Quanzheng Li, and Bin Dong. 2017. Beyond finitelayer neural networks: Bridging deep architectures and numerical differentialequations. arXiv preprint arXiv:1710.10121 (2017).

[26] A v Luikov. 2012. Analytical heat diffusion theory. Elsevier.[27] Niall M Mangan, Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. 2016.

Inferring biological networks by sparse identification of nonlinear dynamics.IEEE Transactions on Molecular, Biological and Multi-Scale Communications 2, 1(2016), 52–63.

[28] Apurva Narayan and Peter HOâĂŹN Roe. 2018. Learning graph dynamics usingdeep neural networks. IFAC-PapersOnLine 51, 2 (2018), 433–438.

[29] Mark Newman. 2010. Networks: an introduction. Oxford U. press.[30] Mark Newman, Albert-Laszlo Barabasi, and Duncan J Watts. 2011. The structure

and dynamics of networks. Vol. 12. Princeton University Press.[31] Tong Qin, Kailiang Wu, and Dongbin Xiu. 2018. Data driven governing equations

approximation using deep neural networks. arXiv preprint arXiv:1811.05537(2018).

[32] Maziar Raissi. 2018. Deep hidden physics models: Deep learning of nonlinearpartial differential equations. The Journal of Machine Learning Research 19, 1(2018), 932–955.

[33] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. 2018. Multistepneural networks for data-driven discovery of nonlinear dynamical systems. arXivpreprint arXiv:1801.01236 (2018).

[34] Samuel H Rudy, Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. 2017.Data-driven discovery of partial differential equations. Science Advances 3, 4(2017), e1602614.

[35] Lars Ruthotto and Eldad Haber. 2018. Deep neural networks motivated by partialdifferential equations. arXiv preprint arXiv:1804.04272 (2018).

[36] Youngjoo Seo, Michaël Defferrard, Pierre Vandergheynst, and Xavier Bresson.2018. Structured sequence modeling with graph convolutional recurrent net-works. In International Conference on Neural Information Processing. 362–373.

[37] Jean-Jacques E Slotine, Weiping Li, et al. 1991. Applied nonlinear control. Vol. 199.Prentice hall Englewood Cliffs, NJ.

[38] Steven H Strogatz. 2018. Nonlinear Dynamics and Chaos with Student SolutionsManual: With Applications to Physics, Biology, Chemistry, and Engineering.

[39] Kiran K Thekumparampil, Chong Wang, Sewoong Oh, and Li-Jia Li. 2018.Attention-based graph neural network for semi-supervised learning. arXivpreprint arXiv:1803.03735 (2018).

[40] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, PietroLio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprintarXiv:1710.10903 (2017).

[41] Duncan J Watts and Steven H Strogatz. 1998. Collective dynamics of âĂŸsmall-worldâĂŹnetworks. nature 393, 6684 (1998), 440.

[42] Felix Wu, Tianyi Zhang, Amauri H. Souza Jr., Christopher Fifty, Tao Yu, andKilian Q. Weinberger. 2019. Simplifying Graph Convolutional Networks. CoRR(2019).

[43] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, andPhilip S Yu. 2019. A comprehensive survey on graph neural networks. arXivpreprint arXiv:1901.00596 (2019).

[44] Zhilin Yang, William W. Cohen, and Ruslan Salakhutdinov. 2016. RevisitingSemi-Supervised Learning with Graph Embeddings. In ICML 2016. 40–48.

[45] Bing Yu, Haoteng Yin, and Zhanxing Zhu. 2017. Spatio-temporal graph con-volutional networks: A deep learning framework for traffic forecasting. arXivpreprint arXiv:1709.04875 (2017).

[46] Chengxi Zang, Peng Cui, and Christos Faloutsos. 2016. Beyond sigmoids: Thenettide model for social network growth, and its applications. In Proceedings ofthe 22nd ACM SIGKDD international conference on knowledge discovery and datamining. 2015–2024.

[47] Chengxi Zang, Peng Cui, Christos Faloutsos, and Wenwu Zhu. 2018. On PowerLaw Growth of Social Networks. IEEE Transactions on Knowledge and DataEngineering 30, 9 (2018), 1727–1740.

[48] Chengxi Zang, Peng Cui, Chaoming Song, Wenwu Zhu, and Fei Wang. 2019.Uncovering Pattern Formation of Information Flow. In Proceedings of the 25thACM SIGKDD International Conference on Knowledge Discovery & Data Mining.1691–1699.

[49] Chengxi Zang, Peng Cui, Wenwu Zhu, and Fei Wang. 2019. Dynamical Originsof Distribution Functions. In Proceedings of the 25th ACM SIGKDD InternationalConference on Knowledge Discovery & Data Mining. 469–478.


APPENDIX:A ANIMATIONS OF THE REAL-WORLD

DYNAMICS ON DIFFERENT NETWORKSPlease view the animations of the three real-world dynamics onfive different networks learned by different models at: https://drive.google.com/open?id=1KBl-6Oh7BRxcQNQrPeHuKPPI6lndDa5Y.

B MODEL CONFIGURATIONSModel configurations of learning network dynamics in both continuous-time and regularly-sampled settings. We train our NDCN model byAdam [18]. We choose 20 as the hidden dimension of Xh ∈ Rn×20.We train our model for a maximum of 2000 epochs using Adam [18]with learning rate 0.01. We summarize our ℓ2 regularization param-eter as in Table 6 and Table 7 for Section 4 learning continuous-timenetwork dynamics. We summarize our ℓ2 regularization parameteras in Table 8 for Section 5 learning regularly-sampled dynamics.

Table 6: ℓ2 regularization parameter configurations incontinuous-time extrapolation prediction


HeatDiffusion

No-Encode 1e-3 1e-6 1e-3 1e-3 1e-5No-Graph 1e-3 1e-6 1e-3 1e-3 1e-5No-Control 1e-3 1e-6 1e-3 1e-3 1e-5NDCN 1e-3 1e-6 1e-3 1e-3 1e-5



GeneRegulation

No-Embed 1e-4 1e-4 1e-4 1e-4 1e-4No-Graph 1e-4 1e-4 1e-4 1e-4 1e-4No-Control 1e-4 1e-4 1e-4 1e-4 1e-4NDCN 1e-4 1e-4 1e-4 1e-4 1e-4

Table 7: ℓ2 regularization parameter configurations incontinuous-time interpolation prediction


HeatDiffusion




GeneRegulation

No-Embed 1e-4 1e-4 1e-4 1e-4 1e-4No-Graph 1e-4 1e-4 1e-4 1e-4 1e-4No-Control 1e-4 1e-4 1e-4 1e-4 1e-4NDCN 1e-4 1e-4 1e-4 1e-4 1e-4

Table 8: ℓ2 regularization parameter configurations inregularly-sampled extrapolation prediction


HeatDiffusion

LSTM-GNN 1e-3 1e-3 1e-3 1e-3 1e-3GRU-GNN 1e-3 1e-3 1e-3 1e-3 1e-3RNN-GNN 1e-3 1e-3 1e-3 1e-3 1e-3NDCN 1e-3 1e-6 1e-3 1e-3 1e-5


LSTM-GNN 1e-3 1e-3 1e-3 1e-3 1e-3GRU-GNN 1e-3 1e-3 1e-3 1e-3 1e-3RNN-GNN 1e-3 1e-3 1e-3 1e-3 1e-3NDCN 1e-2 1e-3 1e-4 1e-4 1e-4

GeneRegulation

LSTM-GNN 1e-3 1e-3 1e-3 1e-3 1e-3GRU-GNN 1e-3 1e-3 1e-3 1e-3 1e-3RNN-Control 1e-3 1e-3 1e-3 1e-3 1e-3NDCN 1e-4 1e-4 1e-4 1e-3 1e-3

C RESULTS IN ABSOLUTE ERROR.We show corresponding ℓ1 loss error in Table 9,Table 10 and Ta-ble 11 with respect to the normalized ℓ1 loss error in Section 4learning continuous-time network dynamics and Section 5 learningregularly-sampled dynamics. The same conclusions can be madeas in Table 1,Table 2 and Table 3.Table 9: Continuous-time Extrapolation Prediction. OurNDCN predicts different continuous-time network accu-rately. Each result is the ℓ1 error with standard deviationfrom 20 runs for 3 dynamics on 5 networks for eachmethod.


HeatDiffusion




GeneRegulation


Table 10: Continuous-time Interpolation Prediction. OurNDCN predicts different continuous-time network accu-rately. Each result is the ℓ1 error with standard deviationfrom 20 runs for 3 dynamics on 5 networks for eachmethod.


HeatDiffusion




GeneRegulation


Table 11: Regularly-sampled Extrapolation Prediction. OurNDCN predicts different structured sequences accurately.Each result is the ℓ1 error with standard deviation from 20runs for 3 dynamics on 5 networks for each method.


HeatDiffusion




GeneRegulation


D ACCURACY OVER TERMINAL TIME AND αBy capturing the continuous-time network dynamics, our NDCNgives better classification accuracy at terminal timeT ∈ R+. Indeed,when the terminal time is too small or too large, the accuracydegenerates because the features of nodes are in under-diffusionor over-diffusion states. We plot the mean accuracy of 100 runs ofour NDCN model over different terminal time T and α as shownin the following heatmap plots. we find for all the three datasetstheir accuracy curves follow rise and fall pattern around the bestterminal time.

https://drive.google.com/open?id=1KBl-6Oh7BRxcQNQrPeHuKPPI6lndDa5Y

https://drive.google.com/open?id=1KBl-6Oh7BRxcQNQrPeHuKPPI6lndDa5Y

0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5Terminal Time

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

Alph

a14.9 15.5 15.1 28.2 53.8 79.5 82.3 83.3 82.2 81.1 80.0

15.7 15.2 15.4 28.8 54.1 79.9 82.6 83.0 82.1 80.7 80.0

14.8 15.5 15.8 29.9 54.5 80.0 82.5 82.9 81.9 80.5 79.7

14.0 15.2 15.5 31.2 55.1 80.3 82.3 82.6 81.7 80.3 79.9

16.0 15.1 16.1 32.1 54.7 80.6 82.6 82.3 81.4 80.3 79.9

14.5 15.4 16.3 34.4 55.0 80.6 82.7 82.2 81.6 80.6 80.0

13.4 13.8 15.7 36.6 54.9 80.5 82.9 82.0 81.7 80.7 80.0

14.9 14.2 16.3 37.3 54.8 80.3 83.0 81.8 81.4 80.6 79.8

14.1 14.0 15.9 39.5 55.4 79.9 82.5 82.0 81.4 80.5 79.9

14.2 14.9 24.0 41.3 68.0 79.9 82.1 82.1 80.9 79.8 79.2

14.1 14.2 30.8 47.2 59.8 58.5 54.4 50.7 50.1 48.1 47.1 15

30

45

60

75

Figure 6: Mean classification accuracy of 100 runs of ourNDCN model over terminal time and α for the Cora datasetin heatmap plot.

Alpha

0.00.2

0.40.6

0.81.0

Termina

l Time

0.50.7

0.91.1

1.31.5

Accu

racy

0.130.210.290.370.440.520.600.680.760.83

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Figure 7: Mean classification accuracy of 100 runs of ourNDCN model over terminal time and α for the Cora datasetin 3D surface plot.

0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5Terminal Time

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

Alph

a

13.9 13.5 17.1 19.1 39.5 67.2 71.9 71.4 69.8 68.4 66.5

16.0 14.4 16.9 18.7 40.9 67.8 72.2 71.5 70.6 68.9 66.5

16.5 13.1 17.3 18.8 42.2 68.6 72.2 71.5 70.7 68.9 66.4

16.4 15.6 17.6 18.7 43.7 69.5 72.3 71.6 70.9 68.9 66.3

16.5 15.4 18.0 19.0 43.6 69.9 72.2 71.6 71.0 69.2 65.7

16.8 16.7 17.7 19.8 43.5 70.4 72.2 71.6 71.1 69.6 65.5

17.1 16.6 18.5 20.5 43.3 71.3 72.2 71.5 71.2 69.4 65.5

15.4 16.2 16.6 19.8 47.6 72.7 72.4 71.4 70.7 68.5 65.9

17.0 16.9 17.2 18.8 58.3 73.1 72.3 71.1 70.0 68.2 66.0

16.6 16.5 16.6 17.4 63.4 72.0 71.5 69.9 68.3 67.0 64.9

16.4 16.5 16.6 33.3 61.4 60.1 57.1 55.0 53.7 51.4 49.4 15

30

45

60

Figure 8: Mean classification accuracy of 100 runs of ourNDCN model over terminal time and α for the Citeseerdataset.

Alpha

0.00.2

0.40.6

0.81.0

Termina

l Time

0.50.7

0.91.1

1.31.5

Accu

racy

0.130.200.260.330.400.460.530.600.660.73

0.2

0.3

0.4

0.5

0.6

0.7

Figure 9: Mean classification accuracy of 100 runs of ourNDCN model over terminal time and α for the Citeseerdataset in 3D surface plot.

0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5Terminal Time

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

Alph

a

53.2 54.1 57.9 73.1 77.6 78.7 78.6 79.2 79.2 78.6 78.0

54.0 55.1 63.1 74.7 78.0 79.1 79.2 79.4 79.5 79.0 78.1

54.5 55.9 68.2 75.9 78.2 79.4 79.4 79.5 79.3 79.1 78.2

55.0 56.7 72.3 76.6 78.5 79.4 79.7 79.6 79.1 79.0 78.2

55.8 60.7 74.4 77.1 78.7 79.6 79.8 79.6 79.2 78.8 78.3

56.7 69.2 75.4 77.3 78.7 79.3 79.8 79.7 79.3 78.9 78.2

60.3 72.7 75.8 77.3 78.7 79.3 79.6 79.6 79.1 78.6 77.9

69.0 74.0 76.0 77.4 78.7 79.3 79.6 79.6 78.6 77.9 77.3

72.7 74.6 76.0 77.4 78.7 78.8 79.2 78.8 78.4 77.5 76.8

73.7 74.6 75.7 77.1 78.0 78.5 78.2 77.9 76.9 76.9 76.3

73.3 73.7 73.8 74.0 73.9 73.3 71.4 69.2 68.3 67.0 65.6 55

60

65

70

75

Figure 10: Mean classification accuracy of 100 runs of ourNDCN model over terminal time and α for the Pubmeddataset.

Alpha

0.00.2

0.40.6

0.81.0

Termina

l Time

0.50.7

0.91.1

1.31.5

Accu

racy

0.530.560.590.620.650.680.710.740.770.80

0.55

0.60

0.65

0.70

0.75

Figure 11: Mean classification accuracy of 100 runs of ourNDCN model over terminal time and α for the Pubmeddataset in 3D surface plot.

Neural Dynamics on Complex Networks - arXiv

Documents

Transcript of Neural Dynamics on Complex Networks - arXiv