Relaxed Schrodinger bridges and robust network routing · 2018. 1. 25. · Relaxed Schrodinger...

Relaxed Schrodinger bridges and robust network routing *

Yongxin Chen1, Tryphon Georgiou2, Michele Pavon3 and Allen Tannenbaum4

Abstract

We consider network routing under random link failures with a desired final distribution. We provide a mathe-

matical formulation of a relaxed transport problem where the final distribution only needs to be close to the desired

one. The problem is a maximum entropy problem for path distributions with an extra terminal cost. We show that

the unique solution may be obtained solving a generalized Schrodinger system. An iterative algorithm to compute

the solution is provided. It contracts the Hilbert metric with contraction ratio less than 1/2 leading to extremely fast

convergence.

I. INTRODUCTION

Containing the 2017-18 Southern California wild fires has been a major challenge for CAL FIRE involving

dispatching hundreds of fire engines and thousands of fire fighters including some provided by ten other states.

Efficiently dispatching the fire engines over a long period of time (the Thomas fire, for instance, burned for more

than one month) is a difficult task. The problem can be roughly described as follows: At the initial time t = 0 we

have a certain distribution of fire engines in certain locations (nodes). Within at most N time units, so as to provide

the crew shift, the engines must reach through the available road network the various fire locations (other nodes).

The distribution must guarantee the minimum force necessary to fight each specific fire. Considering the difficulties

and hazards involved in reaching their destination, it seems reasonable to require that the final distribution of the

fire engines be close (rather than equal) to a desired one. Another spec of the routing plan is robustness with respect

to link failures. This could be accomplished by dispatching engines on different routes even when they have the

same target.

*This project was supported by AFOSR grants (FA9550-15-1-0045 and FA9550-17-1-0435), ARO grant (W911NF-17-1-049), grants from

the National Center for Research Resources (P41-RR-013218) and the National Institute of Biomedical Imaging and Bioengineering (P41-EB-

015902), NCI grant (1U24CA18092401A1), NIA grant (R01 AG053991), a grant from the Breast Cancer Research Foundation and by the

University of Padova Research Project CPDA 140897.1Yongxin Chen is with the Department of Electrical and Computer Engineering, Iowa State University, IA 50011, Ames, Iowa, USA

[email protected] Tryphon Georgiou is with the Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA 92697, USA

[email protected] Michele Pavon is with the Dipartimento di Matematica “Tullio Levi-Civita”, Universita di Padova, 35121 Padova, Italy

[email protected] Allen Tannenbaum is with the Departments of Computer Science and Applied Mathematics & Statistics, Stony Brook University, NY 11794,

USA [email protected]

arX

iv:1

801.

0785

2v1

[m

ath.

OC

] 2

4 Ja

n 20

18

In this paper, building on our previous work [20], [21], which deal with the case of a fixed terminal distribution,

we provide a precise mathematical formulation of the above relaxed problem. It is a maximum entropy problem

for probability distributions on the feasible paths with a terminal cost. We study a relaxed version of the usual

Schrodinger bridge problem without a hard constraint on the terminal marginal but with an extra terminal cost. The

solution is obtained by solving iteratively a generalized Schrodinger system. Convergence of the algorithm in the

natural projective metric is established. In [35], which is a sort of relaxation of [16], the problem of optimally steering

a linear stochastic system with a Wasserstein distance terminal cost was studied. In [22] (see also [39]), a regularized

transport problem with very general boundary costs is considered and solved through iterative Schrodinger-Fortet-

Demin-Stephan-Sinkhorn-like algorithms [56], [57], [32], [28], [59]. Although our dynamic problem can be reduced

to a static one of the form considered in [22] (see Section II), employing a general prior measure on the trajectories

has some advantages. Indeed, the static formulation solution does not yield immediate by-product information on

the new transition probabilities and on what paths the optimal mass flow occurs and is therefore less suited for

many network routing applications. Moreover, we want to allow for general prior measures not necessarily of the

Boltzmann’s type considered in the previous work. Finally, we prove convergence of the iterative algorithm in the

Hilbert rather than Thompson metric as it usually provides the best contraction ratio [15, Theorem 3.4], [40].

We model the network through a directed graph and seek to design the routing policy so that the distribution of

the commodity at some prescribed time horizon is close to a desired one. The optimal feedback control suitably

modifies a prior transition mechanism. We also attempt to implicitly obtain other desirable properties of the optimal

policy by suitably choosing a prior measure in a maximum entropy problem for distributions on paths. Robustness

with respect to network failures, namely spreading of the mass as much as the topology of the graph and the final

distribution allow, is accomplished by employing as prior transition the adjacency matrix of the graph. Our intuitive

notion of robustness of the routing policy should not be confused with other notions of robustness concerning

networks which have been put forward and studied, see e.g. [1], [6], [50], [4], [27], [55]. In particular, in [4], [27],

robustness has been defined through a fluctuation-dissipation relation involving the entropy rate. This latter notion

captures relaxation of a process back to equilibrium after a perturbation and has been used to study both financial

and biological networks [53], [54]. This paper is addressed to transportation and data networks problems and does

not concern equilibrium or near equilibrium cases.

The outline of the paper is as follows. In the next section, we define the relaxed transport problem. In Section III,

we state and prove the main result reducing the problem to solving a generalized Schrodinger system. In Section IV,

we review some fundamental concepts and results concerning Hilbert’s projective metric. In Section V, we establish

existence and uniqueness for the generalized Schrodinger system through a contraction mapping principle. Finally,

in Section VI, we outline an iterative algorithm to compute the solution and some extensions of the results.

II. RELAXED SCHRODINGER BRIDGES

Consider a directed, strongly connected aperiodic graph G = (X , E) with vertex set X = {1, 2, . . . , n} and edge

set E ⊆ X × X . We let time vary in T = {0, 1, . . . , N}, and let FPN0 ⊆ XN+1 denote the family of length N ,

feasible paths x = (x0, . . . , xN ), namely paths such that (xi, xi+1) ∈ E for i = 0, 1, . . . , N − 1.

We seek a probability distribution P on FPN0 with prescribed initial probability distribution ν0(·) and terminal

distribution close to νN (·), such that the resulting random evolution is closest to a “prior” measure M on FPN0 in

a suitable sense. The prior law M is induced by the Markovian evolution

µt+1(xt+1) =∑xt∈X

µt(xt)mxtxt+1(t) (1)

with nonnegative distributions µt(·) over X , t ∈ T , and weights mij(t) ≥ 0 for all indices i, j ∈ X and all times.

Moreover, to respect the topology of the graph, mij(t) = 0 for all t whenever (i, j) 6∈ E . Often, but not always,

the matrix

M(t) = [mij(t)]ni,j=1 (2)

does not depend on t. The rows of the transition matrix M(t) do not necessarily sum up to one, so that the “total

transported mass” is not necessarily preserved. This occurs, for instance, when M(t) simply encodes the topological

structure of the network with mij(t) being zero or one, depending on whether a certain link exists at each time

t. It is also possible to take into account the length of the paths leading to solutions which compromise between

speading the mass and transporting on shorter paths, see [20], [21]. The evolution (1) together with the measure

µ0(·), which we assume positive on X , i.e.,

µ0(x) > 0 for all x ∈ X , (3)

induces a measure M on FPN0 as follows. It assigns to a path x = (x0, x1, . . . , xN ) ∈ FPN0 the value

M(x0, x1, . . . , xN ) = µ0(x0)mx0x1(0) · · ·mxN−1xN (N − 1), (4)

and gives rise to a flow of one-time marginals

µt(xt) =∑x` 6=t

M(x0, x1, . . . , xN ), t ∈ T .

We seek a distribution which is closest to the prior M in relative entropy where, for P and Q measures on XN+1,

the relative entropy (divergence, Kullback-Leibler index) D(P‖Q) is

D(P‖Q):=

∑x∈XN+1 P (x) log P (x)

Q(x) , Supp(P ) ⊆ Supp(Q),

+∞, Supp(P ) 6⊆ Supp(Q),

Here, by definition, 0 · log 0 = 0. Naturally, while the value of D(P‖Q) may turn out negative due to miss-match

of scaling (in case Q = M is not a probability measure), the relative entropy is always jointly convex. Moreover,

D(P‖Q)−∑

x∈XN+1

P (x) +∑

x∈XN+1

Q(x) ≥ 0.

Since for probability distributions we have ∑x∈XN+1

P (x) = 1,

minimizing the nonnegative quantity D(P‖Q)−∑x P (x) +

∑xQ(x) over a family of probability distributions P ,

even when the prior Q has a different total mass, is equivalent to minimizing over the same set D(P‖Q). We are

now ready to formalize the problem. Let ν0 and νN be two probability distributions on X and let P(ν0) be the

family of all Markovian probability distributions on XN+1 of the form (4) with initial marginal ν0. Rather than

imposing the desired final marginal νN as in the standard Schrodinger bridge problem, we consider the following

“relaxed problem”:

Problem 1:

minimize J(P ) := D(P‖M) + D(pN‖νN ) (5a)

over {P ∈ P(ν0)}. (5b)

Clearly, we can restrict the minimization to distributions in PS(ν0), namely distributions in P(ν0) such that

Supp(pN ) ⊆ Supp(νN ). (6)

The connection between the dynamic Problem 1 and a static problem such as those considered in [22], can be

obtained as follows. Let P and Q be two probability distributions on XN+1. For x = (x0, x1, . . . , xN ) ∈ XN+1,

consider the multiplicative decomposition

P (x) = Px0,xN (x)p0N (x0, xN ),

where

Px0,xN (x) = P (x|x0 = x0, xn = xN )

and we have assumed that p0N is everywhere positive on X × X , and a similar one for Q. We get

D(P‖Q) =∑x0xN

p0N (x0, xN ) logp0N (x0, xN )

q0N (x0, xN )

+∑

x∈XN+1

Px0,xN (x) logPx0,xN (x)

Qx0,xN (x)p0N (x0, xN ).

This is the sum of two nonnegative quantities. The second becomes zero if and only if Px0,xN (x) = Qx0,xN (x)

for all x ∈ XN+1. Thus, P ∗x0,xN (x) = Qx0,xN (x). Thus, Problem 1 can be reduced to

minimize J(P ) := D(p0N‖m0N ) + D(pN‖νN ) (7)

over {p0N :∑xN

p0N (·, xN ) = ν0(·)}. (8)

This argument extends to the situation where the prior measure mass is not one. We prefer to discuss the original

formulation (5) for the reasons described in the Introduction.

III. MAIN RESULT

We have the following characterization of the solution.

Theorem 1: Assume that the matrix

G := M(N − 1)M(N − 2) · · ·M(1)M(0) = (gij) (9)

has all positive elements gij . Suppose there exist two functions ϕ and ϕ mapping {0, 1, . . . , N} × X into the

nonnegative reals and satisfying the generalized Schrodinger system

ϕ(t, i) =∑j

mij(t)ϕ(t+ 1, j), 0 ≤ t ≤ N − 1, (10a)

ϕ(t+ 1, j) =∑i

mij(t)ϕ(t, i), 0 ≤ t ≤ N − 1, (10b)

ϕ(0, i)ϕ(0, i) = ν0(i), (10c)

ϕ(N, j)2ϕ(N, j) = νN (j). (10d)

For 0 ≤ t ≤ N − 1 and (i, j) ∈ X × X , we define

π∗ij(t) := mij(t)ϕ(t+ 1, j)

ϕ(t, i). (11)

which constitute a family of bona fide transition probabilities. Then, the solution P∗ to Problem 1 is unique and

given by the Markovian distribution

P∗(x0, . . . , xN ) = ν0(x0)π∗x0x1(0) · · ·π∗xN−1xN (N − 1). (12)

Proof 2: Let ϕ(·, ·) be space-time harmonic for the prior transition mechanism, namely satisfy on 0 ≤ t ≤ N − 1

recursion (10a). Observe that since G has all positive elements, ϕ(N, i) and ϕ(0, i) are positive for all i ∈ X . In

particular, it then follows from (10d) that ϕ(N, i) = 0 if and only if νN (i) = 0. Minimizing J(P ) over PS(ν0) is

then equivalent to minimizing over P(ν0) the new index

J ′(P ) := J(P ) +∑x0

logϕ(0, x0)ν0(x0)

−∑xN

logϕ(N, xN )pN (xN ) +∑xN

logϕ(N, xN )pN (xN ), (13)

where pN denotes the marginal of P at time N and we have used the convention 0 · log 0 = 0. Let πij(t) be

the transition probabilities of the measure P ∈ PS(ν0). Then, using the multiplicative decomposition (4) for both

measures we get the representation

D(P‖M) = D(ν0‖µ0)

+

N−1∑k=0

∑xk

D(πxkxk+1(k)‖mxkxk+1

(k))pk(xk). (14)

Since D(ν0‖µ0) is constant over PS(ν0) and by the same calculation as in [51, pp. 7-8], we now get that Problem

1 is equivalent to minimizing over PS(ν0)

J ′′(P ) :=

N−1∑k=0

∑xk

D(πxkxk+1

(k)‖mxkxk+1(k)

×ϕ(k + 1, xk+1)

ϕ(k, xk)

)pk(xk)

+∑xN

log

[pN (xN )ϕ(N, xN )

νN (xN )

]pN (xN ). (15)

We next prove that

π∗ij(t) := mij(t)ϕ(t+ 1, j)

ϕ(t, i)(16)

constitute a family of transition probabilities. Indeed, π∗ij(t) ≥ 0 and, by (10a),∑j

π∗ij(t) =∑j

mij(t)ϕ(t+ 1, j)

ϕ(t, i)=ϕ(t, i)

ϕ(t, i)= 1.

Thus, the first term in the right-hand side of (15) is nonnegative and we can make it equal to zero by choosing as

new transition probabilities precisely (16). Consider now the probabilities p∗(t, ·) defined by the recursion

p∗(t+ 1, j) =∑i

π∗ij(t)p∗(t, i), p∗(0, i) = ν0(i). (17)

Observe that the second term in the right-hand side of (15) becomes zero if

ϕ(N, xN ) =νN (xN )

p∗N (xN )(18)

which is admissible as a boundary condition for (10a) since it is nonnegative and we are only considering

distributions in P(ν0) which satisfy (6). With this choice of ϕ(N, xN ), π∗ij(t) minimize J ′′(P ) over PS(ν0).

Define

ϕ(t, i) :=p∗(t, i)

ϕ(t, i). (19)

Using (17), (16) and (19), we get

ϕ(t+ 1, j) =p∗(t+ 1, j)

ϕ(t+ 1, j)=

∑imij(t)

ϕ(t+1,j)ϕ(t,i) p∗(t, i)

ϕ(t+ 1, j)

=∑i

mij(t)p∗(t, i)

ϕ(t, i)=∑i

π∗ij(t)ϕ(t, i), (20)

namely ϕ(t, i) is space-time co-harmonic satisfying (10b). While (19) alone implies (10c), (18) and (19) imply that

(10d) is verified.

In view of (19), at each time t = 0, 1, . . . , N the marginal p∗t of the solution factorizes as

p∗(t, i) = ϕ(t, i)ϕ(t, i). (21)

The final condition (10d) for the Schrodinger system is different from the standard one, see e.g. [21]. We get from

(10d)

ϕ(N, xN ) =

√νN (xN )

ϕ(N, xN ). (22)

Let ϕ(t) and ϕ(t) denote the column vectors with entries ϕ(t, i) and ϕ(t, i), respectively, with i ∈ X . In matrix

form, (10a), (10b) and (16) read

ϕ(t) = M(t)ϕ(t+ 1), ϕ(t+ 1) = M(t)T ϕ(t), (23a)

and

Π(t) = [πij(t)] = diag(ϕ(t))−1M(t) diag(ϕ(t+ 1)). (23b)

Notice that the condition (18) involves p∗N which is defined through (17). The latter, in turn, depends on the

transition probabilities (16) which require the knowledge of ϕ(t) for t = 0, 1, . . . , N − 1. Thus, it is not clear if

the whole procedure is well-posed. This will be established in Section V by proving that indeed system (10) has a

(unique) solution.

IV. BACKGROUND: HILBERT’S PROJECTIVE METRIC

This metric dates back to 1895 [36]. A crucial contractivity result that permits to establish existence of solutions of

equations on cones (such as the Perron-Frobenius theorem) was proven by Garrett Birkhoff in 1957 [9]. Important

extensions of Birkhoff’s result to nonlinear maps were provided by Bushell [14], [15]. Various other applications

of the Birkhoff-Bushell result have been developed such as to positive integral operators and to positive definite

matrices [15], [42]. More recently, this geometry has proven useful in various problems concerning communication

and computations over networks (see [61] and the work of Sepulchre and collaborators [58], [11], [5] on consensus

in non-commutative spaces and metrics for spectral densities) and in statistical quantum theory [52]. A recent survey

on the applications in analysis is [42]. The use of the Hilbert metric is crucial in the nonlinear Frobenius-Perron

theory [41]. A considerable further extension of the Perron-Frobenius theory beyond linear positive systems and

monotone systems has been recently proposed in [31].

Taking advantage of the Birkhoff-Bushell results on contractivity of linear and nonliner maps on cones, we showed

in [34] that the Schrodinger bridge for Markov chains and quantum channels can be efficiently obtained from the

fixed-point of a map which is contractive in the Hilbert metric. This result extended [33] which deals with scaling

of nonnegative matrices. In [18], it was shown that a similar approach can be taken in the context of diffusion

processes leading to i) a new proof of a classical result on SBP and ii) providing an efficient computational scheme

for both, SBP and OMT. This new computational approach can be effectively employed, for instance, in image

interpolation.

Following [15], we recall some basic concepts and results of this theory.

Let S be a real Banach space and let K be a closed solid cone in S, i.e., K is closed with nonempty interior

intKand is such that K +K ⊆ K, K ∩−K = {0} as well as λK ⊆ K for all λ ≥ 0. Define the partial order

x � y ⇔ y − x ∈ K, x < y ⇔ y − x ∈ intK

and for x, y ∈ K0 := K\{0}, define

M(x, y) := inf {λ | x � λy}

m(x, y) := sup{λ | λy � x}.

Then, the Hilbert metric is defined on K0 by

dH(x, y) := log

(M(x, y)

m(x, y)

).

Strictly speaking, it is a projective metric since it is invariant under scaling by positive constants, i.e., dH(x, y) =

dH(λx, µy) = dH(x, y) for any λ > 0, µ > 0 and x, y ∈ intK. Thus, it is actually a distance between rays. If U

denotes the unit sphere in S, (intK ∩ U, dH) is a metric space.

Example 1: Let K = Rn+ = {x ∈ Rn : xi ≥ 0} be the positive orthant of Rn. Then, for x, y ∈ intRn+, namely with

all positive components,

M(x, y) = maxi{xi/yi}, m(x, y) = min

i{xi/yi},

and

dH(x, y) = log max{xiyj/yixj}.

Another very important example for applications in many diverse areas of statistics, information theory, control,etc.

is the cone of Hermitian, positive semidefinite matrices.

Example 2: Let S = {X = X† ∈ Cn×n}, where † denotes here transposition plus conjugation and, more generally,

adjoint. Let K = {X ∈ S : X ≥ 0} be the positive semidefinite matrices. Then, for X,Y ∈ intK, namely positive

definite, we have

dH(X,Y ) = logλmax

(XY −1

)λmin (XY −1)

= logλmax

(Y −1/2XY −1/2

)λmin

(Y −1/2XY −1/2

) .It is closely connected to the Riemannian (Fisher-information) metric

dR(X,Y ) = ‖ log(Y −1/2XY −1/2

)‖F

=

√√√√ n∑i=1

[log λi(Y −1/2XY −1/2

)]2.

A map E : K → K is called non-negative. It is called positive if E : intK → intK. If E is positive and E(λx) =

λpE(x) for all x ∈ intK and positive λ, E is called positively homogeneous of degree p in intK. For a positive

map E , the projective diameter is befined by

∆(E) := sup{dH(E(x), E(y)) | x, y ∈ intK}

and the contraction ratio by

k(E) := inf{λ :| dH(E(x), E(y)) ≤ λdH(x, y),∀x, y ∈ intK}.

Finally, a map E : S → S is called monotone increasing if x ≤ y implies E(x) ≤ E(y).

Theorem 3 ([15]): Let E be a monotone increasing positive mapping which is positive homogeneous of degree p

in intK. Then the contraction k(E) does not exceed p. In particular, if E is a positive linear mapping, k(E) ≤ 1.

Theorem 4 ([9], [15]): Let E be a positive linear map. Then

k(E) = tanh(1

4∆(E)). (24)

Theorem 5 ([15]): Let E be either

a. a monotone increasing positive mapping which is positive homogeneous of degree p(0 < p < 1) in intK,

or

b. a positive linear mapping with finite projective diameter.

Suppose the metric space Y = (intK ∩ U, dH) is complete. Then, in case (a) there exists a unique x ∈ intK such

that E(x) = x, in case (b) there exists a unique positive eigenvector of E in Y .

This result provides a far-reaching generalization of the celebrated Perron-Frobenius theorem [10]. Notice that in

both Examples 1 and 2, the space Y = (intK ∩ U, dH) is indeed complete [15].

There are other metrics which are contracted by positive monotone maps. For instance, the closely related Thompson

metric [60]

dT (x, y) = log max{M(x, y),m−1(x, y)}.

The Thompson metric is a bona fide metric on K. It has been, for instance, employed in [46], [22], [5].

V. SOLUTION TO THE GENERALIZED SCHRODINGER SYSTEM

Let G = M(N − 1)M(N − 2) · · ·M(1)M(0) and assume that all its elements gij are positive. Let us introduce

the following maps on Rn+:

E : x 7→ y =∑j

gijxj ,

E† : x 7→ y =∑i

gijxi,

D0 : x 7→ y =ν0

x

DN : x 7→ y =νNx

where division of vectors is componentwise1.

Lemma 1: Consider the maps E and E†. We have the following bounds on their contraction ratios:

k(E) = k(E†) = tanh(1

4∆(E)) < 1. (25)

1Our use of the adjoint for the map E is consistent with the standard notation in diffusion processes where the Fokker-Planck (forward)

equation involves the adjoint of the generator appearing in the backward Kolmogorov equation.

Proof 6: Observe that E is a positive linear map and its projective diameter is

∆(E) = sup{dH(E(x), E(y)) | xi > 0, yi > 0}

= sup{log

(gijgk`gi`gkj

)| 1 ≤ i, j, k, ` ≤ n}.

It is finite since all entries gij’s are positive. It now follows from Theorem 4 that its contraction ratio satisfies (25).

Similarly for the adjoint map E†.

Lemma 2:

k(D0) ≤ 1, k(DN ) ≤ 1

Proof 7: See [34, p.033301-10].

Lemma 3: Let R : Rn+ → Rn+ be the map which associates to the vector x with components xi to the vector with

components√xi. Then

k(R) = 1/2. (26)

Proof 8: Let x, y ∈ intRn+. In view of Example1 and using the properties of the square root,

dH(R(x),R(y))=log max{√

(xiyj/yixj)}=(1/2)dH(x, y).

Theorem 9: The composition

C := E† ◦ D0 ◦ E ◦ R ◦ DN (27)

contracts the Hilbert metric with contraction ratio k(C) < (1/2), namely

dH(C(x), C(y)) < (1/2)dH(x, y), ∀x, y ∈ intRn+.

Proof 10: The result follows at once from Lemmas 1, 2, 3.

We have the following result.

Theorem 11: Assume that the matrix G = M(N − 1)M(N − 2) · · ·M(1)M(0) all positive elements gij . Let ν0

and νN be any two probability distributions on X . Then, there exist a unique choice of the vectors ϕ(0), ϕ(N)

with positive entries such that

ϕ(t, i) =∑j

mij(t)ϕ(t+ 1, j), 0 ≤ t ≤ N − 1 (28a)

ϕ(t+ 1, j) =∑i

mij(t)ϕ(t, i), 0 ≤ t ≤ N − 1 (28b)

ϕ(0, x0)ϕ(0, x0) = ν0(x0), (28c)

ϕ(N, xN )2ϕ(N, xN ) = νN (xN ). (28d)

Proof 12: Consider the iteration

(ϕ(N, ·))next = C(ϕ(N, ·)) (29)

Notice that the componentwise divisions of D0 and DN are well defined. Indeed, even when ϕ(0) (ϕ(N)) has zero

entries, ϕ(N) (ϕ(0)) has all positive entries since the elements of G are all positive. Since C is strictly contractive

in the Hilbert metric, the iteration (29) would converge to a ray that is invariant under C. Let φ be any positive

vector on this ray, then

C(φ) = λφ

for some positive number λ. Now let ϕ(N) = λ2φ. Then it is straightforward to verify

C(ϕ(N)) = λC(φ) = λ2φ = ϕ(N). (30)

Moreover, this is the unique vector that satisfies the above condition. Define

ϕ(N) =

√νNϕ(N)

ϕ(0) = E(ϕ(N))

ϕ(0) =ν0

ϕ(0)

and ϕ(t), ϕ(t) according to (28a)-(28b), then clearly these vectors are consistent with the Schrodinger system

(28a)-(28b)-(28c)-(28d).

VI. AN ALGORITHM CONTRACTING HILBERT’S METRIC AND SOME EXTENSIONS

Let 1† = (1, 1, . . . , 1). The iteration in (29) suggests the following algorithm:

a. Set x = x(0) = 1;

b. Set xnext = C(x);

c. Iterate until you reach a fixed point x = C(x) (stopping criterion: |x− Cx| < 10−4);

d. Set ϕ(N) = x;

e. Use

ϕ(N, xN ) =

√νN (xN )

ϕ(N, xN )(31)

to compute ϕ(N) and then (28a) to compute ϕ(t) for t = N − 1, N − 2, . . . , 1, 0;

f. Compute the optimal transition probabilities π∗ij(t) according to (16);

g. The solution to Problem 1 is the time inhomogeneous Markovian distribution (12) with initial marginal

ν0 and transition probabilities π∗ij(t).

The assumption that the elements gij of the matrix G = M(N −1)M(N −2) · · ·M(1)M(0) be all positive can be

relaxed. For instance, if both ν0 and νN are everywhere positive on X , it suffices that M has at least one positive

element in each row and column to guarantee that the componentwise divisions of D0 and DN are well defined.

In that case, Theorems 1 and 9 hold true and the algorithm of this section applies.

Our analysis and algorithm can be generalized to the cost function

D(P‖M) + ηD(pN‖νN ) (32)

for any η ≥ 0. In this case, we only need change (10d) in the Schrodinger system to

ϕ(N, j)η+1η ϕ(N, j) = νN (j)

and (31) to

ϕ(N, xN ) =

(νN (xN )

ϕ(N, xN )

) ηη+1

in the algorithm. The convergence rate is strictly upper bounded by ηη+1 . The parameter η measures the significance

of the penalty term D(pN‖νN ). When η goes to infinity, we recover the traditional Schrodinger bridge. The upper

bound is 1 in this case. On the other hand, when η = 0, the solution is trivial in view of (14). It is the Markov

process with kernel M(t) (assuming that all M(t) are stochastic matrices) and initial distribution ν0. Indeed,ηη+1 = 0 implies ϕ(N, ·) = 1 on X . In view of (28a), we get ϕ(t, i) ≡ 1. This is intuitive and we do not need to

run the algorithm to solve the problem when η = 0.

Example 3: Consider the graph in Figure 1. We seek to transport masses from initial distribution ν1 = δ1 to target

distribution νN = 1/2δ6 + 1/2δ9. The step N is set to be 3 or 4. When N = 3, the evolution of mass distribution

Fig. 1: transport graph

by solving Problem 1 is given by1 0 0 0 0 0 0 0 0

0 0.5865 0.2067 0.2067 0 0 0 0 0

0 0 0 0 0.3798 0 0.2067 0.4135 0

0 0 0 0 0 0.3798 0 0 0.6202

,where the four rows of the matrix show the mass distribution at time step t = 0, 1, 2, 3 respectively. The prior law

M is taken to be the Rulle Bowen random walk [20]. The mass spreads out before reaching nodes 6 and 9. Due

to the soft terminal constraint, the terminal distribution is not equal to νN . When we allow for more steps N = 4,

the mass spreads even more before reassembling at nodes 6, 9, as shown below,1 0 0 0 0 0 0 0 0

0 0.6941 0.2040 0.1020 0 0 0 0 0

0 0 0.1020 0.1020 0.4901 0 0.1020 0.2040 0

0 0 0 0 0 0.3881 0.1020 0.2040 0.3059

0 0 0 0 0 0.2862 0 0 0.7138

.

The terminal distribution is again not equal to νN . However, if we increase the penalty on D(pN‖νN ), then the

difference between pN and νN becomes smaller, as can be seen below, which is the distribution evolution when

η = 10 in (32) 1 0 0 0 0 0 0 0 0

0 0.7679 0.1547 0.0774 0 0 0 0 0

0 0 0.0774 0.0774 0.6132 0 0.0774 0.1547 0

0 0 0 0 0 0.5359 0.0774 0.1547 0.2321

0 0 0 0 0 0.4585 0 0 0.5415

.

VII. FINAL COMMENTS

Since the work of Mikami, Thieullen, Leonard, Cuturi [47], [48], [49], [43], [44], [26], a large number of papers

have appeared where Schrodinger bridge problems are viewed as regularization of the important Optimal Mass

Transport (OMT) problem, see e.g., [8], [17], [18], [19], [45], [2], [22]. This is, of course, interesting and extremely

effective as OMT is computationally challenging [3], [7]. Nevertheless, one should not forget that Schrodinger bridge

problems have at least two other important motivations: The first is Schrodinger’s original “hot gas experiment”

model, namely large deviations of the empirical distribution on paths [30]. The second is a maximum entropy

principle in statistical inference, namely choosing the a posterior distribution so as to make the fewest number of

assumptions about what is beyond the available information. This inference method has been noticeably developed

over the years by Jaynes, Burg, Dempster and Csiszar [37], [38], [12], [13], [29], [23], [24], [25]. It is this last

concept which largely inspired the original approach taken in this paper and in [20], [21] although connections to

OMT were made there. The prior mass distribution on paths may namely simply encode the topological information

of the network or that plus the length of each link and is not necessarily a probability distribution. In this paper, in

particular, we have considered a relaxed version of the problem where the final distribution only need to be close

to a desired one. This has been formalized by adding to the criterion the Kullback-Leibler distance between the

final distribution and the desired one. We have shown that the solution can be otained solving a Schrodinger system

with different terminal condition. An iterative algorithm contracting the Hilbert metric with contraction ratio less

than 1/2 to compute the solution has been provided as well.

REFERENCES

[1] R. Albert, H. Jeong, and A.-L. Barabasi, Error and attack tolerance of complex networks, Nature, 406, pp. 378–382, 2000.

[2] J. Altschuler, J. Weed and P. Rigollet, Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration. ArXiv

e-prints, arXiv:1705.09634, 2017.

[3] S. Angenent, S. Haker and A. Tannenbaum, Minimizing flows for the Monge–Kantorovich problem, SIAM J. Math. Anal., 35 (1), 61-97

(2003).

[4] L. Arnold, V. M. Gundlach and L. Demetrius, Evolutionary formalism for products of positive random matrices, The Annals of Applied

Probability, 4, 3, pp. 859–901, 1994.

[5] G. Baggio, A. Ferrante, R. Sepulchre, Finslerian metrics in the cone of spectral densities, ArXiv e-prints, arXiv:1708.02818.

[6] A.-L. Barabasi, Network Science, 2014.

http://arxiv.org/abs/1705.09634


[7] J. Benamou and Y. Brenier, A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem, Numerische

Mathematik, 84 (3), 375-393, (2000).

[8] J. Benamou, G. Carlier, M. Cuturi, L. Nenna and G. Peyre, Iterative Bregman projections for regularized transportation problems, SIAM

Journal on Scientific Computing, 37 (2), (2015), A1111-A1138.

[9] G Birkhoff, Extensions of Jentzch’s theorem, Trans. Amer. Math. Soc., 85, 219-227, 1957.

[10] G Birkhoff, Uniformly semi-primitive multiplicative processes, Transactions of the American Mathematical Society, 104 (1962), pp. 37-51.

[11] S. Bonnabel, A. Astolfi, and R. Sepulchre, Contraction and observer design on cones, in Proc. Decision and Control and European Control

Conference, (CDC-ECC), IEEE, 2011, pp. 12-15.

[12] J. P. Burg, Maximum entropy spectral analysis, in Proc.37th Meet.Society of Exploration Geophysicists, 1967. Reprinted in Modern

Spectrum Analysis, D. G. Childers, Ed. New York: IEEE Press, 1978. pp. 34-41.

[13] J. Burg, D. Luenberger, and D. Wenger, Estimation of Structured Covariance Matrices, Proceedings of the IEEE, 70, pp. 963–974, 1982.

[14] P. Bushell, On the projective contraction ratio for positive linear mappings, Journal of the London Mathematical Society, 2, pp. 256–258,

1973.

[15] P. Bushell, Hilbert’s metric and positive contraction mappings in a Banach space, Archive for Rational Mechanics and Analysis, 52,

330-338, 1973.

[16] Y. Chen, T.T. Georgiou and M. Pavon, Optimal steering of a linear stochastic system to a final probability distribution, Part I, IEEE Trans.

Aut. Control, 61, Issue 5, 1158-1169, 2016.

[17] Y. Chen, T.T. Georgiou and M. Pavon, “On the relation between optimal transport and Schrodinger bridges: A stochastic control viewpoint”,

J. Optim. Theory and Applic., 169 (2), 671-691, 2016.

[18] Y. Chen, T.T. Georgiou and M. Pavon, Entropic and displacement interpolation: a computational approach using the Hilbert metric, SIAM

J. on Applied Mathematics, 76 (6), 2375-2396, 2016.

[19] Y. Chen, T.T. Georgiou and M. Pavon, “Optimal transport over a linear dynamical system”, IEEE Trans. Aut. Control, 62, n. 5, 2137-2152,

2017.

[20] Y. Chen, T.T. Georgiou, M. Pavon and A. Tannenbaum, Robust transport over networks, IEEE Trans. Aut. Control, 62, n.9, 4675-4682,

2017.

[21] Y. Chen, T.T. Georgiou, M. Pavon and A. Tannenbaum, Efficient-robust routing for single commodity network flows, ArXiv e-prints, arXiv:

1701.07625v2, IEEE Trans. Aut. Control, to appear.

[22] L. Chizat, G. Peyr, B. Schmitzer and F.-X. Vialard, Scaling algorithms for unbalanced optimal transport problems, ArXiv e-prints,

arXiv:1607.05816v2, Mathematics of Computation, in press.

[23] I. Csiszar, I-divergence geometry of probability distributions and mimimization problems, Annals of Probability, 3, pp. 146-158, 1975.

[24] I. Csiszar, Sanov property, generalized I-projections, and a conditional limit theorem, Annals of Probability, 12, pp. 768-793, 1984.

[25] I. Csiszar, “Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems,” The Annals of

Statistics, 19(4): 2032-2066, 1991.

[26] M.Cuturi, “Sinkhorn distances: lightspeed computation of pptimal transport,” Advances in Neural Information Processing Systems, 2292-

2300, 2013.


[27] L. Demetrius and T. Manke, Robustness and network evolution: an entropic principle, Physica A: Statistical Mechanics and its Applications,

346, 3, pp. 682–696, 2005.

[28] W. E. Deming and F. F. Stephan, On a least squares adjustment of a sampled frequency table when the expected marginal totals are known,

Ann. Math. Statist., 11, pp. 427-444, 1940.

[29] A. P. Dempster, Covariance selection, Biometrics, 28,157-175, 1972.

[30] H. Follmer, Random fields and diffusion processes, in: Ecole d’Ete de Probabilites de Saint-Flour XV-XVII, edited by P. L. Hennequin,

Lecture Notes in Mathematics, Springer-Verlag, New York, 1988, vol.1362,102-203.

[31] F. Forni and R. Sepulchre, Differentially Positive Systems, IEEE Trans. Aut. Control, 61, Issue: 2, 346 - 359, 2016.

[32] R. Fortet, Resolution d’un systeme d’equations de M. Schrodinger, J. Math. Pure Appl. IX (1940), 83-105.

[33] J. Franklin and J. Lorenz, On the scaling of multidimensional matrices, Linear Algebra and its applications, 114, 717-735, 1989.

[34] T. T. Georgiou and M. Pavon, Positive contraction mappings for classical and quantum Schrodinger systems, J. Math. Phys., 56, 033301

(2015).

[35] A. Halder, and E.D.B. Wendel, Finite Horizon Linear Quadratic Gaussian Density Regulator with Wasserstein Terminal Cost, Proc. of the

2016 American Control Conference, 7249-7254, 2016.

[36] D. Hilbert, Uber die gerade linie als kurzeste verbindung zweier punkte, Mathematische Annalen, 46, 91-96, 1895.

[37] E. T. Jaynes, Information Theory and Statistical Mechanics, Physical Review Series II, 106 (4): 620630, 1957. doi:10.1103/PhysRev.106.620.

MR87305, and Information Theory and Statistical Mechanics II, Physical Review Series II, 108 (2): 171190, 1957.

doi:10.1103/PhysRev.108.171. MR96414.

[38] E. T. Jaynes. On the rationale of maximum-entropy methods. Proceedings of the IEEE, 70(9):939–952, Sept. 1982.

[39] Johan Karlsson, and Axel Ringh, Generalized Sinkhorn iterations for regularizing inverse problems using optimal mass transport, SIAM

Journal on Imaging Sciences 10(4) : 1935-1962, 2017

[40] E. Kohlberg and J. W. Pratt, The contraction mapping approach to the Perron-Frobenius theory: Why Hilbert?s metric?, Mathematics of

Operations Research, 7, no. 2, 198-210, 1982.

[41] B. Lemmens and R. Nussbaum, Nonlinear Perron-Frobenius Theory, no. 189, Cambridge University Press, 2012.

[42] B. Lemmens and R. Nussbaum,, Birkhoff’s version of Hilbert’s metric and its applications in analysis, ArXiv e-prints, arXiv:1304.7921v1,

(2013), Handbook of Hilbert geometry, Chapter 10, G. Besson, A. Papadopoulos and M. Troyanov Eds., European Mathematical Society

Publishing House, Zurich, 275-306,2014.

[43] C. Leonard, From the Schrodinger problem to the Monge-Kantorovich problem, J. Funct. Anal., 2012, 262, 1879-1920.

[44] C. Leonard, A survey of the Schroedinger problem and some of its connections with optimal transport, Discrete Contin. Dyn. Syst. A,

2014, 34 (4): 1533-1574.

[45] W. Li, P. Yin and S Osher, Computations of optimal transport distance with Fisher information regularization, ArXiv e-prints,

arXiv:1704.04605.

[46] C. Liverani and M. P. Wojtkowski, Generalization of the Hilbert metric to the space of positive definite matrices, Pacific J. Math., 166,

no. 2, 339-355, 1994.



[47] T. Mikami, Monge’s problem with a quadratic cost by the zero-noise limit of h-path processes, Probab. Theory Relat. Fields, 129, (2004),

245-260.

[48] T. Mikami and M. Thieullen, Duality theorem for the stochastic optimal control problem., Stoch. Proc. Appl., 116, 1815?1835 (2006).

[49] T. Mikami and M. Thieullen, Optimal Transportation Problem by Stochastic Optimal Control, SIAM Journal of Control and Optimization,

47, N. 3, 1127-1139 (2008).

[50] N. K. Olver, Robust Network Design. Ph.D. Dissertation, Department of Mathematics and Statistics, McGill University, 2010.

[51] M. Pavon and F. Ticozzi, Discrete-time classical and quantum Markovian evolutions: Maximum entropy problems on path space, J. Math.

Phys., 51, 042104-042125 (2010).

[52] D. Reeb, M. J. Kastoryano, and M. M. Wolf, Hilbert’s projective metric in quantum information theory, J. Math. Phys., 52 (2011),

p. 082201.

[53] R.Sandhu, T. Georgiou, E. Reznik, L. Zhu, I. Kolesov, Y. Senbabaoglu, and A. Tannenbaum1, “Graph curvature for differentiating cancer

networks,” Scientific Reports (Nature), vol. 5, 12323; doi: 10.1038/srep12323 (2015).

[54] R. Sandhu, T. Georgiou, and A. Tannenbaum, “Ricci curvature: An economic indicator for market fragility and systemic risk,” Science

Advances, vol. 2, doi: 10.1126/sciadv.1501495, 2016.

[55] K. Savla, G. Como, and M. A. Dahleh, “Robust network routing under cascading failures,” IEEE Trans. on Network Science and

Engineering, vol. 1, no. 1, 53-66, 2014.

[56] E. Schrodinger, Uber die Umkehrung der Naturgesetze, Sitzungsberichte der Preuss Akad. Wissen. Berlin, Phys. Math. Klasse (1931),

144-153.

[57] E. Schrodinger, Sur la theorie relativiste de l’electron et l’interpretation de la mecanique quantique, Ann. Inst. H. Poincare 2, 269, 1932.

[58] R. Sepulchre, A. Sarlette, and P. Rouchon, Consensus in non-commutative spaces, in Proc. 49th IEEE-CDC, pp. 6596-6601, 2010.

[59] R. Sinkhorn, A relationship between arbitrary positive matrices and doubly stochastic matrices, Ann. Math. Statist., 35 (1964), 876-879.

[60] A. Thompson, On certain contraction mappings in a partially ordered vector space, Proc. of the American Mathematical Society, 14, no.

3, 438-443, 1963.

[61] J. Tsitsiklis, D. Bertsekas, and M. Athans, Distributed asynchronous deterministic and stochastic gradient optimization algorithms, IEEE

Transactions on Automatic Control, 31, pp. 803–812, 1986.

Relaxed Schrodinger bridges and robust network routing · 2018. 1. 25. · Relaxed Schrodinger...

Documents

Transcript of Relaxed Schrodinger bridges and robust network routing · 2018. 1. 25. · Relaxed Schrodinger...