Adaptive trading strategies across liquidity pools

arXiv:2008.07807v1 [q-fin.TR] 18 Aug 2020

Adaptive trading strategies across liquidity pools∗

Bastien Baldacci† Iuliia Manziuk‡

August 19, 2020

Abstract

In this article, we provide a flexible framework for optimal trading in an asset listed on different venues. We take into account the dependencies between the imbalance and spread of the venues, and allow for partial execution of limit orders at different limits as well as market orders. We present a Bayesian update of the model parameters to take into account possibly changing market conditions and propose extensions to include short/long trading signals, market impact or hidden liquidity. To solve the stochastic control problem of the trader we apply the finite difference method and also develop a deep reinforcement learning algorithm that can handle more complex settings.

Keywords: cross-platform trading, optimal trading, Bayesian learning, adaptive trading strategies, deep reinforcement learning, stochastic control.

1 Introduction

A vast majority of quantitative trading strategies are based on cross-platform arbitrage. These strategies involve cross-listed stocks, that is, assets traded on two or more liquidity venues. In [21], the authors investigate the prices of cross-listed stocks in different venues and provide evidence of price deviations for the majority of the 600 cross-listed stocks they studied. In [3], the authors highlight mispricing that can exist between a domestic stock and its ADR (American Depositary Receipt) counterpart. The study conducted in [26] for US-UK cross-listed stocks shows that markets for cross-listed securities are among the most heavily arbitraged. In particular, a higher potential for arbitrage can be exploited for cross-listed stocks from emerging markets, see [24].

Usually, the trader builds an execution curve targeting, for example, an Implementation Shortfall or volume-weighted average price (VWAP). Then, he buys or sells shares of the asset following the

∗This work benefits from the financial support of the Chaires Analytics and Models for Regulation, Financial Risk and Finance and Sustainable Development. Bastien Baldacci gratefully acknowledges the financial support of the ERC Grant 679836 Staqamof. The authors would like to thank Joffrey Derchu (Ecole Polytechnique), Mathieu Rosenbaum (Ecole Polytechnique) and Olivier Guéant (Université Paris-1 Panthéon-Sorbonne) for numerous fruitful discussions. In particular, Mathieu Rosenbaum deserves warm thanks for his careful reading of the paper and his many suggestions to improve its quality.

†École Polytechnique, CMAP, 91128, Palaiseau, France, [email protected].

‡École Polytechnique, CMAP, 91128, Palaiseau, France, [email protected].


execution curve by sending limit and market orders to the different venues. But how should the orders be split between the venues? The trader splits his orders depending mainly on the imbalance and spread of the different venues, which of course depend on each other. For example, a higher imbalance on the ask side of one venue can indicate a potential imminent price change and a lower probability to have an ask limit order executed, so it may be more profitable to send the order to another venue. The problem of optimal trading across liquidity pools has been treated, for instance, in [1, 13, 22, 23]. In [1], the authors develop a dynamic estimate of the hidden liquidity present on several venues and use this information to make order splitting decisions (Smart Order Routing). The paper [13] solves a general order placement problem and provides an explicit solution for the optimal split between limit and market orders on different venues. Finally, in [22, 23], the authors build a stochastic algorithm to find the optimal splitting between liquidity pools, including dark venues.

Building a good model for optimal trading of cross-listed assets requires taking into account the cross-dependence between the imbalance and spread of each venue, as well as the probability and the proportion of execution of limit orders. However, the quality of the model will mainly rely on the estimation of the market parameters. If the trader assumes constant parameters over the trading period, he implicitly trusts the quality of his parameter estimates. In this case, his strategy is not robust to changes in price dynamics or the platforms’ behavior. For example, the trading period may occur when another market participant is executing a buy (or sell) metaorder on one or several venues. This participant will consume the vast majority of the liquidity available on the sell (or buy) side of those venues. If the trader does not adjust his market parameters, the algorithm will keep sending limit orders to the platform where the metaorder is being split, without an execution opportunity. That is why it is essential to update model parameter estimations with new information obtained from observing the market dynamics. Here, we treat the updates in a Bayesian manner.

In this paper, we formulate the problem of a trader dealing in a stock, listed on several venues, by placing limit and market orders. The trader’s activity can be formulated as a stochastic control problem. The controls are the splitting of volumes between limit and market orders on each venue, and the limits chosen by the trader. The optima are obtained from a classical Hamilton-Jacobi-Bellman (HJB) quasi-variational inequality, which, for a parsimonious model, can be easily solved by grid methods.

Then we propose a Bayesian update of each market parameter, decoupled from the control problem. One of the advantages of this method is the simplicity of the formulae for each parameter’s posterior estimate. In particular, we do not need to use Markov chain Monte Carlo. We choose this method because updating the market parameters continuously in the control problem increases the number of state variables drastically, leading to high computation time. A continuous Bayesian update would require first computing the conditional expectation of the value function given the market parameters and then integrating it over their posterior distribution. This last integration brings multiple non-linearities into the equation, making this fully Bayesian control problem hard to solve numerically.

The proposed Bayesian procedure is easier to apply in practice: we divide the trading period into time slices lasting from several seconds up to a few minutes, assuming that market conditions do not vary drastically throughout the slice. For each slice, we keep track of all the market events. Specifically, on each venue, we count the number of executed limit orders and the executed proportions (for example, 50% or 100% of the order volume) given the couple spread-imbalance on each venue at the time of the execution. We also keep track of the price dynamics. At the end of each slice, we update our view on the market parameters and recompute the optimal trading strategy for the next slice. This application of the Bayesian updates on slices of execution is time-inconsistent. However, we see it as a first step toward a more integrated Bayesian learning framework for cross-listed trading. By using finite difference schemes or deep reinforcement learning methods (which could also be mixed) for high-dimensional PDE resolution, we can compute in a couple of minutes the optimal trading strategy on a slice given new market conditions.

This paper aims at giving a useful and applicable model for practitioners who work on cross-trading strategies. For a quantitative firm, the control model is flexible enough to reproduce the main stylized facts about the market and to design trading strategies taking into account real signals. Moreover, the procedure for Bayesian updates of market parameters in the control problem makes it possible to reevaluate the optimal strategy when the market conditions differ from the prior empirical estimation of the trader.

The article has the following structure: in Section 2, we describe the framework for cross-platform trading and formulate the trader’s optimization problem. In Section 2.2, we derive the Hamilton-Jacobi-Bellman quasi-variational inequality (HJBQVI) associated with the trader’s optimal trading problem. We introduce a change of variable to reduce the dimensionality of the problem and prove the existence and uniqueness of the viscosity solution of the initial HJBQVI in Appendix A. In Section 3, we first define the conjugate Bayesian update of all market parameters. Then, we describe the update procedure in practice and its link to the control problem of the trader. Section 4 is dedicated to some extensions of the model and their impact on the dimensionality of the resulting HJBQVI. We devote Section 5 to numerical results which, for the sake of clarity of interpretation, are restricted to the case of limit orders only. Finally, in Appendix B.2, we present an application of the Bayesian update of the market parameters to the problem of an OTC market maker.

2 Optimal trading on several liquidity pools

The model presented in this section is a generalization of the classic optimal trading framework, developed notably in [5, 17, 19, 25] and in the reference books [12, 16], to the case of several liquidity venues.

2.1 Framework

We consider a trader acting on $N$ liquidity platforms operating with limit order books over the time interval $[0, T]$. He trades continuously on each venue by sending limit and market orders. For $n \in \{1, \dots, N\}$, the $n$-th venue is characterized by the following continuous-time Markov chains:

• the bid-ask spread process $(\psi^n_t)_{t\in[0,T]}$ taking values in the state space $\boldsymbol{\psi}^n = \{\delta^n, \dots, J\delta^n\}$,

• the imbalance process $(I^n_t)_{t\in[0,T]}$ taking values in the state space $\boldsymbol{I}^n = \{I^n_1, \dots, I^n_K\}$,

where $J, K \in \mathbb{N}$ denote the number of possible spreads and imbalances respectively and $\delta^n$ stands for the tick size of the $n$-th venue. We define the sets $\Psi = \{\Psi_1, \dots, \Psi_{\#\Psi}\}$, $\mathcal{I} = \{\mathcal{I}_1, \dots, \mathcal{I}_{\#\mathcal{I}}\}$ of disjoint intervals, representing different market regimes of interest in terms of spreads and imbalances.


Example 2.1. Assume for all $n \in \{1, \dots, N\}$ that $\delta^n = \delta$. The set $\Psi = \{\delta, \{2\delta, 3\delta\}, \{4\delta, 5\delta\}\}$ denotes three spread regimes: low (one tick), medium (two or three ticks), and high (four or five ticks).

Example 2.2. Assume for all $n \in \{1, \dots, N\}$ and $k \in \{1, \dots, K\}$ that $I^n_k = I_k$. In this case the set $\mathcal{I} = \{[-1, -0.66], (-0.66, -0.33], (-0.33, 0.33], (0.33, 0.66], (0.66, 1]\}$ denotes five regimes of imbalance: low (−33% to 33%), medium on the ask (resp. bid) from 33% to 66% (resp. from −66% to −33%) and high on the ask (resp. bid) from 66% to 100% (resp. from −100% to −66%).
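As an illustration of Example 2.2, the following minimal sketch maps an observed imbalance to the index of its regime. The names `IMBALANCE_UPPER_BOUNDS` and `imbalance_regime` are hypothetical and only serve the example; the bounds are the ones listed above.

```python
from bisect import bisect_left

# Right-closed upper bounds of the five imbalance regimes of Example 2.2.
IMBALANCE_UPPER_BOUNDS = [-0.66, -0.33, 0.33, 0.66, 1.0]


def imbalance_regime(imbalance: float) -> int:
    """Return the index (0 to 4) of the regime containing `imbalance`."""
    if not -1.0 <= imbalance <= 1.0:
        raise ValueError("imbalance must lie in [-1, 1]")
    # bisect_left returns the first index whose bound is >= imbalance,
    # which matches intervals of the form (a, b].
    return bisect_left(IMBALANCE_UPPER_BOUNDS, imbalance)
```

Index 2 corresponds to the low-imbalance regime, indices 0 and 1 to the bid side and indices 3 and 4 to the ask side.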

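Whenever the spread and the imbalance of each venue enter the state $\mathbf{k} = (\mathbf{k}^\psi, \mathbf{k}^I) \in \mathcal{K}$, where $\mathcal{K} = \prod_{n=1}^N \boldsymbol{\psi}^n \times \prod_{n=1}^N \boldsymbol{I}^n$, they remain in this state for a time exponentially distributed with mean $1/\nu_{\mathbf{k}}$. We define a transition matrix $P = (p_{\mathbf{k}\mathbf{k}'})$, $\mathbf{k}, \mathbf{k}' \in \mathcal{K}$, and the corresponding intensity vector $\nu = (\nu_{\mathbf{k}})^{\mathsf{T}}_{\mathbf{k} \in \mathcal{K}}$. We assume that $p_{\mathbf{k}\mathbf{k}} = 0$, meaning that the chain cannot jump from a state to itself. The infinitesimal generator of the processes can be written as

$$
r_{\mathbf{k}\mathbf{k}'} = \nu_{\mathbf{k}}\, p_{\mathbf{k}\mathbf{k}'} \quad \text{if } \mathbf{k} \neq \mathbf{k}', \qquad
r_{\mathbf{k}\mathbf{k}} = -\sum_{\mathbf{k}' \neq \mathbf{k}} r_{\mathbf{k}\mathbf{k}'} = -\nu_{\mathbf{k}} \quad \text{otherwise.}
$$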
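The construction of the generator from the intensities and the jump probabilities can be sketched as follows. This is a minimal illustration, assuming the joint states are enumerated by integer indices; the function name `build_generator` is hypothetical.

```python
import numpy as np


def build_generator(nu, P):
    """Build the generator R with r_kk' = nu_k * p_kk' and r_kk = -nu_k."""
    nu = np.asarray(nu, dtype=float)
    P = np.asarray(P, dtype=float)
    if np.any(np.diag(P) != 0.0):
        raise ValueError("p_kk must be zero for every state k")
    if not np.allclose(P.sum(axis=1), 1.0):
        raise ValueError("each row of P must sum to one")
    R = nu[:, None] * P          # off-diagonal terms nu_k * p_kk'
    np.fill_diagonal(R, -nu)     # diagonal terms -nu_k
    return R
```

Each row of the resulting matrix sums to zero, as required for an infinitesimal generator.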

Remark 2.3. This general formulation allows full coupling between the spread and imbalance of all venues. If one wants a more parsimonious model, the following simplifications can be made. When the spread (resp. imbalance) of the $n$-th venue enters the state $k$, it remains there for an exponentially distributed time with mean $1/\nu^{n,\psi}_k$ (resp. $1/\nu^{n,I}_k$). Therefore, we define a transition matrix $P^{n,\psi} = (p^{n,\psi}_{kk'})$, $n \in \{1, \dots, N\}$, $k, k' \in \boldsymbol{\psi}^n$, such that $p^{n,\psi}_{kk} = 0$, and corresponding intensity vectors $\nu^{n,\psi} = (\nu^{n,\psi}_1, \dots, \nu^{n,\psi}_J)^{\mathsf{T}}$. Similarly we define a transition matrix $P^{n,I}$ for the imbalance. Then, the infinitesimal generator of the processes can be written as

$$
r^{n,\psi}_{kk'} = \nu^{n,\psi}_k\, p^{n,\psi}_{kk'} \quad \text{if } k \neq k', \qquad
r^{n,\psi}_{kk} = -\sum_{k' \neq k} r^{n,\psi}_{kk'} = -\nu^{n,\psi}_k \quad \text{otherwise.}
$$

This framework will be used in Section 5, where we present the numerical results.

In what follows, the trader designs his strategy on the ask side of the market (optimal liquidation problem). The extension to trading on both sides of the market is straightforward and does not cause an increase in the problem’s dimensionality.

The number of, possibly partially, filled ask orders in the venue $n$ is modeled by a Cox process denoted by $N^n$, $n \in \{1, \dots, N\}$, with intensity $\lambda^n(\psi_t, I_t, p^n_t, \ell_t)$, where $p^n_t \in Q^n_\psi$ represents the limit at which the trader sends a limit order of size $\ell^n_t$, and

$$
Q^n_\psi = \{0, 1\} \ \text{if } \psi^n = \delta^n, \quad \text{and } \{-1, 0, 1\} \ \text{otherwise},
$$

$$
\mathcal{A} = \Big\{ (\ell_t)_{t\in[0,T]},\ \mathcal{F}\text{-predictable, s.t. for all } t \in [0, T],\ 0 \le \sum_{n=1}^N \ell^n_t \le q_t \Big\},
$$

where $(q_t)_{t\in[0,T]}$ is defined in Equation (2.1). Practically, for $n \in \{1, \dots, N\}$, when the spread is equal to the tick size, the trader can post at the first best limit ($p^n = 0$) or the second best limit ($p^n = 1$).


When the spread is equal to two ticks or more, the trader can either create a new best limit ($p^n = -1$) or post at the best or the second best limit as previously. The arrival intensity of a buy market order at time $t$ on the venue $n \in \{1, \dots, N\}$ at the limit $p \in Q^n_\psi$, given a couple $(\psi_t, I_t)$ of spread and imbalance on each venue lying in the zone $m$, is equal to $\lambda^{n,m,p} > 0$. When the trader posts limit orders of volume $\ell^n_t$ on the $n$-th venue for $n \in \{1, \dots, N\}$, the probability that they are executed is equal to $f^\lambda(\ell_t)$, where $f^\lambda(\cdot) \in [0, 1]$ is a continuously differentiable function, decreasing with respect to each of its coordinates. Therefore, the arrival intensity of a buy market order filling the ask limit order of the trader on the $n$-th venue at the limit $p^n_t$, given spread and imbalance $(\psi_t, I_t)$, is a multi-regime function defined by

$$
\lambda^n(\psi_t, I_t, p^n_t, \ell_t) = f^\lambda(\ell_t) \sum_{m \in \mathcal{M},\, p \in Q^n_\psi} \lambda^{n,m,p}\, \mathbf{1}_{\{(\psi_t, I_t) \in m,\ p^n_t = p\}},
$$

where $\mathcal{M} = \Psi^N \times \mathcal{I}^N$. Moreover, we allow for partial execution, which we represent by random variables $\epsilon^n_t \in [0, 1]$. The proportion of executed volume for limit orders in each venue depends on the spread and the imbalance in all $N$ venues, as well as the volume and the limit of the order chosen by the trader. We assume a categorical distribution with $R > 0$ different execution proportions $\omega_r$, $r \in \{1, \dots, R\}$, for each venue with $\mathbb{P}(\epsilon^n_t = \omega_r) = \rho^{n,r}(\psi_t, I_t, p^n_t, \ell_t)$, where

$$
\rho^{n,r}(\psi_t, I_t, p^n_t, \ell_t) = f^\rho(\ell_t) \sum_{m \in \mathcal{M},\, p \in Q^n_\psi} \rho^{n,m,p,r}\, \mathbf{1}_{\{(\psi_t, I_t) \in m,\ p^n_t = p\}},
$$

where $f^\rho(\cdot)$ is a continuously differentiable function, decreasing with respect to each of its coordinates.

Remark 2.4. The estimation of this kind of parameters for executed proportions can be quite intricate in practice. To simplify, one can assume that $\rho^{n,r}(\psi_t, I_t, p^n_t, \ell_t) = \rho^{n,r} \in [0, 1]$. In practice, this means that each venue has its own execution proportion probabilities, depending on its toxicity.

Finally, we allow for the execution of market orders (denoted by a point process $(J^n_t)_{t\in[0,T]}$) on each venue, of size $(m^n_t)_{t\in[0,T]} \in [0, \overline{m}]$, where $\overline{m} > 0$ and $J^n_t = J^n_{t^-} + 1$. We assume that market orders are always fully executed.

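The cash process of the trader at time $t \in [0, T]$ is

$$
dX_t = \sum_{n=1}^N \Big( \ell^n_t \Big( S_t + \frac{\psi^n_t}{2} + p^n_t \delta^n \Big) \epsilon^n_t\, dN^n_t + m^n_t \Big( S_t - \frac{\psi^n_t}{2} \Big) dJ^n_t \Big),
$$

where

$$
dS_t = \mu\, dt + \sigma\, dW_t, \qquad (\mu, \sigma) \in \mathbb{R} \times \mathbb{R}_+,
$$

is the dynamics of the mid-price process. The inventory process of the trader at time $t \in [0, T]$ is defined by

$$
q_t = q_0 - \sum_{n=1}^N \Big( \int_0^t \ell^n_u \epsilon^n_u\, dN^n_u + \int_0^t m^n_u\, dJ^n_u \Big). \tag{2.1}
$$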

We also assume that the trader has a pre-computed trading curve $q^\star$ that he wants to follow (Almgren-Chriss trading curve or VWAP strategy, for example). Then the trader’s optimization problem is

$$
\sup_{(p, \ell, m) \in Q_\psi \times \mathcal{A} \times [0, \overline{m}]^N} \mathbb{E}\Big[ X_T + q_T S_T - \int_0^T g(q_t - q^\star_t)\, dt \Big], \tag{2.2}
$$

where the function g penalizes deviation from the pre-computed optimal trading curve.


2.2 The Hamilton-Jacobi-Bellman quasi-variational inequality

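The HJBQVI associated with the optimization problem of the trader (2.2) is the following:

$$
\begin{aligned}
0 = \min\Big\{ & -\partial_t u(t, x, q, S, \psi, I) + g(q - q^\star_t) - \mu\, \partial_S u - \tfrac{1}{2}\sigma^2 \partial_{SS} u \\
& - \sum_{\mathbf{k} \in \mathcal{K}} r_{(\psi, I), (\mathbf{k}^\psi, \mathbf{k}^I)} \big( u(t, x, q, S, \mathbf{k}^\psi, \mathbf{k}^I) - u(t, x, q, S, \psi, I) \big) \\
& - \sup_{p \in Q_\psi,\, \ell \in \mathcal{A}} \sum_{n=1}^N \lambda^n(\psi, I, p^n, \ell)\, \mathbb{E}\Big[ u\big( t, x + \epsilon^n \ell^n (S + \tfrac{\psi^n}{2} + p^n \delta^n), q - \ell^n \epsilon^n, S, \psi, I \big) - u(t, x, q, S, \psi, I) \Big] \ ; \\
& \sum_{n=1}^N \Big( u(t, x, q, S, \psi, I) - \sup_{m^n \in [0, \overline{m}]} u\big( t, x + m^n (S - \tfrac{\psi^n}{2}), q - m^n, S, \psi, I \big) \Big) \Big\},
\end{aligned}
\tag{2.3}
$$

with terminal condition

$$
u(T, x, q, S, \psi, I) = x + qS,
$$

where $\psi = (\psi^1, \dots, \psi^N)$, $I = (I^1, \dots, I^N)$. The expectation in (2.3) is taken over the variables $\epsilon^n$, $n \in \{1, \dots, N\}$. We prove the following theorem in Appendix A: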

Theorem 1. There exists a unique viscosity solution to the HJBQVI (2.3), which coincides with the value function of the control problem of the trader (2.2).

The proof of existence and uniqueness of the viscosity solution mainly relies on adaptations of the theory of second-order viscosity solutions with jumps, see [9], for example.

The value function has to be linear with respect to the cash process and the mark-to-market value of the trader’s inventory due to the form of the terminal condition. Therefore we use the following ansatz:

$$
u(t, x, q, S, \psi, I) = x + qS + v(t, q, \psi, I).
$$

The HJBQVI then becomes a system of ODEs with $2N + 1$ state variables:

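$$
\begin{aligned}
0 = \min\Big\{ & -\partial_t v(t, q, \psi, I) + g(q - q^\star_t) - \mu q \\
& - \sum_{\mathbf{k} \in \mathcal{K}} r_{(\psi, I), (\mathbf{k}^\psi, \mathbf{k}^I)} \big( v(t, q, \mathbf{k}^\psi, \mathbf{k}^I) - v(t, q, \psi, I) \big) \\
& - \sup_{p \in Q_\psi,\, \ell \in \mathcal{A}} \sum_{n=1}^N \lambda^n(\psi, I, p^n, \ell)\, \mathbb{E}\Big[ \epsilon^n \ell^n \big( \tfrac{\psi^n}{2} + p^n \delta^n \big) + v(t, q - \ell^n \epsilon^n, \psi, I) - v(t, q, \psi, I) \Big] \ ; \\
& \sum_{n=1}^N \Big( v(t, q, \psi, I) - \sup_{m^n \in [0, \overline{m}]} \Big( -m^n \tfrac{\psi^n}{2} + v(t, q - m^n, \psi, I) \Big) \Big) \Big\},
\end{aligned}
\tag{2.4}
$$

with terminal condition $v(T, q, \psi, I) = 0$.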
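To make the structure of (2.4) concrete, here is a minimal explicit finite difference sketch for a heavily simplified version of the problem. All simplifications are assumptions of this illustration and not part of the model above: a single venue, one frozen spread-imbalance regime with spread equal to one tick (so $p \in \{0, 1\}$), limit orders only, full execution ($\epsilon = 1$), integer volumes, a target curve $q^\star \equiv 0$, $g(x) = \phi x^2$ and $f^\lambda(\ell) = e^{-c\ell}$. The function name and the default parameter values are hypothetical.

```python
import numpy as np


def solve_single_venue(T=1.0, n_steps=200, q0=5, lam=(2.0, 0.8), psi=0.05,
                       delta=0.05, mu=0.0, phi=0.01, decay=0.2):
    """Backward explicit Euler scheme for a toy one-venue version of (2.4).

    Returns the value grid v[i, q] and the maximizing (p, l) for each node.
    """
    dt = T / n_steps
    v = np.zeros((n_steps + 1, q0 + 1))               # terminal condition v(T, q) = 0
    best = np.zeros((n_steps, q0 + 1, 2), dtype=int)  # optimal (limit p, volume l)
    for i in range(n_steps - 1, -1, -1):
        for q in range(q0 + 1):
            sup, arg = 0.0, (0, 0)                    # l = 0 means no order is posted
            for p in (0, 1):
                for l in range(1, q + 1):             # constraint 0 <= l <= q
                    gain = l * (psi / 2 + p * delta) + v[i + 1, q - l] - v[i + 1, q]
                    val = lam[p] * np.exp(-decay * l) * gain
                    if val > sup:
                        sup, arg = val, (p, l)
            # 0 = -dv/dt + g - mu*q - sup  =>  v(t - dt) = v(t) + dt * (mu*q - g + sup)
            v[i, q] = v[i + 1, q] + dt * (mu * q - phi * q ** 2 + sup)
            best[i, q] = arg
    return v, best
```

In this toy setting the regime-switching sum and the market order obstacle of (2.4) are dropped, so the sketch only illustrates the time stepping and the pointwise maximization over limits and volumes.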


Conditionally on the market parameters such as the transition matrix of both the spread and the imbalance processes, the drift and volatility of the underlying asset and the execution proportion probabilities, Equation (2.4) can be solved using simple finite difference schemes, and the optimal splitting of volumes as well as the optimal limits can be computed in advance.

If one wants to incorporate Bayesian learning of the parameters directly in the control problem, the result is a very high number of state variables, which makes the problem intractable in practice. For example, if we want to update continuously the value of the processes $\lambda^n$ for $n \in \{1, \dots, N\}$ we need to add the counting processes $(N^n_t)_{t\in[0,T]}$ to the state variables, which increases the dimension of the HJBQVI (2.4) by $N$. What we propose in the following section is a practical way to update the market parameters according to the trader’s observations in a Bayesian way. This method, which is performed separately from the optimization procedure, makes it possible to update, at the end of a slice, the trading strategy according to changing market conditions.

3 Adaptive trading strategies with Bayesian update

The framework presented in the above section allows one to choose generic parametric forms for the prior distributions of the market parameters (transition matrix of spreads and imbalances, intensities of orders’ arrival on each venue) suitable for conjugate Bayesian updates.

3.1 Bayesian update of the model parameters

In this section, we present the conjugate Bayesian update of the market parameters and how to choose the prior distributions.

3.1.1 Update of the intensities

Let us recall the form of the intensities for counterpart market orders’ arrival:

$$
\lambda^n(\psi_t, I_t, p^n_t, \ell_t) = f^\lambda(\ell_t) \sum_{m \in \mathcal{M},\, p \in Q^n_\psi} \lambda^{n,m,p}\, \mathbf{1}_{\{(\psi_t, I_t) \in m,\ p^n_t = p\}}.
$$

In the vast majority of optimal liquidation models, the probability of execution $\lambda^{n,m,p}$ is estimated empirically. We propose to put a prior law $\Gamma(\alpha^{n,m,p}, \beta^{n,m,p})$ on the arrival rate, and to update this prior belief at the end of each slice of execution. The parameters $\alpha^{n,m,p}, \beta^{n,m,p}$ are chosen by the trader according to his view of the market before he starts to trade. Up to time $t \in [0, T]$ the trader observes the processes

$$
N^{n,m,p}_t = \int_0^t \mathbf{1}_{\{(\psi_s, I_s) \in m,\ p^n_s = p\}}\, dN^n_s,
$$

which represent the number of executed orders on each venue for every spread-imbalance zone $m$. The posterior distribution of $\lambda^{n,m,p}$ for $n \in \{1, \dots, N\}$ is then given by

$$
\lambda^{n,m,p} \mid N^{n,m,p}_t \sim \Gamma\Big( \alpha^{n,m,p} + N^{n,m,p}_t,\ \beta^{n,m,p} + \int_0^t f^\lambda(\ell_s)\, ds \Big),
$$


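and at time $t$, our best estimate of the filling ratio becomes

$$
\hat{\lambda}^{n,m,p}(t, N^{n,m,p}_t, \ell_t) = \mathbb{E}\big[ \lambda^{n,m,p} \mid N^{n,m,p}_t \big] = \frac{\alpha^{n,m,p} + N^{n,m,p}_t}{\beta^{n,m,p} + \int_0^t f^\lambda(\ell_s)\, ds}.
$$

The posterior estimate of the intensity $\lambda^n(\psi_t, I_t, p^n_t, \ell_t)$ becomes

$$
f^\lambda(\ell_t) \sum_{m \in \mathcal{M},\, p \in Q^n_\psi} \frac{\alpha^{n,m,p} + N^{n,m,p}_t}{\beta^{n,m,p} + \int_0^t f^\lambda(\ell_s)\, ds}\, \mathbf{1}_{\{(\psi_t, I_t) \in m,\ p^n_t = p\}}.
$$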

As the convergence of the prior parameters toward the true market specification follows from the central limit theorem, the convergence rate equals $1/\sqrt{o_m}$, where $o_m$ is the number of observations of filled limit orders in the spread-imbalance zone $m$. If we consider even a quite parsimonious model, for example two venues, two regimes of spread and three regimes of imbalance, we have $\#\mathcal{M} = 36$ different zones. This means that we need a sufficiently large amount of observations (large number of executed orders) to get an accurate approximation of the market behavior.

If the trader anticipates that the number of observations he will have is not sufficient to obtain a suitable approximation of the “true” market parameters (in the case of a mid to low frequency strategy with only a few trades throughout the day), he might choose at the beginning the couples $(\alpha^{n,m,p}, \beta^{n,m,p})$ such that

$$
\frac{\alpha^{n,m,p}}{\beta^{n,m,p}} \gg \frac{N^{n,m,p}_t}{\int_0^t f^\lambda(\ell_s)\, ds}.
$$

That way, his estimate will not be sensitive to a small number of observations, and with a sufficient number of observations the prior will have less influence and the estimation will be less biased.
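The conjugate Gamma update of one intensity $\lambda^{n,m,p}$ can be sketched as follows. The class name `GammaPosterior` is hypothetical, and `exposure` stands for the integral $\int_0^t f^\lambda(\ell_s)\,ds$ accumulated over the observation window, which is assumed to be computed elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaPosterior:
    """Gamma(alpha, beta) belief on an arrival intensity (shape, rate)."""
    alpha: float
    beta: float

    def update(self, n_fills: int, exposure: float) -> "GammaPosterior":
        # Posterior is Gamma(alpha + number of fills, beta + exposure).
        return GammaPosterior(self.alpha + n_fills, self.beta + exposure)

    @property
    def mean(self) -> float:
        # Posterior mean, used as the point estimate of the intensity.
        return self.alpha / self.beta
```

For instance, starting from `GammaPosterior(2.0, 1.0)` and observing no fill over an exposure of 1.5 moves the estimate from 2 to 0.8, whereas a prior `GammaPosterior(200.0, 100.0)` with the same mean stays close to 2, which illustrates the effect of the prior weight discussed above.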

3.1.2 Update of the executed proportion

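We propose to use the Dirichlet prior distribution on the executed proportion parameters so that $\rho^{n,m,p} \sim \mathrm{Dirichlet}(\alpha^{\epsilon,n,m,p})$, where $\alpha^{\epsilon,n,m,p} = (\alpha^{\epsilon,n,m,p,1}, \dots, \alpha^{\epsilon,n,m,p,R})$ for all $(n, m, p, r) \in \{1, \dots, N\} \times \mathcal{M} \times Q_\psi \times \{1, \dots, R\}$. Given observations of $\epsilon^n_t$, the executed proportion parameters have Dirichlet posterior distribution

$$
\rho^{n,m,p} \sim \mathrm{Dirichlet}(\alpha^{\epsilon,n,m,p} + c^{n,m,p}_t),
$$

where $c^{n,m,p}_t = (c^{n,m,p,1}_t, \dots, c^{n,m,p,R}_t)$ and

$$
c^{n,m,p,r}_t = \sum_{s \le t} \mathbf{1}_{\{\epsilon^n_s = \omega_r,\ (\psi_s, I_s) \in m,\ p^n_s = p,\ N^n_s - N^n_{s^-} = 1\}}
$$

is the number of observations before time $t$ in zone $m$ for a limit $p$ in the venue $n$. Therefore, the posterior estimate of the distribution of $\epsilon^n_t$ is

$$
\hat{\rho}^{n,r}(\psi_t, I_t, p^n_t, \ell_t) = f^\rho(\ell_t) \sum_{m \in \mathcal{M},\, p \in Q_\psi} \frac{\alpha^{\epsilon,n,m,p,r} + c^{n,m,p,r}_t}{\sum_{r'=1}^R \big( \alpha^{\epsilon,n,m,p,r'} + c^{n,m,p,r'}_t \big)}\, \mathbf{1}_{\{(\psi_t, I_t) \in m,\ p^n_t = p\}}.
$$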

This Bayesian update is linked to the filling of limit orders of the trader: the proportion executed is updated only if the limit order is (partially) executed. If one chooses the parametrization independent of the spread-imbalance zones and the order volume, that is, the execution proportion depends only on the venue, the speed of convergence is much faster as the same amount of gathered information is used to update a much smaller number of parameters. Using this more parsimonious parametrization the trader can rely on the observations more than on his prior.
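The Dirichlet update of the executed proportion probabilities for one triple $(n, m, p)$ can be sketched as follows. The function name is hypothetical, and the counts are assumed to have been collected from the fills observed in the corresponding zone.

```python
import numpy as np


def dirichlet_posterior_mean(alpha, counts):
    """Posterior mean of the R execution proportion probabilities.

    alpha  : prior Dirichlet parameters, one per proportion omega_r
    counts : number of observed fills with proportion omega_r
    """
    post = np.asarray(alpha, dtype=float) + np.asarray(counts, dtype=float)
    return post / post.sum()
```

With a uniform prior `[1, 1]` on two proportions (say 50% and 100%) and observed counts `[3, 5]`, the estimated probabilities are 0.4 and 0.6.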


3.1.3 Update of the characteristics of the venues

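We observe the states of the Markov chains $\psi_{t_d}, I_{t_d}$, $d \in \{0, \dots, D\}$, and the times $t_d$ of the $D > 0$ transitions. The likelihood function for the spread and the imbalance processes is

$$
\begin{aligned}
L(P, \nu \mid \psi_{t \le t_D}, I_{t \le t_D}) &= \prod_{d=1}^D \nu_{t_{d-1}} \exp\big( -\nu_{t_{d-1}} (t_d - t_{d-1}) \big)\, p_{(\psi_{t_{d-1}}, I_{t_{d-1}}), (\psi_{t_d}, I_{t_d})} \\
&\propto \prod_{\mathbf{k} \in \mathcal{K}} (\nu_{\mathbf{k}})^{n_{\mathbf{k}\cdot}} \exp(-\nu_{\mathbf{k}} T_{\mathbf{k}}) \prod_{\mathbf{k}' \in \mathcal{K}} (p_{\mathbf{k}\mathbf{k}'})^{n_{\mathbf{k}\mathbf{k}'}},
\end{aligned}
$$

where $n_{\mathbf{k}\mathbf{k}'}$ is the number of observed transitions from state $\mathbf{k}$ to $\mathbf{k}'$ for $\mathbf{k}, \mathbf{k}' \in \mathcal{K}$, $T_{\mathbf{k}}$ is the total time spent in state $\mathbf{k}$, and $n_{\mathbf{k}\cdot} = \sum_{\mathbf{k}' \in \mathcal{K}} n_{\mathbf{k}\mathbf{k}'}$ is the total number of transitions out of state $\mathbf{k}$.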

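Given independent prior distributions for $P, \nu$, the posterior distributions will also be independent. We can carry out Bayesian inference separately on the probability matrix and the intensity vectors of the Markov chains. We assume the following priors:

$$
\nu_{\mathbf{k}} \sim \Gamma(a_{\mathbf{k}}, b_{\mathbf{k}}),
$$

$$
p_{\mathbf{k}} = (p_{\mathbf{k}\mathbf{k}'})_{\mathbf{k}' \in \mathcal{K}} \sim \mathrm{Dirichlet}(\alpha_{\mathbf{k}}), \quad \text{where } \alpha_{\mathbf{k}} = (\alpha_{\mathbf{k}\mathbf{k}'})_{\mathbf{k}' \in \mathcal{K}}.
$$

Given these conjugate priors, our best estimators of $\nu_{\mathbf{k}}, p_{\mathbf{k}}$ are

$$
\hat{\nu}_{\mathbf{k}} = \frac{a_{\mathbf{k}} + n_{\mathbf{k}\cdot} - 1}{b_{\mathbf{k}} + T_{\mathbf{k}}}, \qquad
\hat{p}_{\mathbf{k}\mathbf{k}'} = \frac{\alpha_{\mathbf{k}\mathbf{k}'} + n_{\mathbf{k}\mathbf{k}'}}{\sum_{\mathbf{l} \neq \mathbf{k}} (\alpha_{\mathbf{k}\mathbf{l}} + n_{\mathbf{k}\mathbf{l}})}.
$$

Then the posterior estimate of the infinitesimal generator is

$$
\hat{r}_{\mathbf{k}\mathbf{k}'} = \hat{\nu}_{\mathbf{k}}\, \hat{p}_{\mathbf{k}\mathbf{k}'}, \quad \mathbf{k} \neq \mathbf{k}', \qquad
\hat{r}_{\mathbf{k}\mathbf{k}} = -\hat{\nu}_{\mathbf{k}}.
$$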
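The estimators above only require the transition counts and the holding times of the observed path. The following sketch assumes the states are encoded as integers and that, for simplicity, the same prior parameters `a`, `b` and `alpha` are used for every state; the function name and these scalar priors are assumptions of the illustration.

```python
import numpy as np


def estimate_generator(states, times, n_states, a=1.0, b=1.0, alpha=1.0):
    """Posterior estimates (nu_hat, p_hat, R_hat) from an observed path.

    states[d] is the state entered at time times[d], for d = 0, ..., D.
    """
    n = np.zeros((n_states, n_states))   # transition counts n_kk'
    T = np.zeros(n_states)               # total time spent in each state
    for d in range(1, len(states)):
        n[states[d - 1], states[d]] += 1
        T[states[d - 1]] += times[d] - times[d - 1]
    nu_hat = (a + n.sum(axis=1) - 1.0) / (b + T)
    w = alpha + n
    np.fill_diagonal(w, 0.0)             # enforce p_kk = 0
    p_hat = w / w.sum(axis=1, keepdims=True)
    R_hat = nu_hat[:, None] * p_hat
    np.fill_diagonal(R_hat, -nu_hat)
    return nu_hat, p_hat, R_hat
```

The returned matrix has rows summing to zero, so it can be plugged directly into the control problem for the next slice.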

This update aims at finding the “true” behavior of the imbalance and spread processes of each venue. This is of particular importance if an event (for instance, an announcement or news) happens in the market. More specifically, if an event occurs on a particular platform (if a metaorder is executed on one specific platform, for example), this helps to discriminate one venue from the others and to redirect the orders to the less toxic liquidity platforms. Given the large number of observations (transitions from one state of imbalance or spread to another occur fast), the trader does not necessarily need to be confident about his prior distributions.

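Remark 3.1. If one wants to use a more parsimonious model as in Remark 2.3, the same methodology applies. In particular for $k \in \boldsymbol{\psi}^n$, we assume the following priors:

$$
\nu^{n,\psi}_k \sim \Gamma(a^{n,\psi}_k, b^{n,\psi}_k),
$$

$$
p^{n,\psi}_k = (p^{n,\psi}_{kk'})_{k' \in \boldsymbol{\psi}^n} \sim \mathrm{Dirichlet}(\alpha^{n,\psi}_k), \quad \text{where } \alpha^{n,\psi}_k = (\alpha^{n,\psi}_{kk'})_{k' \in \boldsymbol{\psi}^n}.
$$

Given these conjugate priors, our best estimators of $\nu^{n,\psi}_k, p^{n,\psi}_k$ are

$$
\hat{\nu}^{n,\psi}_k = \frac{a^{n,\psi}_k + n^{n,\psi}_{k\cdot} - 1}{b^{n,\psi}_k + T^{n,\psi}_k}, \qquad
\hat{p}^{n,\psi}_{kk'} = \frac{\alpha^{n,\psi}_{kk'} + n^{n,\psi}_{kk'}}{\sum_{l \neq k} (\alpha^{n,\psi}_{kl} + n^{n,\psi}_{kl})}.
$$

The posterior estimate of the infinitesimal generator is given by

$$
\hat{r}^{n,\psi}_{kk'} = \hat{\nu}^{n,\psi}_k\, \hat{p}^{n,\psi}_{kk'}, \quad k \neq k', \qquad
\hat{r}^{n,\psi}_{kk} = -\hat{\nu}^{n,\psi}_k.
$$

Similar formulae apply for $\nu^{n,I}_k, p^{n,I}_k$.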

3.1.4 Update of the mid-price

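We recall that the price process has the following dynamics:

$$
dS_t = \mu\, dt + \sigma\, dW_t,
$$

so that $(S_t - S_0 \mid \mu, \sigma) \sim \mathcal{N}(\mu t, \sigma^2 t)$. We assume that the couple $(\mu, \sigma^2)$ follows a Normal-Inverse-Gamma prior distribution $\mathrm{NIG}(\mu_0, \nu, \alpha^s, \beta^s)$, where $(\mu_0, \nu, \alpha^s, \beta^s) \in \mathbb{R} \times \mathbb{R}^3_+$. Therefore the posterior distribution has the following form:

$$
(\mu, \sigma^2 \mid S_t - S_0) \sim \mathrm{NIG}\Big( \frac{(S_t - S_0) + \mu_0 \nu}{\nu + t},\ \nu + t,\ \alpha^s + \frac{t}{2},\ \beta^s + \frac{t\nu}{\nu + t} \frac{\big( \frac{S_t - S_0}{t} - \mu_0 \big)^2}{2} \Big).
$$

Given our observations of the stock price up to time $t$, the best approximations of the drift and volatility are given by

$$
\hat{\mu}(t, S_t) = \mathbb{E}[\mu \mid S_t - S_0] = \frac{(S_t - S_0) + \mu_0 \nu}{\nu + t}, \qquad
\hat{\sigma}^2(t, S_t) = \mathbb{E}[\sigma^2 \mid S_t - S_0] = \frac{\beta^s + \frac{t\nu}{\nu + t} \frac{\big( \frac{S_t - S_0}{t} - \mu_0 \big)^2}{2}}{\alpha^s + \frac{t}{2} - 1}.
$$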

The volatility $\sigma$ does not appear explicitly in the reduced HJBQVI (2.4). However, it is taken into account when the trader computes his trading curve $q^\star$.

In the case where the trader is confident in his estimation of $\sigma$, one can use a Normal prior distribution on $\mu$ such that $\mu \sim \mathcal{N}(\mu_0, \nu^2)$. Then, the best approximation of the drift is given by

$$
\hat{\mu}(t, S_t) = \mathbb{E}[\mu \mid S_t - S_0] = \frac{\mu_0 \sigma^2 + \nu^2 (S_t - S_0)}{\sigma^2 + \nu^2 t}. \tag{3.1}
$$

If the trader firmly believes in the a priori parameter estimation, he can set $\nu$ close to 0 so that he mostly relies on his prior. On the contrary, if he sets $\nu$ high enough, his estimation comes mostly from market information. Given the large amount of data coming from the market (each time step corresponding to one new observation), convergence to the real value of the drift is fast.
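Equation (3.1) and the role of $\nu$ can be checked with a few lines of code. The function name `posterior_drift` is hypothetical; the formula is the one of Equation (3.1) with known $\sigma$.

```python
def posterior_drift(s_t, s_0, t, mu0, nu, sigma):
    """Posterior mean of the drift, Equation (3.1).

    Prior: mu ~ N(mu0, nu^2); observation: S_t - S_0 ~ N(mu * t, sigma^2 * t).
    """
    return (mu0 * sigma ** 2 + nu ** 2 * (s_t - s_0)) / (sigma ** 2 + nu ** 2 * t)
```

Setting `nu = 0` returns the prior mean `mu0`, while a very large `nu` returns approximately the empirical drift `(s_t - s_0) / t`, in line with the two limiting cases described above.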

Remark 3.2. One can argue about the use of a frequentist estimator of the model parameters, which would actually lead to quite similar formulae. However the original problem, that is, continuous update of market parameters in the control problem, is of Bayesian nature. Moreover, in our approach, the formulae for the posterior distribution of market parameters are as explicit as in the frequentist approach.


3.2 Algorithm description

We now present the use of the Bayesian updates in order to obtain adaptive trading strategies in practice. We emphasize that the procedure is decoupled from the optimization problem (2.4), so that we do not perform Bayesian optimization but rather a Bayesian update of the parameters of an optimization problem.

The number of time steps is an important parameter of the optimization problem because its choice is a trade-off between computation time and computation precision. To address this problem, we use the trading algorithm with fixed market parameters over a short period of time (a couple of seconds up to a few minutes), which we call a slice. Let us consider $V > 0$ slices $\mathcal{T}_v = [T_v, T_{v+1}]$, $v = 0, \dots, V - 1$, such that $T_0 = 0$, $T_V = T$. We define for each slice $v \in \{0, \dots, V - 1\}$ a set of market parameters

$$
\theta^m_v = (r, \rho^n, \lambda^{n,m,p}, \mu, \sigma)_{\{n \in \{1, \dots, N\},\ m \in \mathcal{M},\ p \in Q_\psi\}}.
$$

At each time slice $v \in \{0, \dots, V - 1\}$, starting from $v = 0$, we perform the following algorithm:

1. Take the best estimation of market parameters θmv from the prior distribution for the current slice v.

2. Compute the optimal trading strategy on Tv using the set of parameters θmv .

3. Observe market events during the current slice (executions, changes of the state).

4. At Tv+1, update the parameters θmv+1 following the Bayes rules described in Section 3.

To summarize, we use the output of the control model (the optimal volumes and limits in each venue) over a slice of execution and then run the model again with the updated market parameters. This method, which is clearly time inconsistent, is common practice when one applies optimal control with online parameter estimation, see for example [7]. We now present some possible extensions of the presented model.
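The four steps of the algorithm can be sketched as a loop over slices. This is a minimal illustration restricted to a single intensity parameter with a Gamma prior; the fills are simulated from a hypothetical `true_lambda`, the exposure is assumed equal to the slice length (that is, $f^\lambda \equiv 1$), and `solve` is a placeholder for whatever solver of (2.4) is used.

```python
import numpy as np


def run_slices(n_slices, slice_len, true_lambda, alpha, beta, solve, seed=0):
    """Alternate between solving with frozen parameters and updating them."""
    rng = np.random.default_rng(seed)
    history = []
    for _ in range(n_slices):
        lam_hat = alpha / beta                              # step 1: current best estimate
        strategy = solve(lam_hat)                           # step 2: placeholder solver call
        fills = int(rng.poisson(true_lambda * slice_len))   # step 3: observe the slice
        alpha, beta = alpha + fills, beta + slice_len       # step 4: conjugate update
        history.append((lam_hat, strategy, fills))
    return alpha / beta, history
```

A call such as `run_slices(200, 1.0, 3.0, 1.0, 1.0, solve=lambda lam: lam)` starts from the prior estimate 1 and ends with an estimate close to the simulated intensity 3.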

4 Model extensions

In this section we describe different potential model extensions and their impact on the problem’s dimensionality.

4.1 Extension 1: Incorporation of signals in the price process

4.1.1 Short-term price signals

The two main sources of signals at the microstructural level are the imbalance and the bid-ask spread. Therefore, one can assume a parametric dependence $f^{\mathrm{short}}(\psi_t, I_t)$ of the price process on these two sources, such that the price process becomes

$$
dS_t = \big( \mu + f^{\mathrm{short}}(\psi_t, I_t) \big) dt + \sigma\, dW_t.
$$

In the modified stochastic control problem the term $\mu q$ in the HJBQVI (2.4) is replaced by $\big( \mu + f^{\mathrm{short}}(\psi, I) \big) q$, which causes no increase in the dimensionality of the state process.


4.1.2 Mid/Long term and path-dependent price signals

When trading on a longer time horizon, one can incorporate mid- or long-term signals such as Bollinger bands, moving average or cointegration ratio. For example, consider a signal taking into account the moving average and the maximum of the price process $S_t$, that is

$$
\bar{S}_t = \frac{1}{t} \int_0^t S_u\, du, \qquad S^\star_t = \max_{s \le t} S_s.
$$

The triplet $(S_t, S^\star_t, \bar{S}_t)$ is Markovian. Therefore, we can add a long term signal $f^{\mathrm{long}}(S_t, \bar{S}_t, S^\star_t)$ into the asset’s drift:

$$
dS_t = \big( \mu + f^{\mathrm{long}}(S_t, \bar{S}_t, S^\star_t) \big) dt + \sigma\, dW_t.
$$

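The HJBQVI then becomes:

$$
\begin{aligned}
0 = \min\Big\{ & -\partial_t u(t, q, S, \bar{S}, S^\star, \psi, I) + g(q - q^\star_t) - \big( \mu + f^{\mathrm{long}}(S, \bar{S}, S^\star) \big) \partial_S u - \frac{S - \bar{S}}{t}\, \partial_{\bar{S}} u - \tfrac{1}{2}\sigma^2 \partial_{SS} u \\
& - \sum_{\mathbf{k} \in \mathcal{K}} r_{(\psi, I), (\mathbf{k}^\psi, \mathbf{k}^I)} \big( u(t, q, S, \bar{S}, S^\star, \mathbf{k}^\psi, \mathbf{k}^I) - u(t, q, S, \bar{S}, S^\star, \psi, I) \big) \\
& - \sup_{p \in Q_\psi,\, \ell \in \mathcal{A}} \sum_{n=1}^N \lambda^n(\psi, I, p^n, \ell)\, \mathbb{E}\Big[ \epsilon^n \ell^n \big( S + \tfrac{\psi^n}{2} + p^n \delta^n \big) + u(t, q - \ell^n \epsilon^n, S, \bar{S}, S^\star, \psi, I) - u(t, q, S, \bar{S}, S^\star, \psi, I) \Big] \ ; \\
& \sum_{n=1}^N \Big( u(t, q, S, \bar{S}, S^\star, \psi, I) - \sup_{m^n \in [0, \overline{m}]} \Big( m^n \big( S - \tfrac{\psi^n}{2} \big) + u(t, q - m^n, S, \bar{S}, S^\star, \psi, I) \Big) \Big) \Big\},
\end{aligned}
$$

for $S \le S^\star$, with $\partial_{S^\star} u = 0$ for $S = S^\star$. To obtain this equation we just use the change of variable $v(t, x, q, S, \bar{S}, S^\star, \psi, I) = x + u(t, q, S, \bar{S}, S^\star, \psi, I)$, linear with respect to the cash process $X_t$. We end up with a $(2N + 4)$-dimensional HJBQVI, which we can still solve using our deep reinforcement learning algorithm (but hardly with finite differences).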

More generally, adding a path-dependent state variable that gives information on the price trend adds one dimension to the HJBQVI (in the example above, $S_t$, $S^\star_t$ and $\bar{S}_t$ add one dimension each).

4.2 Extension 2: Market impact

So far we assumed no market impact on the price process. It is common knowledge that the cost of market impact can cut down a large proportion of the trading strategy’s profit. Therefore, we can use a simple permanent-temporary market impact model, inspired by [2].

The impacted mid-price process can be modeled as follows:

$$
dS_t = \big( \mu + h(\ell_t) \big) dt + \sigma\, dW_t + \sum_{n=1}^N \Big( \xi^{n,l}(t, \ell^n_t)\, dN^n_t + \xi^{n,m}(t, \ell^n_t)\, dJ^n_t \Big),
$$

where the functions $h, \xi^{n,l}, \xi^{n,m}$ are the permanent and temporary market impact functions. Following [15], we assume linear permanent market impact, that is

$$
h(\ell_t) = \sum_{n=1}^N \kappa^{n,\mathrm{per}} \ell^n_t, \qquad \kappa^{n,\mathrm{per}} > 0 \ \text{for all } n \in \{1, \dots, N\}.
$$


For the temporary market impact, we can follow the well-known “square-root law” and set

ξn,l(t, ℓnt ) = κn,l(ℓnt )γn,l

, ξn,m(t, ℓnt ) = κn,m(ℓnt )γn,m

,

where $\kappa^{n,l}, \kappa^{n,m}, \gamma^{n,l}, \gamma^{n,m} > 0$ and $\gamma^{n,l}, \gamma^{n,m} \approx 1/2$. On the other hand, in order to take into account the transient part of the impact, we can set the following form for $S_t$:

$$S_t = S_0 + \int_0^t \big(\mu + h(\ell_s)\big)\,ds + \sigma W_t + \sum_{n=1}^N \int_0^t \Big(\xi^{n,l}(t-s)\,\tilde\xi^{n,l}(\ell^n_s)\,dN^n_s + \xi^{n,m}(t-s)\,\tilde\xi^{n,m}(\ell^n_s)\,dJ^n_s\Big), \tag{4.1}$$

where $\xi^{n,l}, \xi^{n,m}$ are decreasing kernels, and $\tilde\xi^{n,l}, \tilde\xi^{n,m}$ are decreasing functions of the posted volume. It is well known that, by taking an exponentially decreasing kernel, Equation (4.1) admits a Markovian representation, as the couples $\big(N^n_t, \int_0^t \xi^{n,\{l,m\}}(t-s)\,dN^n_s\big)_{t\in[0,T]}$ are Markovian. In practice, this adds 2N dimensions to the HJBQVI.
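To illustrate why an exponential kernel yields a Markovian representation, the following sketch compares a direct evaluation of the transient impact, which sums over all past executions, with a recursive evaluation that only carries one extra state variable. The kernel exp(−β(t − s)), the decay rate `beta` and the per-event impact sizes are assumptions made for the example, not quantities specified in the text.

```python
import math


def transient_impact_direct(event_times, event_sizes, t, beta):
    """Sum of exponentially decayed impacts over the whole execution history."""
    return sum(
        c * math.exp(-beta * (t - s))
        for s, c in zip(event_times, event_sizes)
        if s <= t
    )


def transient_impact_recursive(event_times, event_sizes, t, beta):
    """Same quantity, computed by carrying a single decaying state variable."""
    y, last = 0.0, 0.0
    for s, c in sorted(zip(event_times, event_sizes)):
        if s > t:
            break
        # Decay the accumulated impact up to the new event, then add the jump.
        y = y * math.exp(-beta * (s - last)) + c
        last = s
    # Decay from the last execution up to the evaluation time.
    return y * math.exp(-beta * (t - last))
```

Both functions return the same value, but the recursive one needs only the current state and the next event. This is the property that lets the transient impact be added to the state space of the HJBQVI as one variable per kernel.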

Functions $h$, $\xi^{n,l}$, $\xi^{n,m}$ could also be approximated by neural networks. Determination of a cross-impact function between liquidity pools can lead to possible arbitrage detection across liquidity venues.

4.3 Extension 3: Hidden liquidity

Hidden liquidity represents a large proportion of the liquidity, especially in the US markets, see for example [21]. Therefore, if one wants to design trading tactics for assets cross-listed in a European and an American market, taking into account the hidden part of the liquidity is crucial.

Assume that the n-th venue is a US liquidity pool. Borrowing the notations of [4], we denote by $H^n$ the hidden liquidity of the n-th venue at the first limit of the order book. Therefore, the corresponding imbalance process, represented by the continuous-time Markov chain $I^n$, can be rewritten as

$$\frac{N^{n,a,m}_t - N^{n,b,m}_t}{N^{n,a,m}_t + N^{n,b,m}_t + 2H^n},$$

where $N^{n,b,m}_t, N^{n,a,m}_t$ are the bid and ask market order flow processes on the n-th venue. Empirical estimation of the prior parameters for the transition matrix of $I^n$ has to take into account this additional term in the imbalance processes. Furthermore, incorporating the imbalance process with hidden liquidity into trading signals makes it possible to detect arbitrage opportunities between different venues. This does not increase the dimensionality of Equation (2.3).
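The effect of the hidden-liquidity term on the imbalance can be seen in a short numerical sketch. The function name and the input values are hypothetical; only the ratio itself is taken from the formula above.

```python
def imbalance_with_hidden(ask_flow, bid_flow, hidden):
    """Imbalance (ask - bid) / (ask + bid + 2 * hidden) for one venue."""
    denominator = ask_flow + bid_flow + 2 * hidden
    if denominator <= 0:
        raise ValueError("total visible and hidden liquidity must be positive")
    return (ask_flow - bid_flow) / denominator


# Illustrative numbers: the same visible flows with and without hidden liquidity.
visible_only = imbalance_with_hidden(300, 100, 0)     # 0.5
with_hidden = imbalance_with_hidden(300, 100, 200)    # 0.25
```

Hidden liquidity shrinks the imbalance toward zero, so ignoring it when estimating the transition matrix of the imbalance process would overstate how often the venue is in an extreme imbalance regime.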

5 Numerical results

5.1 Global parameters

We take the example of a trader acting on a stock cross-listed on 2 different venues (N = 2), with the following global parameters:

• $\psi^n \in \{\delta, 2\delta\}$: the processes $(\psi^n_t)_{t\in[0,T]}$ can take two values, which correspond to a low or a high spread regime, and the tick size is δ = 0.05.

• $I^n \in \{-0.5, 0, 0.5\}$: the processes $(I^n_t)_{t\in[0,T]}$ can take three values, which correspond to a negative, neutral or positive imbalance regime.


• R = 2, $(\omega_1, \omega_2) = (0.5, 1)$: the processes $(\epsilon^n_t)_{t\in[0,T]}$ can take two values, which correspond to a half or total execution of the posted volume $(\ell^n_t)_{t\in[0,T]}$.

• $q_0 = 5 \times 10^4$: initial inventory of the trader.

• $T_v = [v, v + \Delta v]$, where $\Delta v$ = 1 min, which means that each slice lasts one minute, with V = 10 slices and T = 10 min.

• $\Delta t = 0.1$: we take 10 time steps in each slice, that is, the agent takes 10 trading decisions during each slice.

The pre-computed trading curve is borrowed from an implementation shortfall execution using market orders, that is:

$$q^\star_t = q_0\,\frac{\sinh\Big(\sqrt{\frac{\gamma\sigma^2 V}{2\eta}}\,(T-t)\Big)}{\sinh\Big(\sqrt{\frac{\gamma\sigma^2 V}{2\eta}}\,T\Big)},$$

with the following set of parameters:

• η = 0.1: coefficient of quadratic costs.

• $V = 1 \times 10^8$: average market volume.

• $\gamma = 1 \times 10^{-6}$: risk aversion of the trader using a CARA utility function.

• σ = 0.05: volatility of the asset.

• $f^\lambda(\ell_t) = \exp\big(-\kappa\sum_{n=1}^N \ell^n_t\big)$ with $\kappa = 2.5 \times 10^{-5}$: sensitivity of the execution with respect to the total volume posted.

• $f^\rho(\ell_t) = 1$: no sensitivity of the executed proportion with respect to the total volume posted.

For clarity of interpretation, in this numerical experiment we consider a trader sending only limit orders.
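The pre-computed trading curve can be evaluated directly from the parameters listed above. The sketch below is a plain transcription of the sinh formula. The function name is hypothetical, and expressing t and T in minutes is an assumption about units that the text does not state explicitly.

```python
import math


def target_inventory(t, q0=5e4, T=10.0, gamma=1e-6, sigma=0.05, V=1e8, eta=0.1):
    """Target inventory q*_t of the implementation shortfall curve."""
    k = math.sqrt(gamma * sigma ** 2 * V / (2 * eta))
    return q0 * math.sinh(k * (T - t)) / math.sinh(k * T)


# The curve starts at the initial inventory and reaches zero at the horizon.
curve = [target_inventory(float(t)) for t in range(11)]
```

The curve is decreasing and convex for these parameters, so most of the inventory is scheduled for the early slices, and the penalty g(q − q⋆) then measures how far the trader is from this schedule.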

5.2 Numerical methods

5.2.1 Finite differences

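To find the optimal strategy for limit orders, we consider the following equation:

$$
\begin{aligned}
0 = {} & -\partial_t v(t,q,\psi,I) + g(q - q^\star_t) - \mu q \\
& - \sum_{n=1}^N \sum_{j=1}^J r^{n,\psi}_{\psi,j\delta}\Big(v(t,q,\psi^{-n}_{j\delta},I) - v(t,q,\psi,I)\Big) \\
& - \sum_{n=1}^N \sum_{k=1}^K r^{n,I}_{I,I_k}\Big(v(t,q,\psi,I^{-n}_{I_k}) - v(t,q,\psi,I)\Big) \\
& - \sup_{p\in\{-1,0,1\}^N,\ \ell}\ \sum_{n=1}^N \lambda^n(\psi,I,p^n,\ell)\,\mathbb{E}\Big[\epsilon^n\ell^n\Big(\frac{\psi^n}{2} + p^n\delta^n\Big) + v(t,q-\ell^n\epsilon^n,\psi,I) - v(t,q,\psi,I)\Big],
\end{aligned}
$$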


where

$$\psi^{-n}_{j\delta} = (\psi^1, \dots, \psi^{n-1}, j\delta, \psi^{n+1}, \dots), \qquad I^{-n}_{I_k} = (I^1, \dots, I^{n-1}, I_k, I^{n+1}, \dots).$$

In order to apply the finite difference method, we introduce a discretization of time and state space. For inventories we have $Q = \{q_1 = 0 < \dots < q_{\#Q} = q_0\}$. The time discretization in the slice is $T = \{t_0 = 0 < t_1 = t_0 + \Delta t < \dots < t_{\#T} = \Delta v\}$. We also discretize the order volumes the trader can send: $L = \{l_1 = 0 < \dots < l_{\#L} = q_0\}$.

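Using a first-order difference for the derivative of the value function with respect to time, we can rewrite the above equation as follows: for all $i \in \{0, \dots, \#T - 1\}$, $q \in Q$, $(\psi, I) \in M$,

$$
\begin{aligned}
v(t_{i+1},q,\psi,I) = {} & v(t_i,q,\psi,I) - \Delta t\,\bigg( g(q - q^\star_t) - \mu q \\
& - \sum_{n=1}^N \sum_{j=1}^J r^{n,\psi}_{\psi,j\delta}\Big(v(t,q,\psi^{-n}_{j\delta},I) - v(t,q,\psi,I)\Big) \\
& - \sum_{n=1}^N \sum_{k=1}^K r^{n,I}_{I,I_k}\Big(v(t,q,\psi,I^{-n}_{I_k}) - v(t,q,\psi,I)\Big) \\
& - \sup_{p\in\{-1,0,1\}^N,\ \ell\in L^N}\ \sum_{n=1}^N \lambda^n(\psi,I,p^n,\ell)\,\mathbb{E}\Big[\epsilon^n\ell^n\Big(\frac{\psi^n}{2} + p^n\delta^n\Big) + v(t,q-\ell^n\epsilon^n,\psi,I) - v(t,q,\psi,I)\Big]\bigg),
\end{aligned}
$$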

with terminal condition v(T, q, ψ, I) = 0.

Computationally, the most demanding part is the search for the supremum, which has to be performed over a set of size $3^N \times \#Q \times \#L^N \times \#M$ at each time step. It follows that finite differences can be applied to solve the problem of optimal order posting for a stock cross-listed on N = 2 venues with reasonable precision and computation time. However, if we introduce more venues, finite differences are no longer efficient because the complexity grows exponentially with the number of venues.

For our numerical example, we used the discretization with #Q = 101 and #L = 51, which gives a computation time (on a simple PC) of around 1 min for the whole slice.
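The growth of the supremum search can be made concrete with a short count. The sketch below assumes that the number of market states #M is the product over venues of the two spread states and three imbalance states of Section 5.1, i.e. #M = 6^N. This is an illustrative assumption about how M is enumerated, and the function name is hypothetical.

```python
def supremum_search_size(n_venues, n_inventory=101, n_volumes=51,
                         n_spread_states=2, n_imbalance_states=3, n_limits=3):
    """Number of (control, state) combinations visited at one time step."""
    # Assumption: market states are all per-venue (spread, imbalance) pairs.
    n_market_states = (n_spread_states * n_imbalance_states) ** n_venues
    n_controls = (n_limits ** n_venues) * (n_volumes ** n_venues)
    return n_controls * n_inventory * n_market_states


two_venues = supremum_search_size(2)    # 85_115_124
three_venues = supremum_search_size(3)  # 918 times larger
```

Under these assumptions, each additional venue multiplies the work per time step by 3 × 51 × 6 = 918, which is why the grid-based approach is practical for two venues but not much beyond.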

5.2.2 Neural networks

In this section, we briefly introduce the method using neural networks to solve HJB equations. In this paper, we used a method which can be referred to as an Actor-Critic method to approximate the optimal controls and the corresponding value function for the problem. This approach has proven fruitful, especially for equations in high dimension; a more elaborate description of the method can be found, for example, in [6, 8, 18, 20].

The core of this approach is to represent the strategy of the trader, as well as the corresponding value function, with neural networks. One then needs to formalize the target functions for both neural networks and to perform gradient descent on the parameters (weights) of these networks. This procedure needs to be done for every time step, so one ends up with 2#T networks.


Let us start with the description of the value function approximation. We consider neural networks taking as input the spreads and the imbalances in the venues of interest and the inventory of the trader, and giving as output the value function at this point. As in the finite difference method, we solve our problem backward, starting from $t_{\#T-1} = \Delta v - \Delta t$, because the value function at the end of the slice is known from the terminal condition. To calculate the value function at time $t_i$, for all $i \in \{0, \dots, \#T - 1\}$, we minimize the mean-squared error between the values given by the neural network and the target values, which are computed using the value function approximation for time $t_{i+1}$ and the network for the controls at the current step. Let us assume that we have the controls $\ell^*, p^*$ (obtained via neural networks, for example) for time $t_i$; then the target for the value function can be found as

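$$
\begin{aligned}
v^{\mathrm{target}}(t_{i-1},q,\psi,I) = {} & v[\theta^v_i](t_i,q,\psi,I) + \Delta t\,\bigg( g(q - q^\star_t) - \mu q \\
& - \sum_{n=1}^N \sum_{j=1}^J r^{n,\psi}_{\psi,j\delta}\Big(v[\theta^v_i](t,q,\psi^{-n}_{j\delta},I) - v[\theta^v_i](t,q,\psi,I)\Big) \\
& - \sum_{n=1}^N \sum_{k=1}^K r^{n,I}_{I,I_k}\Big(v[\theta^v_i](t,q,\psi,I^{-n}_{I_k}) - v[\theta^v_i](t,q,\psi,I)\Big) \\
& - \sum_{n=1}^N \lambda^n(\psi,I,p^{*n},\ell^*)\,\mathbb{E}\Big[\epsilon^n\ell^{*n}\Big(\frac{\psi^n}{2} + p^{*n}\delta^n\Big) + v[\theta^v_i](t,q-\ell^{*n}\epsilon^n,\psi,I) - v[\theta^v_i](t,q,\psi,I)\Big]\bigg),
\end{aligned}
$$

with $v[\theta^v_{\#T}](t_{\#T},q,\psi,I) = 0$ and where $[\theta^v_i]$ stands for the weights of the neural network for the value function at time $t_i$.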

The trader's inventory is of continuous nature; however, spread and imbalance are categorical, so we need to check whether special techniques should be used to ensure a better fit in this case.

Figure 1: Target value function for increasing inventory and random market states. (Panel title: “Target for value function for random spreads and imbalances”; curve: target value.)

Let us first look, in Figure 1, at an example of the target value function of the trader for q ∈ [0, q0] at different spreads and imbalances. We see considerable changes in the level of the value function depending on the market state, which we would like to capture with our approximation.

Now, let us compare the fit of the value function parametrization taking as inputs raw spread and imbalance values with that of the parametrization working with encoded values of the spread and the imbalance. Here we use the so-called one-hot encoding for categorical variables, which consists in representing the different values of the variable by a one-hot vector $e^i_\psi \in \{0,1\}^{\#\Psi}$ for the spread and $e^i_I \in \{0,1\}^{\#I}$ for the imbalance. The vectors $e^i$ (both $e^i_\psi$ and $e^i_I$) are such that $e^i_j = 0$ for all $j \neq i$, and $e^i_i = 1$.
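The encoding described above can be sketched in a few lines. The state lists follow the global parameters of Section 5.1, with spreads expressed in ticks. The helper names, the ordering of the features and the inclusion of the inventory scaled by q0 are illustrative assumptions about how the network inputs might be assembled.

```python
SPREAD_STATES = (1, 2)                 # spread in ticks: delta and 2 * delta
IMBALANCE_STATES = (-0.5, 0.0, 0.5)    # negative, neutral, positive regime


def one_hot(value, states):
    """Return the one-hot vector of `value` within the ordered tuple `states`."""
    if value not in states:
        raise ValueError(f"{value!r} is not one of the states {states!r}")
    return [1.0 if s == value else 0.0 for s in states]


def encode_inputs(spreads, imbalances, inventory, q0):
    """Concatenate per-venue one-hot market states with the scaled inventory."""
    features = []
    for psi in spreads:
        features += one_hot(psi, SPREAD_STATES)
    for imb in imbalances:
        features += one_hot(imb, IMBALANCE_STATES)
    features.append(inventory / q0)
    return features


# Two venues: spreads (delta, 2 delta), imbalances (-0.5, 0.5), half the inventory left.
x = encode_inputs((1, 2), (-0.5, 0.5), 25000, 50000)
```

With two venues this gives 2 × 2 + 2 × 3 + 1 = 11 inputs, and the network no longer has to interpolate between market states as if they were points on a continuous axis.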

Figure 2: Comparison of the target value with approximation continuous in spread and imbalance. (Panel title: “Fitting of target value function with raw values for states”; curves: target value, predicted value.)

Figure 3: Comparison of the target value with approximation discrete in spread and imbalance. (Panel title: “Fitting of target value function with encoded values for states”; curves: target value, predicted value.)

In Figures 2 and 3, we compare the values predicted by the two parametrizations with the target values for the same number of learning epochs. There is a considerable gain in precision when the parametrization takes into account the categorical nature of market states. Therefore, we apply it to both the value function network and the strategy network.

Now, let us describe the learning procedure for the strategy. First of all, the inputs of the strategy network are the same as for the value function network, i.e. the trader's inventory, spreads and imbalances for both venues. As outputs, we need the volumes of the orders and the limits at which the trader sends his orders. Volumes sent to each venue are bounded by the current inventory, because we do not want the trader to execute more shares than he possesses. Limits should equal −1, 0, or 1, but since we want to use automatic differentiation tools, we need to represent them by a differentiable function. The softmax activation function serves this purpose well, so we represent the limits for each venue by the probabilities of sending an order to each limit. In practice, the trader can choose the limit with the maximum of the three probabilities to perform his action.
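The softmax representation of the limits can be sketched as follows. The three raw scores stand for the last-layer outputs of the strategy network for one venue; their values here are made up for illustration, and the function names are hypothetical.

```python
import math

LIMITS = (-1, 0, 1)  # candidate limits, in the same order as the scores


def softmax(scores):
    """Numerically stable softmax of a list of raw scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_limit(scores):
    """Return the most probable limit together with the three probabilities."""
    probs = softmax(scores)
    best = max(range(len(LIMITS)), key=lambda i: probs[i])
    return LIMITS[best], probs


limit, probs = choose_limit([0.1, 2.0, -1.0])
```

During training the probabilities themselves enter the objective, which keeps it differentiable; the hard choice of the most probable limit is only applied when the strategy is executed or plotted.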

The optimization criterion used for the strategy neural network is the function under the supremum in the HJBQVI (2.4), with limit probabilities taken into account (let us denote them by $\mathbb{P}(p = a)$, for $a \in \{-1, 0, 1\}$). We need to maximize with respect to $\theta^\ell_i$, $i \in \{0, \dots, \#T - 1\}$:

$$
\sum_{n=1}^N \sum_{a\in\{-1,0,1\}} \mathbb{P}[\theta^\ell_i](p^n = a)\,\lambda^n\big(\psi,I,a,\ell[\theta^\ell_i]\big)\,\mathbb{E}\Big[\epsilon^n\,\ell[\theta^\ell_i]^n\Big(\frac{\psi^n}{2} + a\,\delta^n\Big) + v[\theta^\ell_{i+1}]\big(t_i, q - \ell[\theta^\ell_i]^n\epsilon^n, \psi, I\big) - v[\theta^\ell_{i+1}](t_i,q,\psi,I)\Big],
$$

where $\theta^\ell_i$ stands for the weights of the neural network of controls at time $t_i$. We want to maximize this function for all possible values of market states and inventories. To avoid the dimensionality trap, we need to optimize this function on some subset of possible values, which we draw randomly.
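As a rough sketch of how this criterion can be evaluated for one inventory level and one fixed market state, the function below sums the expected gain over venues and candidate limits. Several simplifications are assumptions made for the example: the value function is passed as a function of the inventory only, the fill intensity is a user-supplied callable, and the two execution proportions (0.5 and 1) are taken to be equally likely. None of these names or choices comes from the original implementation.

```python
def limit_order_objective(q, volumes, limit_probs, spreads, tick,
                          value_fn, intensity_fn,
                          omegas=(0.5, 1.0), omega_probs=(0.5, 0.5)):
    """Expected gain summed over venues n and candidate limits a in {-1, 0, 1}."""
    total = 0.0
    for n, (ell, probs, psi) in enumerate(zip(volumes, limit_probs, spreads)):
        for a, prob_a in zip((-1, 0, 1), probs):
            lam = intensity_fn(n, a, volumes)
            # Expectation over the executed proportion of the posted volume.
            expectation = sum(
                w_prob * (w * ell * (psi / 2 + a * tick)
                          + value_fn(q - ell * w) - value_fn(q))
                for w, w_prob in zip(omegas, omega_probs)
            )
            total += prob_a * lam * expectation
    return total


# One venue, 100 shares at the best limit, flat value function, unit intensity.
gain = limit_order_objective(
    q=1000.0, volumes=[100.0], limit_probs=[(0.0, 1.0, 0.0)],
    spreads=[0.05], tick=0.05,
    value_fn=lambda q: 0.0, intensity_fn=lambda n, a, vols: 1.0,
)
```

In an actual training loop the same expression would be written with a differentiable tensor library, so that gradients flow to the network weights through both the volumes and the limit probabilities.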

When optimizing neural network approximations, it is important to normalize the data in order to have, if possible, a universal set of hyperparameters. First of all, the inventory entering as an input of the value function neural network and of the strategy neural network is normalized by q0 so that it always stays in [0, 1]. Also, we learn not the target value function itself, but the target value function normalized by q0, which significantly reduces the order of magnitude of the values. For the strategy network, we learn the proportion of the inventory to be sent and not the volume itself. Finally, we can notice that for high inventories the difference between value functions (which are quadratic in the inventory) in the supremum can become much larger than the profit of the trader coming from the tick (which is at most linear in the inventory). This can prevent us from finding optimal values for the limit to which the trader should send his order, especially for small inventories. We therefore normalize the values of the optimized function for different inventories, to give more weight to small inventories, by multiplying all values by 1/q. However, this latter normalization is used only when we optimize over the part of the strategy responsible for the limits, leaving volume updates untouched.

To summarize, Figures 4 and 5 present the structures of the neural networks used as approximators for the strategy and the value function. Another feature worth mentioning is the separation of market state and inventory inputs for some layers, both for the strategy and the value function. This allows capturing features of the market state independently of the inventory. We also separated some layers preceding the outputs of the strategy network, in order to be able to perform the learning process with different learning rates for volumes and limits of limit orders.

Figure 4: Neural network structure for the trader's strategy. (Recoverable diagram nodes: spreads, imbalances, inventory: InputLayer; spreads_and_imbalances, all_inputs: Concatenate; hidden_layer_1.1, hidden_layer_1.2: Dense; outputs_2.1_and_2.2: Concatenate; hidden_layer_4.1: Dense; volumes, limits_venue_1, limits_venue_2: Dense.)

Figure 5: Neural network structure for the value function. (Recoverable diagram nodes: spreads, imbalances, inventory: InputLayer; spreads_and_imbalances, all_inputs: Concatenate; outputs_1.1_and_1.2: Concatenate; hidden_layer_2: Dense; value_function: Dense.)

While finite difference schemes must recompute the values on the whole grid every time the trader wants to adapt his strategy using updated market parameters, neural networks can be adapted progressively, starting from some pre-trained strategy, for example the one corresponding to the previous parameters. In practice, a pre-trained model can be reused for different problem settings thanks to normalization. Therefore, a long and elaborate training procedure needs to be done only once. The resulting model can be improved by small adjustment trainings, which take only 1 minute on the simplest instance of the AWS platform (2 CPUs, no GPU), and have great speed-up potential when performed on more complex infrastructures.

5.3 Two identical venues

We assume that the trader is confident about his estimation of σ. Therefore, he uses the Bayesian update only on the drift µ of the asset. The venues share identical parameters, which will be inferred by the trader through time.

5.3.1 Value function

We first plot in Figures 6 and 7 the evolution through time of the value function of the trader in the state ψ1 = ψ2 = 1 and I1 = I2 = 0 during a slice of execution, obtained through the finite difference method.

Figure 6: Value function with respect to the inventory between t = 0 and t = 0.4. (Panel title: “Value function at different time steps ψ1=ψ2=1 and I1=I2=0”; curves for t = 0.0, 0.1, 0.2, 0.3, 0.4.)

Figure 7: Evolution of the value function v between t = 0.5 and t = 0.9. (Curves for t = 0.5, 0.6, 0.7, 0.8, 0.9.)

The parabolic form of the value function comes from the term g(q − q⋆t) in (2.3). The maximum value indicates the optimal inventory for the next step in the slice. When t increases, the maximum shifts toward zero, which means that the trader wants to finish the execution at the end of the slice.

We plot in Figures 8 and 9 the value function of (2.3) obtained using neural networks. We can see that the neural networks approximate the value function accurately.

Figure 8: Evolution of the value function v between t = 0 and t = 0.4 using neural networks. (Curves for t = 0.0, 0.1, 0.2, 0.3, 0.4.)

Figure 9: Evolution of the value function v between t = 0.5 and t = 0.9 using neural networks. (Curves for t = 0.5, 0.6, 0.7, 0.8, 0.9.)


Next, we plot the strategy (in terms of limits and volumes) of the trader in both venues, using finite difference schemes.

5.3.2 Strategy: limit orders and volumes with finite difference schemes

In Figures 10 and 11, we plot the limits at which the trader posts his limit orders in the two venues, given equal spread and imbalance processes.

Figure 10: Limit strategy in the first venue, ψ1 = ψ2 = δ, I1 = I2 = 0. (Panel title: “Limits at different time steps when ψ1=ψ2=1 and I1=I2=0 venue 1”; curves for t = 0.0, 0.3, 0.6, 0.9.)

Figure 11: Limit strategy in the second venue, ψ1 = ψ2 = δ, I1 = I2 = 0. (Curves for t = 0.0, 0.3, 0.6, 0.9.)

As the trader has the same prior distribution in the two venues, his strategy is the same in both venues. At the beginning of the slice, i.e. at t = 0, the maximum of the value function is near q = 32000. Therefore, if the trader has a lower inventory, he does not post any orders and waits for the next time step. If he has a higher inventory, he tries to reach an inventory of q = 32000. For q ∈ [32000, 34000], being sufficiently close to the next-step optimal inventory, he posts limit orders at the second best limit to collect an additional tick. For q ∈ [34000, 40000], he posts at the first best limit to increase his probability of execution. If he has q > 40000, he creates a new best limit and accepts losing one tick in order to be executed faster and reach the optimal inventory at the following time step. We can see in this behavior the trade-off between the possibility of being executed at a more favorable price and the necessity to complete the execution.
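The thresholds read off the figures for t = 0 can be summarized as a small lookup, which makes the piecewise structure of the strategy explicit. The mapping of the second best, first best and new best limits to p = 1, 0 and −1 is inferred from the limit order price ψ/2 + pδ and should be read as an assumption, as should the treatment of the interval endpoints; the function name is hypothetical.

```python
def limit_at_slice_start(q):
    """Limit chosen at t = 0 as a function of inventory, per the text above.

    Returns None when no order is posted, otherwise p in {1, 0, -1}.
    """
    if q < 32000:
        return None   # below the optimal inventory: wait for the next step
    if q <= 34000:
        return 1      # second best limit: collect an additional tick
    if q <= 40000:
        return 0      # first best limit: higher execution probability
    return -1         # new best limit: give up one tick to be filled faster


choices = [limit_at_slice_start(q) for q in (10000, 33000, 37000, 45000)]
```

The chosen limit decreases as the inventory grows, which is the monotonicity property of the optimal limit strategy discussed in the surrounding text.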

For the sake of homogeneity (for all M market states, the trader faces a similar trade-off), we considered the same set of controls for the limit at which the trader can send his order. For this reason, we can see that even for a spread equal to δ the trader can submit an order at the limit p = −1, which in practice can obviously be treated as p = 0 due to the piecewise monotonic nature of the optimal limit strategy (which is, in fact, monotonic, though this cannot be reflected by finite differences when the optimal volume equals 0).

When the trader is near the end of the slice, he starts posting limit orders earlier (this can be seen if both volumes and limits are considered). For example, if t = 0.6, he begins to trade at the second best limit when q ∈ [8000, 11000], at the first best limit when q ∈ [11000, 19000], and creates a new best limit when q ∈ [19000, 50000]. Therefore, if the trader still has a very positive inventory at the end of the slice, he prefers to sacrifice one tick at the first best limit in order to complete his execution at this step.


It is important to highlight the fact that, when t = 0.9, the trader does not rush to liquidate his inventory completely. This comes from the absence of a terminal penalty, often used in optimal liquidation problems to guarantee the complete execution of the inventory. This makes it possible, in some sense, to “relax” the optimal execution framework on a slice, as the part of the inventory that has not been executed during one slice is split between the remaining ones.

We plot in Figures 12 and 13 the volumes posted in both venues, for the same spread and imbalance. We see that, at the beginning of the slice, the trader begins to post a nonzero volume only when q > 32000. Moreover, he posts a higher volume when he is near the end of the slice.

Figure 12: Volume sent to the first venue, ψ1 = ψ2 = δ, I1 = I2 = 0. (Panel title: “Volumes at different time steps when ψ1=ψ2=1 and I1=I2=0 venue 1”; curves for t = 0.0, 0.3, 0.6, 0.9.)

Figure 13: Volume sent to the second venue, ψ1 = ψ2 = δ, I1 = I2 = 0.

When the second venue has a higher spread, we plot the strategy of the trader in both venues in Figures 14 and 15.

Figure 14: Limit strategy, ψ1 = δ, ψ2 = 2δ, I1 = I2 = 0. (Panel title: “Limits at different time steps when ψ1=1, ψ2=2 and I1=I2=0”; curves for t = 0.1 and t = 0.5 in venue 1 and venue 2.)

Figure 15: Volume strategy, ψ1 = δ, ψ2 = 2δ, I1 = I2 = 0. (Panel title: “Volumes at different time steps when ψ1=1, ψ2=2 and I1=I2=0”; curves for t = 0.1 and t = 0.5 in venue 1 and venue 2.)

For t = 0.5, we see in Figure 14 that the trader starts to post at the second best limit in the second venue when q = 10000 and in the first when q = 11000. For q ∈ [18000, 21000], he creates a new best limit in the second venue to execute his inventory faster but keeps posting at the best limit in the first venue in order to collect a higher spread. Finally, for q ∈ [30000, 50000], he stops posting in the second venue in order to consume more liquidity in the first one, where the probability of getting his order filled is higher. Similar interpretations apply for t = 0.1.


In Figure 15, we see that the trader posts a higher volume in the first venue compared to the second one. For t = 0.5, he starts to trade at q = 10000 for the second venue and at q = 11000 for the first one. The volume posted in the first venue increases almost linearly with respect to the inventory. In contrast, the volume posted in the second venue increases until an inventory of q = 22000, then stays constant until q = 30000 and decreases to zero afterward. This means that for q ∈ [10000, 30000], the trader prefers to collect the spread from both venues. When q > 30000, he prefers to stop posting in the second venue, the one with a higher spread, in order to maximize his chances of being executed in the first one. Similar interpretations apply for t = 0.1.

In Figures 16 and 17, we show the choice of limits and volumes of the trader if the imbalance is more favorable in the second venue. In Figure 16, we observe for t = 0.5 that the trader posts in the first venue at the second best limit for q ∈ [12000, 15000], at the first limit for q ∈ [15000, 20000] and at a new best limit for q ∈ [20000, 32000]. At the same time, he posts at the first limit of the second venue when q ∈ [12000, 22000] and at a new best limit for q ∈ [22000, 50000]. We see that the trader prefers to post at a higher limit in the second venue because of the higher probability of execution due to a more favorable imbalance. For large inventories, he stops posting in the first venue in order to increase his probability of execution using limit orders in the second venue at a new best limit. The same results hold for t = 0.1.

In Figure 17, we see that the trader posts a majority of his volume in the second venue due to a more favorable imbalance. When his inventory is not too high, he collects the spread from both venues. However, when his inventory is relatively high, he sends all the volume to the second venue in order to increase the probability of filling.

Figure 16: Limit order strategy, ψ1 = ψ2 = δ, I1 = −0.5, I2 = 0.5. (Panel title: “Limits at different time steps when ψ1=ψ2=1 and I1=−0.5, I2=0.5”.)

Figure 17: Volume strategy, ψ1 = ψ2 = δ, I1 = −0.5, I2 = 0.5. (Curves for t = 0.1 and t = 0.5 in venue 1 and venue 2.)

We now describe the strategies on the limits and the volumes obtained by a reinforcement learning approach.

5.3.3 Strategy: limit orders and volumes with neural networks

We plot in Figures 18 and 19 the strategies on the limits used by the trader. Since limits are represented by the probabilities of sending an order to each limit, for graphical representation we plot the limit corresponding to the highest of the three probabilities. We see that the choice of the limits is in line with that of Figures 10 and 11, up to states where the optimal order volume is 0 (in this case limit values are indistinguishable for finite differences). When the trader is at the beginning of the slice, for a small inventory, he prefers to collect a higher spread by being executed at the second best limit. When he is near the end of the slice, he prefers to be filled at a less favorable price, at the best or new best limit, in order to lower his execution risk. We can also see that neural networks preserve the monotonicity of the optimal limit function.

Figure 18: Limit order strategy in the first venue, ψ1 = ψ2 = δ, I1 = I2 = 0 using neural networks. (Curves for t = 0.0, 0.3, 0.6, 0.9.)

Figure 19: Limit order strategy in the second venue, ψ1 = ψ2 = δ, I1 = I2 = 0 using neural networks. (Curves for t = 0.0, 0.3, 0.6, 0.9.)

In Figures 20 and 21, we plot the posted volumes of the trader in both venues for the same spread and imbalance. We see that the strategy is a smoothed approximation of the one obtained using finite differences in Figures 12 and 13. We see that at the very beginning of the slice, the trader is not going to trade if his inventory is already small enough. The strategy in both venues is the same up to some negligible numerical effects.

Figure 20: Volume posted in the first venue, ψ1 = ψ2 = δ, I1 = I2 = 0 using neural networks.

Figure 21: Volume posted in the second venue, ψ1 = ψ2 = δ, I1 = I2 = 0 using neural networks.

If the spread of the second venue is higher, we see in Figure 22 that the strategy on the limits is the same as in Figure 14. It is interesting to note in Figure 23 that the trader does not stop posting in the second venue, unlike in Figure 15, again because of the approximation coming from neural networks. However, this behavior makes it possible to perform some exploration of the venue parameters. For example, if the trader follows the strategy given by finite differences in Figure 15, he posts a volume equal to 0 in the second venue when q > 32000 for t = 0.5. However, if the trader underestimates the prior on the filling probability in the second venue λ2, he will keep sending orders in the first venue, neglecting the possibility of splitting his orders, which can potentially improve his execution. Moreover, Figures 8 and 9 show that this slight difference in the obtained controls does not drastically change the performance of the trader in terms of the value function.

Figure 22: Limit order strategy, ψ1 = δ, ψ2 = 2δ, I1 = I2 = 0 using neural networks.

Figure 23: Volume strategy, ψ1 = δ, ψ2 = 2δ, I1 = I2 = 0 using neural networks.

The same comments apply to Figures 24 and 25, where we see that the trader posts a small but nonzero volume in the first venue, which has a less favorable imbalance; this potentially allows him to explore this venue and improve parameter estimates faster.

Figure 24: Limit order strategy, ψ1 = ψ2 = δ, I1 = −0.5, I2 = 0.5 using neural networks.

Figure 25: Volume strategy, ψ1 = ψ2 = δ, I1 = −0.5, I2 = 0.5 using neural networks.

5.4 Two different venues

In this section, we analyze the behavior of the trader believing that the first venue is better than the second venue in terms of filling rate. We compare the solutions obtained via finite difference schemes and neural networks.

5.4.1 Value function

We show in Figures 26 and 27 the evolution of the value function of the trader during a slice of execution, obtained through the finite difference method.


Figure 26: Evolution of the value function v between t = 0 and t = 0.4. (Curves for t = 0.0, 0.1, 0.2, 0.3, 0.4.)

Figure 27: Evolution of the value function v between t = 0.5 and t = 0.9. (Curves for t = 0.5, 0.6, 0.7, 0.8, 0.9.)

One can see that the value function deteriorates compared to the previous example, which is to be expected given that one of the venues is exactly as in the above example, and the other one is worse in terms of filling ratio. For example, in Figure 27, the minimum of the function v at t = 0.5, attained when q = 50000, is −49000, compared to a minimum of −45000 in the example above. This is a natural consequence of a worse prior distribution on the filling ratio of the second venue while keeping the prior on the first venue unchanged.

Figure 28: Evolution of the value function v between t = 0 and t = 0.4 using neural networks. (Curves for t = 0.0, 0.1, 0.2, 0.3, 0.4.)

Figure 29: Evolution of the value function v between t = 0.5 and t = 0.9 using neural networks. (Curves for t = 0.5, 0.6, 0.7, 0.8, 0.9.)

We check in Figures 28 and 29 that we obtain a similar shape for the value function using neural networks.

We now describe the strategy of the trader on the limits and the posted volumes and compare it to the case of two identical venues.

5.4.2 Strategy: limit orders and volumes with finite difference schemes

In Figures 30 and 31, we show the limit order strategy of the trader in the two venues for the same spreads and imbalances. As the second venue is less favorable for execution, the trader prefers to create a new best limit for smaller inventories. For example, when t = 0.6, he posts an order at the new best limit starting from q = 19000 in the first venue, and in the second venue he prefers to create a new limit starting from q = 18000. Generally, either at the beginning or at the end of the slice, the trader prefers to post at a lower limit in the second venue in order to increase his execution rate there, sacrificing the spread that could have been collected.

Figure 30: Limit order strategy in the first venue, ψ1 = ψ2 = δ, I1 = I2 = 0. (Curves for t = 0.0, 0.3, 0.6, 0.9.)

Figure 31: Limit order strategy in the second venue, ψ1 = ψ2 = δ, I1 = I2 = 0. (Curves for t = 0.0, 0.3, 0.6, 0.9.)

The strategy of the trader differs drastically in terms of order volumes. In Figures 32 and 33, we see that the trader posts the majority of his volume in the first venue. In particular, at t = 0.9 the trader stops posting in the second venue to reduce his liquidity consumption and maximize his probability of execution in the first venue.

Figure 32: Volume posted in the first venue, ψ1 = ψ2 = δ, I1 = I2 = 0. (Panel title: “Volumes at different time steps when ψ1=ψ2=1 and I1=I2=0 venue 1”; curves for t = 0.0, 0.3, 0.6, 0.9.)

Figure 33: Volume posted in the second venue, ψ1 = ψ2 = δ, I1 = I2 = 0.

In Figures 34 and 35, we see the limits and the volumes recommended to the trader when the second venue has a higher spread and the imbalances are equal. The trader posts an even smaller volume in the second venue, compared to Figure 15. As the filling rate is lower in the second venue, the trader decreases his liquidity consumption in this venue, because of the smaller probability of collecting a higher spread.

The strategy on the limits in Figure 34 is also different from the one in Figure 14. When t = 0.5and the two venues are the same, the trader posts at the second best limit in the second venue whenq ∈ [11000, 13000], then at the first best limit when q ∈ [13000, 18000] and at a new best limit forq ∈ [18000, 30000]. When the venues are different, the trader posts at the second best limit in the

26

second venue for q ∈ [10000, 12000], at the first best limit for q ∈ [12000, 17000] and at a new bestlimit when q ∈ [17000, 19000]. Therefore, when the second venue has a worse filling rate, the traderposts in the second venue earlier (for a higher inventory) and less compared to the case with twoequivalent venues.

Figure 34: Limit order strategy, ψ1 = δ, ψ2 = 2δ, I1 = I2 = 0.

Figure 35: Volume strategy, ψ1 = δ, ψ2 = 2δ, I1 = I2 = 0.

If the imbalance is more favorable in the second venue, we see in Figures 36 and 37 that the strategy is very different from the one in Figures 16 and 17 where the two venues shared the same characteristics. As the second venue has a more favorable imbalance, the trader posts a higher volume in it. However, he posts a nonzero volume in the first venue, because of the overall better filling ratio. This contrasts with Figure 17 where, at some sufficiently high inventories, the trader stops sending orders to the first venue. Due to the trade-off between an overall higher filling ratio in the first venue and a more favorable imbalance in the second venue, the trader splits his liquidity consumption between the two venues.

The strategy on the limits in Figure 36 also differs from the one with two identical venues in Figure 16. For t = 0.5 in Figure 16, the trader posts in the first venue at the second best limit for q ∈ [10000, 13000], at the first best limit for q ∈ [13000, 20000] and at a new best limit for q ∈ [20000, 32000]. In Figure 36, the trader posts in the first venue at the second best limit for q ∈ [10000, 12000], at the first best limit for q ∈ [12000, 18000] and at a new best limit for q > 18000. Therefore, he posts at a more favorable limit in terms of filling rate in the first venue in order to compensate for the unfavorable imbalance compared to the second venue.

Figure 36: Limit order strategy, ψ1 = ψ2 = δ, I1 = −0.5, I2 = 0.5.

Figure 37: Volume strategy, ψ1 = ψ2 = δ, I1 = −0.5, I2 = 0.5.

Before moving to the analysis of the effectiveness of the Bayesian update of market parameters, we conclude with a comparison of the strategies obtained via neural network optimization.

5.4.3 Strategy: limit orders and volumes with neural networks

We observe in Figures 38 and 39 that the strategy of the trader on the limits is in line with the one in Figures 30 and 31 up to the states where the optimal volume of the order equals 0.

Figure 38: Limit order strategy in the first venue, ψ1 = ψ2 = δ, I1 = I2 = 0 using neural networks (curves shown for t = 0.0, 0.3, 0.6 and 0.9).

Figure 39: Limit order strategy in the second venue, ψ1 = ψ2 = δ, I1 = I2 = 0 using neural networks (curves shown for t = 0.0, 0.3, 0.6 and 0.9).

In Figures 40 and 41, we see that the strategy of the trader on the posted volumes is well approximated and smoothed by neural networks.

Figure 40: Volume posted in the first venue, ψ1 = ψ2 = δ, I1 = I2 = 0 using neural networks (curves shown for t = 0.0, 0.3, 0.6 and 0.9).

Figure 41: Volume posted in the second venue, ψ1 = ψ2 = δ, I1 = I2 = 0 using neural networks.

In Figures 42 and 43, we see in the case of a higher spread in the second venue that, because of the neural network parametrization of the strategy, the trader posts a nonzero volume in the second venue, leaving the possibility to better explore the filling ratios. Results are in line with the ones in Figures 34 and 35: the trader posts the majority of his volume in the first venue because of a lower spread and a more favorable filling ratio.

Figure 42: Limit order strategy, ψ1 = δ, ψ2 = 2δ, I1 = I2 = 0 using neural networks.

Figure 43: Volume strategy, ψ1 = δ, ψ2 = 2δ, I1 = I2 = 0 using neural networks.

Finally, we show in Figures 44 and 45 a behavior similar to that of the finite difference schemes in Figures 36 and 37: the trader posts a higher volume in the second venue due to a more favorable imbalance, and keeps posting in the first venue due to an overall more favorable filling ratio.

Figure 44: Limit order strategy, ψ1 = ψ2 = δ, I1 = −0.5, I2 = 0.5 using neural networks.

Figure 45: Volume strategy, ψ1 = ψ2 = δ, I1 = −0.5, I2 = 0.5 using neural networks.

5.5 Bayesian update

In this section, we analyze the effectiveness of the Bayesian update framework through several execution slices.

5.5.1 Market simulation on a slice

We first show an example of a market simulation over one slice, together with the trading strategy followed through the slice; both are illustrated in Figure 46.

At t = 0.2, the spread in both venues is equal to δ, with an unfavorable imbalance in both venues. In that case, as the two venues share the same characteristics and the inventory is sufficiently close to the optimal one for the next step, the trader sends the same quantity to both venues, which is close to zero.

When t = 0.5, the first venue has an unfavorable imbalance, and the second venue has a higher spread. In this configuration, the trader sends a higher volume to the first venue, in order to get a better filling rate due to a lower spread.

Finally, at t = 0.7, the first venue has a higher spread and a more favorable imbalance compared to the second venue. This leads to a higher volume in the first venue at the second best limit and a lower volume in the second venue at the first best limit. The favorable imbalance in the first venue indicates a higher probability of execution for an order at a higher limit, because the price may move in this direction. Therefore, even if the spread is equal to two ticks, the trader posts in this venue in order to be executed at a more favorable price. As the spread in the second venue is lower, but the imbalance is less favorable, he posts at the first best limit to benefit from the trade-off between execution and profit through collecting the spread.

Figure 46: Market simulation: spreads (upper left), imbalances (upper right), volumes (lower left) and limits (lower right) in both venues.

The corresponding execution trajectory is shown in Figure 47, where we can see the typical Implementation Shortfall execution shape, coming from the pre-computed trading curve q⋆.

Figure 47: Evolution of the inventory of the trader on a slice of execution (horizontal axis: time t; vertical axis: number of shares q).

5.5.2 Update of the execution proportion

We show how the trader updates the market parameters through observations and trading. The update of the execution proportion is quite fast: as can be seen in Figures 48 and 49, a good estimate is achieved after completing one or two slices. In this example, we started from the correct prior for the second venue and an inaccurate one for the first:

$$\rho^1 = \begin{bmatrix} 0.1 & 0.9 \end{bmatrix}, \qquad \rho^2 = \begin{bmatrix} 0.1 & 0.9 \end{bmatrix}.$$
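As an illustration of how fast such an update can converge, here is a minimal sketch assuming that the executed proportion takes two possible values and that the trader holds a Dirichlet prior on their probabilities. The function `dirichlet_posterior_mean`, the "real" values `true_rho` and the prior strength are hypothetical choices made for this example; only the prior [0.1, 0.9] comes from the text.

```python
import random


def dirichlet_posterior_mean(prior_mean, prior_strength, counts):
    """Posterior mean of category probabilities under a Dirichlet prior.

    The prior is Dirichlet(prior_strength * prior_mean) and counts[k] is the
    number of times outcome k has been observed.
    """
    alpha = [prior_strength * m + c for m, c in zip(prior_mean, counts)]
    total = sum(alpha)
    return [a / total for a in alpha]


rng = random.Random(0)
true_rho = [0.2, 0.8]    # hypothetical "real" values, not taken from the paper
prior_rho = [0.1, 0.9]   # prior quoted in the text
prior_strength = 10.0    # hypothetical confidence in the prior

# Simulate 2000 executions and count how often each outcome occurs.
counts = [0, 0]
for _ in range(2000):
    outcome = 0 if rng.random() < true_rho[0] else 1
    counts[outcome] += 1

posterior_rho = dirichlet_posterior_mean(prior_rho, prior_strength, counts)
print(posterior_rho)
```

With a weak prior, the posterior mean is dominated by the empirical frequencies after a few hundred observations, which is in line with the fast convergence reported above.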

Figure 48: Bayesian update of the executed proportion in the first venue (panels ρ1,1 and ρ1,2; estimation versus real value).

Figure 49: Bayesian update of the executed proportion in the second venue.

5.5.3 Update of the imbalance and the spread dynamics

We plot the convergence of the estimated transition matrices r1,ψ, r2,ψ in Figures 50 and 51. We observe quite fast convergence to a good approximation of the spread dynamics parameters. The prior values are respectively:

$$r^{1,\psi} = \begin{bmatrix} -5 & 5 \\ 5 & -5 \end{bmatrix}, \qquad r^{2,\psi} = \begin{bmatrix} -5 & 5 \\ 5 & -5 \end{bmatrix}.$$
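The mechanics of such an update can be sketched as follows, assuming a Gamma(α, β) prior on each off-diagonal transition rate of a two-state continuous-time Markov chain. Under that assumption the posterior mean of a rate is (α + number of observed jumps) / (β + time spent in the departure state). The true rates, the prior strength β = 1 and the helper names are hypothetical; only the prior mean of 5 comes from the matrices above.

```python
import random


def gamma_rate_posterior_mean(alpha, beta, n_jumps, holding_time):
    """Posterior mean of a jump rate under a Gamma(alpha, beta) prior."""
    return (alpha + n_jumps) / (beta + holding_time)


def simulate_two_state_chain(rate_01, rate_10, horizon, rng):
    """Simulate a two-state chain started in state 0.

    Returns (jumps out of 0, time in 0, jumps out of 1, time in 1).
    """
    rates = [rate_01, rate_10]
    jumps = [0, 0]
    time_in = [0.0, 0.0]
    t, state = 0.0, 0
    while True:
        wait = rng.expovariate(rates[state])
        if t + wait >= horizon:
            time_in[state] += horizon - t
            break
        time_in[state] += wait
        jumps[state] += 1
        t += wait
        state = 1 - state
    return jumps[0], time_in[0], jumps[1], time_in[1]


rng = random.Random(1)
# Hypothetical "real" rates, chosen only for illustration.
n01, t0, n10, t1 = simulate_two_state_chain(4.0, 3.5, horizon=1000.0, rng=rng)

# Prior mean alpha / beta = 5, as in the prior matrices of the text.
est_01 = gamma_rate_posterior_mean(5.0, 1.0, n01, t0)
est_10 = gamma_rate_posterior_mean(5.0, 1.0, n10, t1)
print(est_01, est_10)
```

The diagonal entries follow from the off-diagonal ones, since each row of a transition rate matrix sums to zero.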

Figure 50: Bayesian update of the transition matrix r1,ψ (one panel per entry rψδ,δ, rψδ,2δ, rψ2δ,δ and rψ2δ,2δ; estimation versus real value).

Figure 51: Bayesian update of the transition matrix r2,ψ (same four panels).

We perform the same study for the transition matrices r1,I, r2,I of the imbalance processes, estimated from the observed imbalance, in Figures 52 and 53. We see that a couple of slices suffice to obtain a fairly good approximation, and about a dozen slices (fewer for more granular slices) to reach the right estimate.

Figure 52: Bayesian update of the transition matrix r1,I.

Figure 53: Bayesian update of the transition matrix r2,I.

In the examples in Figures 52 and 53 we started from the following prior parameters:

$$r^{1,I} = \begin{bmatrix} -5 & 2.8 & 2.2 \\ 2.2 & -5 & 2.8 \\ 2.2 & 2.8 & -5 \end{bmatrix}, \qquad r^{2,I} = \begin{bmatrix} -5 & 2.8 & 2.2 \\ 2.2 & -5 & 2.8 \\ 2.2 & 2.8 & -5 \end{bmatrix}.$$

5.5.4 Update of the long term drift of the asset

As we observe the increments of the price process St continuously, it is easy to converge toward the real market drift. In the example of Figure 54, the true drift is µtrue = −0.5 and we start from a prior of µ = 0.1. It took 20 slices to recover the real value, even though in this example we assumed a high confidence in our prior estimate (ν = 0.02), which turned out to be incorrect.
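The drift update can be sketched with the standard Normal conjugate formula, assuming the price increments over a period of length Δt are Gaussian with mean µΔt and variance σ²Δt, and the prior is µ ∼ N(µ0, ν²). Treating ν as a standard deviation, the value σ = 0.05 and the unit slice length are assumptions made for this example; they are not stated in the text, so the convergence speed below need not match Figure 54.

```python
import random


def update_drift(mu0, nu, sigma, total_increment, elapsed):
    """Posterior mean and standard deviation of the drift.

    mu0, nu: prior mean and prior standard deviation of the drift.
    sigma: volatility of the price, assumed known.
    total_increment: observed S_t - S_0 over a time span `elapsed`.
    """
    prior_precision = 1.0 / nu ** 2
    data_precision = elapsed / sigma ** 2
    post_precision = prior_precision + data_precision
    post_mean = (mu0 * prior_precision + total_increment / sigma ** 2) / post_precision
    return post_mean, post_precision ** -0.5


rng = random.Random(3)
mu_true, mu0, nu = -0.5, 0.1, 0.02   # values quoted in the text
sigma, dt, n_slices = 0.05, 1.0, 100  # hypothetical values

total = 0.0
for k in range(1, n_slices + 1):
    total += rng.gauss(mu_true * dt, sigma * dt ** 0.5)
    post_mean, post_std = update_drift(mu0, nu, sigma, total, k * dt)

print(post_mean, post_std)
```

A small ν makes the prior weigh heavily in the posterior mean, so a confident but wrong prior slows the convergence without preventing it.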

Figure 54: Bayesian update of the drift of the asset (estimation versus real value of µ).

5.5.5 Update of the intensity of limit orders

The hardest parameter to update quickly is the intensity of filling, which depends on the states of both venues. In our numerical setting we have 32 possible states, so during one slice of 10 time steps it is not even possible to visit all the states. The convergence of the parameters λ is shown in Figures 55 and 56. We see that full convergence requires a lot of observations; however, we should keep in mind that a strategy close to the optimal one does not require excessive precision.

In this example, we started from identical priors for both venues, whereas the real parameters are different. The priors are:

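$$\lambda^1_{\delta,\delta} = \lambda^2_{\delta,\delta} = \begin{bmatrix} 5.35 & 6.52 & 7.11 \\ 2.75 & 3.4 & 3.79 \\ 1.5 & 1.86 & 2.1 \end{bmatrix}, \qquad \lambda^1_{\delta,2\delta} = \lambda^2_{\delta,2\delta} = \begin{bmatrix} 8.28 & 10.03 & 10.9 \\ 4.38 & 5.35 & 5.9 \\ 2.5 & 3.05 & 3.4 \end{bmatrix},$$

$$\lambda^1_{2\delta,\delta} = \lambda^2_{2\delta,\delta} = \begin{bmatrix} 1.81 & 2.27 & 2.5 \\ 0.78 & 1.04 & 1.19 \\ 0.29 & 0.43 & 0.53 \end{bmatrix}, \qquad \lambda^1_{2\delta,2\delta} = \lambda^2_{2\delta,2\delta} = \begin{bmatrix} 2.96 & 3.65 & 4 \\ 1.42 & 1.81 & 2.04 \\ 0.68 & 0.9 & 1.04 \end{bmatrix}.$$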

Figure 55: Bayesian update of the intensity of limit orders in the first venue (one panel of λ1 per combination of spreads in {δ, 2δ} and of imbalances in the two venues).

Figure 56: Bayesian update of the intensity of limit orders in the second venue (one panel of λ2 per combination of spreads in {δ, 2δ} and of imbalances in the two venues).

Appendix A Proof of Theorem 2.3

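It can be shown with the dynamic programming principle that the HJBQVI (2.3) does not depend on the cash variable x. We set $(q, \psi, I) \in D = Q \times K$ and $(t_i, S_i) \in [0, T) \times \mathbb{R}$ such that

$$t_i \underset{i\to+\infty}{\longrightarrow} t, \qquad S_i \underset{i\to+\infty}{\longrightarrow} S, \qquad v(t_i, q, S_i, \psi, I) \underset{i\to+\infty}{\longrightarrow} v_\star(t, q, S, \psi, I).$$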

We begin with t = T . By taking ℓn = 0 for all n ∈ {1, . . . , N} we get

$$v(t_i, q, S_i, \psi, I) \ge \mathbb{E}_{t_i, q, S_i, \psi, I}\left[ Q_T S_T - \int_0^T g(q^\star_t - q_t)\,dt \right].$$

By dominated convergence, we get v⋆(T, q, S, ψ, I) ≥ qS.

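Assume now that t < T and that the minimum in the HJBQVI is given by the first term. We take $\phi : [0, T) \times \mathbb{R} \times D \to \mathbb{R}$, $C^1$ in time and $C^2$ in S, such that $0 = \min_{[0,T]\times\mathbb{R}\times D}(v_\star - \phi) = (v_\star - \phi)(t, q, S, \psi, I)$. If there exists η > 0 such that

$$\begin{aligned} 2\eta < {} & \partial_t\phi(t,q,S,\psi,I) - g(q - q^\star_t) + \mu\,\partial_S\phi + \tfrac12\sigma^2\partial_{SS}\phi + \sum_{k\in K} r_{(\psi,I),(k_\psi,k_I)}\big(\phi(t,q,S,k_\psi,k_I) - \phi(t,q,S,\psi,I)\big) \\ & + \sup_{p\in Q_\psi,\,\ell\in\mathcal{A}} \sum_{n=1}^N \lambda^n(\psi,I,p^n,\ell)\,\mathbb{E}\Big[\epsilon^n\ell^n\big(S + \tfrac{\psi^n}{2} + p^n\delta^n\big) + \phi(t, q - \ell^n\epsilon^n, S, \psi, I) - \phi(t,q,S,\psi,I)\Big], \end{aligned}$$

we should have

$$\begin{aligned} 0 \le {} & \partial_t\phi(t',q,S',\psi,I) - g(q - q^\star_{t'}) + \mu\,\partial_S\phi + \tfrac12\sigma^2\partial_{SS}\phi + \sum_{k\in K} r_{(\psi,I),(k_\psi,k_I)}\big(\phi(t',q,S',k_\psi,k_I) - \phi(t',q,S',\psi,I)\big) \\ & + \sup_{p\in Q_\psi,\,\ell\in\mathcal{A}} \sum_{n=1}^N \lambda^n(\psi,I,p^n,\ell)\,\mathbb{E}\Big[\epsilon^n\ell^n\big(S' + \tfrac{\psi^n}{2} + p^n\delta^n\big) + \phi(t', q - \ell^n\epsilon^n, S', \psi, I) - \phi(t',q,S',\psi,I)\Big], \end{aligned}$$

for all $(t', S') \in B = \big((t - r, t + r) \cap [0, T)\big) \times (S - r, S + r)$, for a given $r \in (0, T - t)$. We can assume without loss of generality that B contains the sequence $(t_i, S_i)_i$ and, by taking η arbitrarily small,

$$\phi(t', q, S', \psi, I) + \eta \le v_\star(t', q, S', \psi, I) \le v(t', q, S', \psi, I)$$

on the boundary of B, denoted by $\partial_p B$. Without loss of generality we can also assume that

$$\phi(t', q', S', \psi', I') + \eta \le v_\star(t', q', S', \psi', I') \le v(t', q', S', \psi', I')$$

for $(t', q', S', \psi', I') \in \tilde{B}$, where

$$\tilde{B} = \Big\{ (t', q', S', \psi', I') : (t', S') \in B,\ q' \in \{q - \min_n \epsilon^n,\ q,\ q + \min_n \epsilon^n\},\ \psi' \in \prod_{n=1}^N \{\psi^n - \delta^n, \psi^n, \psi^n + \delta^n\},\ I' \in \prod_{n=1}^N \{I^n - 1, I^n, I^n + 1\},\ (q', \psi', I') \ne (q, \psi, I) \Big\}.$$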

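We introduce the set

$$B_D = \{(t, q, S, \psi, I) : (t, S) \in B\},$$

and denote by $\tau_i$ the first exit time of $(s, q_s, S_s, \psi_s, I_s)_{s \ge t_i}$ from $B_D$, with $q_{t_i} = q$, $S_{t_i} = S$, $\psi_{t_i} = \psi$, $I_{t_i} = I$, where the processes are controlled by the optimal controls $(\ell^n, p^n)_{n\in\{1,\dots,N\}} \in \mathcal{A} \times Q_\psi$. By Itô's formula, we get

$$\begin{aligned} \phi(\tau_i, q_{\tau_i}, S_{\tau_i}, \psi_{\tau_i}, I_{\tau_i}) = {} & \phi(t_i, q_{t_i}, S_{t_i}, \psi_{t_i}, I_{t_i}) + \int_{t_i}^{\tau_i} \Big\{ \partial_t\phi(s, q_s, S_s, \psi_s, I_s) + \mu\,\partial_S\phi + \tfrac12\sigma^2\partial_{SS}\phi \\ & + \sum_{k\in K} r_{(\psi_s,I_s),(k_\psi,k_I)}\big(\phi(s,q_s,S_s,k_\psi,k_I) - \phi(s,q_s,S_s,\psi_s,I_s)\big) \\ & + \sum_{n=1}^N \lambda^n(\psi_s,I_s,p^n_s,\ell_s)\,\mathbb{E}\big[\phi(s, q_s - \ell^n_s\epsilon^n_s, S_s, \psi_s, I_s) - \phi(s,q_s,S_s,\psi_s,I_s)\big] \Big\}\,ds + M(\tau_i, t_i), \end{aligned}$$

where $M(\tau_i, t_i)$ is a martingale. This can be rewritten as

$$\begin{aligned} \phi(\tau_i, q_{\tau_i}, S_{\tau_i}, \psi_{\tau_i}, I_{\tau_i}) = {} & \phi(t_i, q_{t_i}, S_{t_i}, \psi_{t_i}, I_{t_i}) + \int_{t_i}^{\tau_i} \Big\{ \partial_t\phi(s, q_s, S_s, \psi_s, I_s) + \mu\,\partial_S\phi - g(q_s - q^\star(s)) + \tfrac12\sigma^2\partial_{SS}\phi \\ & + \sum_{k\in K} r_{(\psi_s,I_s),(k_\psi,k_I)}\big(\phi(s,q_s,S_s,k_\psi,k_I) - \phi(s,q_s,S_s,\psi_s,I_s)\big) \\ & + \sum_{n=1}^N \lambda^n(\psi_s,I_s,p^n_s,\ell_s)\,\mathbb{E}\Big[\epsilon^n_s\ell^n_s\big(S_s + \tfrac{\psi^n_s}{2} + p^n_s\delta^n\big) + \phi(s, q_s - \ell^n_s\epsilon^n_s, S_s, \psi_s, I_s) - \phi(s,q_s,S_s,\psi_s,I_s)\Big] \Big\}\,ds \\ & + M(\tau_i, t_i) - \sum_{n=1}^N \int_{t_i}^{\tau_i} \lambda^n(\psi_s,I_s,p^n_s,\ell_s)\,\mathbb{E}\Big[\epsilon^n_s\ell^n_s\big(S_s + \tfrac{\psi^n_s}{2} + p^n_s\delta^n\big)\Big]\,ds + \int_{t_i}^{\tau_i} g(q_s - q^\star(s))\,ds. \end{aligned}$$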

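We derive

$$\phi(\tau_i, q_{\tau_i}, S_{\tau_i}, \psi_{\tau_i}, I_{\tau_i}) \ge \phi(t_i, q_{t_i}, S_{t_i}, \psi_{t_i}, I_{t_i}) + M(\tau_i, t_i) - \sum_{n=1}^N \int_{t_i}^{\tau_i} \lambda^n(\psi_s,I_s,p^n_s,\ell_s)\,\mathbb{E}\Big[\epsilon^n_s\ell^n_s\big(S_s + \tfrac{\psi^n_s}{2} + p^n_s\delta^n\big)\Big]\,ds + \int_{t_i}^{\tau_i} g(q_s - q^\star(s))\,ds.$$

As the martingale term vanishes with the expectation, we get

$$\phi(t_i, q_{t_i}, S_{t_i}, \psi_{t_i}, I_{t_i}) \le \mathbb{E}\bigg[ \phi(\tau_i, q_{\tau_i}, S_{\tau_i}, \psi_{\tau_i}, I_{\tau_i}) + \sum_{n=1}^N \int_{t_i}^{\tau_i} \lambda^n(\psi_s,I_s,p^n_s,\ell_s)\,\mathbb{E}\Big[\epsilon^n_s\ell^n_s\big(S_s + \tfrac{\psi^n_s}{2} + p^n_s\delta^n\big)\Big]\,ds - \int_{t_i}^{\tau_i} g(q_s - q^\star(s))\,ds \bigg],$$

and thus

$$\phi(t_i, q_{t_i}, S_{t_i}, \psi_{t_i}, I_{t_i}) \le -\eta + \mathbb{E}\bigg[ v(\tau_i, q_{\tau_i}, S_{\tau_i}, \psi_{\tau_i}, I_{\tau_i}) + \sum_{n=1}^N \int_{t_i}^{\tau_i} \lambda^n(\psi_s,I_s,p^n_s,\ell_s)\,\mathbb{E}\Big[\epsilon^n_s\ell^n_s\big(S_s + \tfrac{\psi^n_s}{2} + p^n_s\delta^n\big)\Big]\,ds - \int_{t_i}^{\tau_i} g(q_s - q^\star(s))\,ds \bigg].$$

For i sufficiently large, we deduce

$$v(t_i, q, S_{t_i}, \psi, I) \le -\frac{\eta}{2} + \mathbb{E}\bigg[ v(\tau_i, q_{\tau_i}, S_{\tau_i}, \psi_{\tau_i}, I_{\tau_i}) + \sum_{n=1}^N \int_{t_i}^{\tau_i} \lambda^n(\psi_s,I_s,p^n_s,\ell_s)\,\mathbb{E}\Big[\epsilon^n_s\ell^n_s\big(S_s + \tfrac{\psi^n_s}{2} + p^n_s\delta^n\big)\Big]\,ds - \int_{t_i}^{\tau_i} g(q_s - q^\star(s))\,ds \bigg],$$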

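which contradicts the dynamic programming principle. In conclusion, we necessarily have

$$\begin{aligned} 0 \ge {} & \partial_t v(t,q,S,\psi,I) - g(q - q^\star_t) + \mu\,\partial_S v + \tfrac12\sigma^2\partial_{SS} v + \sum_{k\in K} r_{(\psi,I),(k_\psi,k_I)}\big(v(t,q,S,k_\psi,k_I) - v(t,q,S,\psi,I)\big) \\ & + \sup_{p\in Q_\psi,\,\ell\in\mathcal{A}} \sum_{n=1}^N \lambda^n(\psi,I,p^n,\ell)\,\mathbb{E}\Big[\epsilon^n\ell^n\big(S + \tfrac{\psi^n}{2} + p^n\delta^n\big) + v(t, q - \ell^n\epsilon^n, S, \psi, I) - v(t,q,S,\psi,I)\Big]. \end{aligned}$$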

The second part of the HJBQVI being straightforward, this proves that v is a viscosity supersolution of the HJBQVI on [0, T) × R × D. The proof for the subsolution is identical, except that we need to prove

$$\sum_{n=1}^N \Big( \sup_{m^n\in[0,m]} \Big\{ m^n\big(S - \tfrac{\psi^n}{2}\big) + v(t, q - m^n, S, \psi, I) \Big\} - v(t, q, S, \psi, I) \Big) \ge 0,$$

which is direct by choosing the constant controls $m^n = 0$ for all n ∈ {1, . . . , N}.

For the proof of the uniqueness, we recall the definition of subjet and superjet.

Definition Appendix A.1. Let $v : [0, T) \times \mathbb{R} \times D \to \mathbb{R}$ be l.s.c. (resp. u.s.c.) with respect to (t, S). For $(t, q, S, \psi, I) \in [0, T) \times \mathbb{R} \times D$, we say that $(y, p, A) \in \mathbb{R}^3$ is in the subjet $P^- v(t, q, S, \psi, I)$ (resp. the superjet $P^+ v(t, q, S, \psi, I)$) if

$$v(t', q, S', \psi, I) \ge v(t, q, S, \psi, I) + y(t' - t) + p(S' - S) + \tfrac12 A (S' - S)^2 + o\big(|t' - t| + |S' - S|^2\big)$$

(resp. $v(t', q, S', \psi, I) \le v(t, q, S, \psi, I) + y(t' - t) + p(S' - S) + \tfrac12 A (S' - S)^2 + o(|t' - t| + |S' - S|^2)$), for all $(t', S')$ such that $(t', q, S', \psi, I) \in [0, T) \times \mathbb{R} \times D$. We also define $\bar{P}^- v(t, q, S, \psi, I)$ as the set of points $(y, p, A) \in \mathbb{R}^3$ such that there exists a sequence $(t_i, q, S_i, \psi, I, y_i, p_i, A_i)$ with $(y_i, p_i, A_i) \in P^- v(t_i, q, S_i, \psi, I)$ satisfying

$$(t_i, q, S_i, \psi, I, y_i, p_i, A_i) \underset{i\to+\infty}{\longrightarrow} (t, q, S, \psi, I, y, p, A).$$

The set $\bar{P}^+ v(t, q, S, \psi, I)$ is defined similarly.

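We now introduce an analogue of Ishii's lemma, whose proof can be found in [11].

Lemma Appendix A.2. A l.s.c. (resp. u.s.c.) function v is a supersolution (resp. subsolution) of the HJBQVI on [0, T) × R × D if and only if for all $(t, q, S, \psi, I) \in [0, T) \times \mathbb{R} \times D$ and all $(y, p, A) \in \bar{P}^- v(t, q, S, \psi, I)$ (resp. $\bar{P}^+ v(t, q, S, \psi, I)$), we have

$$\begin{aligned} 0 \le \min\bigg\{ & -y + g(q - q^\star(t)) - \mu p - \tfrac12\sigma^2 A - \sum_{k\in K} r_{(\psi,I),(k_\psi,k_I)}\big(v(t,q,S,k_\psi,k_I) - v(t,q,S,\psi,I)\big) \\ & - \sup_{p\in Q_\psi,\,\ell\in\mathcal{A}} \sum_{n=1}^N \lambda^n(\psi,I,p^n,\ell)\,\mathbb{E}\Big[\epsilon^n\ell^n\big(S + \tfrac{\psi^n}{2} + p^n\delta^n\big) + v(t, q - \ell^n\epsilon^n, S, \psi, I) - v(t,q,S,\psi,I)\Big]\ ; \\ & \sum_{n=1}^N \Big( v(t,q,S,\psi,I) - \sup_{m^n\in[0,m]} \Big\{ m^n\big(S - \tfrac{\psi^n}{2}\big) + v(t, q - m^n, S, \psi, I) \Big\} \Big) \bigg\} \end{aligned}$$

(resp. ≤ 0).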

We now prove the following comparison principle:

Proposition Appendix A.3. Let u (resp. v) be a l.s.c. supersolution (resp. u.s.c. subsolution) with polynomial growth of the HJBQVI on [0, T) × R × D. If u ≥ v on {T} × R × D, then u ≥ v on [0, T) × R × D.

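Proof. For ρ > 0 we introduce the following change of variables:

$$\tilde{u}(t, q, S, \psi, I) = e^{\rho t} u(t, q, S, \psi, I), \qquad \tilde{v}(t, q, S, \psi, I) = e^{\rho t} v(t, q, S, \psi, I).$$

Then, $\tilde{u}$ and $\tilde{v}$ are respectively supersolution and subsolution of the following equation:

$$\begin{aligned} 0 = \min\bigg\{ & -\partial_t w(t,q,S,\psi,I) + \rho\, w(t,q,S,\psi,I) + g(q - q^\star_t) - \mu\,\partial_S w - \tfrac12\sigma^2\partial_{SS} w - \sum_{k\in K} r_{(\psi,I),(k_\psi,k_I)}\big(w(t,q,S,k_\psi,k_I) - w(t,q,S,\psi,I)\big) \\ & - \sup_{p\in Q_\psi,\,\ell\in\mathcal{A}} \sum_{n=1}^N \lambda^n(\psi,I,p^n,\ell)\,\mathbb{E}\Big[\epsilon^n\ell^n e^{\rho t}\big(S + \tfrac{\psi^n}{2} + p^n\delta^n\big) + w(t, q - \ell^n\epsilon^n, S, \psi, I) - w(t,q,S,\psi,I)\Big]\ ; \\ & \sum_{n=1}^N \Big( w(t,q,S,\psi,I) - \sup_{m^n\in[0,m]} \Big\{ m^n e^{\rho t}\big(S - \tfrac{\psi^n}{2}\big) + w(t, q - m^n, S, \psi, I) \Big\} \Big) \bigg\}, \end{aligned}$$

on [0, T) × R × D, with $\tilde{u} \ge \tilde{v}$ on {T} × R × D. In order to prove the proposition, we only have to show that $\tilde{u} \ge \tilde{v}$ on [0, T) × R × D. Assume that the minimum is given by the first term and that $\sup_{[0,T)\times\mathbb{R}\times D} (\tilde{v} - \tilde{u}) > 0$. We fix $p \in \mathbb{N}^\star$ such that

$$\lim_{\|S\|_2\to+\infty}\ \sup_{t\in[0,T],\ (q,\psi,I)\in D} \frac{|\tilde{u}(t,q,S,\psi,I)| + |\tilde{v}(t,q,S,\psi,I)|}{1 + \|S\|_2^{2p}} = 0.$$

Then, there exists $(\hat{t}, \hat{q}, \hat{S}, \hat{\psi}, \hat{I}) \in [0, T] \times \mathbb{R} \times D$ such that

$$0 < \tilde{v}(\hat{t}, \hat{q}, \hat{S}, \hat{\psi}, \hat{I}) - \tilde{u}(\hat{t}, \hat{q}, \hat{S}, \hat{\psi}, \hat{I}) - \phi(\hat{t}, \hat{S}, \hat{S}) = \max_{(t,q,S,\psi,I)} \Big( \tilde{v}(t,q,S,\psi,I) - \tilde{u}(t,q,S,\psi,I) - \phi(t, S, S) \Big),$$

where ε > 0 is small enough and

$$\phi(t, S, R) = \epsilon \exp(-\kappa t)\big(1 + \|S\|_2^{2p} + \|R\|_2^{2p}\big), \qquad \kappa > 0.$$

Since $\tilde{u} \ge \tilde{v}$ on {T} × R × D, we directly have $\hat{t} < T$.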

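For all i ∈ N, we can find $(t_i, S_i, R_i)$ such that

$$\begin{aligned} 0 < {} & \tilde{v}(t_i, \hat{q}, S_i, \hat{\psi}, \hat{I}) - \tilde{u}(t_i, \hat{q}, R_i, \hat{\psi}, \hat{I}) - \phi(t_i, S_i, R_i) - i|S_i - R_i|^2 - \big(|t_i - \hat{t}|^2 + |S_i - \hat{S}|^4\big) \\ = {} & \max_{(t,S,R)} \Big( \tilde{v}(t, \hat{q}, S, \hat{\psi}, \hat{I}) - \tilde{u}(t, \hat{q}, R, \hat{\psi}, \hat{I}) - \phi(t, S, R) - i|S - R|^2 - \big(|t - \hat{t}|^2 + |S - \hat{S}|^4\big) \Big). \end{aligned}$$

Then we have

$$(t_i, S_i, R_i) \underset{i\to+\infty}{\longrightarrow} (\hat{t}, \hat{S}, \hat{S})$$

up to a subsequence, and

$$\tilde{v}(t_i, \hat{q}, S_i, \hat{\psi}, \hat{I}) - \tilde{u}(t_i, \hat{q}, R_i, \hat{\psi}, \hat{I}) - \phi(t_i, S_i, R_i) - i|S_i - R_i|^2 - \big(|t_i - \hat{t}|^2 + |S_i - \hat{S}|^4\big) \underset{i\to+\infty}{\longrightarrow} \tilde{v}(\hat{t}, \hat{q}, \hat{S}, \hat{\psi}, \hat{I}) - \tilde{u}(\hat{t}, \hat{q}, \hat{S}, \hat{\psi}, \hat{I}) - \phi(\hat{t}, \hat{S}, \hat{S}).$$

Let us then denote, for $i \in \mathbb{N}^*$,

$$\varphi_i(t, S, R) = \phi(t, S, R) + i|S - R|^2 + |t - \hat{t}|^2 + |S - \hat{S}|^4, \qquad \forall (t, S, R) \in [0, T] \times \mathbb{R}^2.$$

Then Ishii's Lemma (see [9, 14]) guarantees that for all η > 0, we can find $(y^1_i, p^1_i, A^1_i) \in P^+ \tilde{v}(t_i, \hat{q}, S_i, \hat{\psi}, \hat{I})$ and $(y^2_i, p^2_i, A^2_i) \in P^- \tilde{u}(t_i, \hat{q}, R_i, \hat{\psi}, \hat{I})$ such that

$$y^1_i - y^2_i = \partial_t\varphi_i(t_i, S_i, R_i), \qquad (p^1_i, p^2_i) = (\partial_S\varphi_i, -\partial_R\varphi_i)(t_i, S_i, R_i),$$

and

$$\begin{pmatrix} A^1_i & 0 \\ 0 & -A^2_i \end{pmatrix} \le H_{SR}\varphi_i(t_i, S_i, R_i) + \eta\,\big(H_{SR}\varphi_i(t_i, S_i, R_i)\big)^2,$$

where $H_{SR}\varphi_i(t_i, \cdot, \cdot)$ denotes the Hessian matrix of $\varphi_i(t_i, \cdot, \cdot)$. Applying Lemma Appendix A.2, we get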

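$$\begin{aligned} \rho\big(\tilde{v}(t_i, \hat{q}, S_i, \hat{\psi}, \hat{I}) - \tilde{u}(t_i, \hat{q}, R_i, \hat{\psi}, \hat{I})\big) \le {} & y^1_i - y^2_i + \tfrac12\sigma^2(A^1_i - A^2_i) + \mu(p^1_i - p^2_i) \\ & + \sum_{k\in K} r_{(\hat\psi,\hat I),(k_\psi,k_I)}\big(\tilde{v}(t_i, \hat{q}, S_i, k_\psi, k_I) - \tilde{v}(t_i, \hat{q}, S_i, \hat{\psi}, \hat{I})\big) \\ & + \sup_{p\in Q_{\hat\psi},\,\ell\in\mathcal{A}} \sum_{n=1}^N \lambda^n(\hat\psi, \hat I, p^n, \ell)\,\mathbb{E}\Big[\epsilon^n\ell^n e^{\rho t_i}\big(S_i + \tfrac{\hat\psi^n}{2} + p^n\delta^n\big) + \tilde{v}(t_i, \hat{q} - \ell^n\epsilon^n, S_i, \hat{\psi}, \hat{I}) - \tilde{v}(t_i, \hat{q}, S_i, \hat{\psi}, \hat{I})\Big] \\ & - \sum_{k\in K} r_{(\hat\psi,\hat I),(k_\psi,k_I)}\big(\tilde{u}(t_i, \hat{q}, R_i, k_\psi, k_I) - \tilde{u}(t_i, \hat{q}, R_i, \hat{\psi}, \hat{I})\big) \\ & - \sup_{p\in Q_{\hat\psi},\,\ell\in\mathcal{A}} \sum_{n=1}^N \lambda^n(\hat\psi, \hat I, p^n, \ell)\,\mathbb{E}\Big[\epsilon^n\ell^n e^{\rho t_i}\big(R_i + \tfrac{\hat\psi^n}{2} + p^n\delta^n\big) + \tilde{u}(t_i, \hat{q} - \ell^n\epsilon^n, R_i, \hat{\psi}, \hat{I}) - \tilde{u}(t_i, \hat{q}, R_i, \hat{\psi}, \hat{I})\Big]. \end{aligned}$$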

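Moreover, we have

$$H_{SR}\varphi_i(t_i, S_i, R_i) = \begin{pmatrix} \partial^2_{SS}\phi(t_i, S_i, R_i) + 2i + 12(S_i - \hat{S})^2 & \partial^2_{SR}\phi(t_i, S_i, R_i) - 2i \\ \partial^2_{SR}\phi(t_i, S_i, R_i) - 2i & \partial^2_{RR}\phi(t_i, S_i, R_i) + 2i \end{pmatrix},$$

and

$$\partial_S\varphi_i(t_i, S_i, R_i) = \partial_S\phi(t_i, S_i, R_i) + 2i(S_i - R_i) + 4(S_i - \hat{S})^3, \qquad \partial_R\varphi_i(t_i, S_i, R_i) = \partial_R\phi(t_i, S_i, R_i) - 2i(S_i - R_i),$$

so from what precedes we can write

$$\begin{aligned} \rho\big(\tilde{v}(t_i, \hat{q}, S_i, \hat{\psi}, \hat{I}) - \tilde{u}(t_i, \hat{q}, R_i, \hat{\psi}, \hat{I})\big) \le {} & \partial_t\phi(t_i, S_i, R_i) + 2(t_i - \hat{t}) + \mu\big(\partial_S\phi(t_i, S_i, R_i) + \partial_R\phi(t_i, S_i, R_i) + 4(S_i - \hat{S})^3\big) \\ & + \tfrac12\sigma^2\big(\partial^2_{SS}\phi(t_i, S_i, R_i) + \partial^2_{RR}\phi(t_i, S_i, R_i) + 2\partial^2_{SR}\phi(t_i, S_i, R_i) + 12(S_i - \hat{S})^2\big) + \eta C_i \\ & + \sum_{k\in K} r_{(\hat\psi,\hat I),(k_\psi,k_I)}\big(\tilde{v}(t_i, \hat{q}, S_i, k_\psi, k_I) - \tilde{v}(t_i, \hat{q}, S_i, \hat{\psi}, \hat{I})\big) \\ & + \sup_{p\in Q_{\hat\psi},\,\ell\in\mathcal{A}} \sum_{n=1}^N \lambda^n(\hat\psi, \hat I, p^n, \ell)\,\mathbb{E}\Big[\epsilon^n\ell^n e^{\rho t_i}\big(S_i + \tfrac{\hat\psi^n}{2} + p^n\delta^n\big) + \tilde{v}(t_i, \hat{q} - \ell^n\epsilon^n, S_i, \hat{\psi}, \hat{I}) - \tilde{v}(t_i, \hat{q}, S_i, \hat{\psi}, \hat{I})\Big] \\ & - \sum_{k\in K} r_{(\hat\psi,\hat I),(k_\psi,k_I)}\big(\tilde{u}(t_i, \hat{q}, R_i, k_\psi, k_I) - \tilde{u}(t_i, \hat{q}, R_i, \hat{\psi}, \hat{I})\big) \\ & - \sup_{p\in Q_{\hat\psi},\,\ell\in\mathcal{A}} \sum_{n=1}^N \lambda^n(\hat\psi, \hat I, p^n, \ell)\,\mathbb{E}\Big[\epsilon^n\ell^n e^{\rho t_i}\big(R_i + \tfrac{\hat\psi^n}{2} + p^n\delta^n\big) + \tilde{u}(t_i, \hat{q} - \ell^n\epsilon^n, R_i, \hat{\psi}, \hat{I}) - \tilde{u}(t_i, \hat{q}, R_i, \hat{\psi}, \hat{I})\Big], \end{aligned}$$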

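where $C_i$ does not depend on η. As $\tilde{v}$ is u.s.c., $\tilde{u}$ is l.s.c. and $(t_i, S_i, R_i)_i$ is convergent, letting η → 0 and then i → +∞, we get for a certain constant M

$$\rho\big(\tilde{v}(\hat{t}, \hat{q}, \hat{S}, \hat{\psi}, \hat{I}) - \tilde{u}(\hat{t}, \hat{q}, \hat{S}, \hat{\psi}, \hat{I})\big) \le \partial_t\phi(\hat{t}, \hat{S}, \hat{S}) + \mu\big(\partial_S\phi(\hat{t}, \hat{S}, \hat{S}) + \partial_R\phi(\hat{t}, \hat{S}, \hat{S})\big) + \tfrac12\sigma^2\big(\partial^2_{SS}\phi(\hat{t}, \hat{S}, \hat{S}) + \partial^2_{RR}\phi(\hat{t}, \hat{S}, \hat{S}) + 2\partial^2_{SR}\phi(\hat{t}, \hat{S}, \hat{S})\big) + M.$$

For κ > 0 large enough, the right-hand side is strictly negative, and as ρ > 0 we get

$$\tilde{v}(\hat{t}, \hat{q}, \hat{S}, \hat{\psi}, \hat{I}) - \tilde{u}(\hat{t}, \hat{q}, \hat{S}, \hat{\psi}, \hat{I}) < 0,$$

which yields a contradiction. The proof for the other part of the HJBQVI is direct.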

With the two above propositions, it is easy to conclude the proof of the theorem. Indeed, as $v_\star$ is a supersolution such that $v_\star \ge v$ on {T} × R × D, and $v^\star$ is a subsolution such that $v^\star \le v$ on {T} × R × D, we can apply the maximum principle to get $v_\star \ge v^\star$ on [0, T] × R × D. But by definition of $v_\star$ and $v^\star$, we must have $v_\star \le v \le v^\star$ on [0, T] × R × D, which proves that $v_\star = v = v^\star$ and that v is continuous. The maximum principle implies that if two continuous viscosity solutions of the HJBQVI satisfy the same terminal condition, they are equal on [0, T] × R × D, hence the uniqueness.

Appendix B Application to OTC market making

Appendix B.1 Framework

The model we present in this article is designed for trading in cross-listed stocks in limit order books. However, it can be adapted straightforwardly to handle the problem of an OTC market maker, who often deals with a large number of assets driven by a few factors. We borrow here the factorial method market making framework of [10] (we also keep their notation, for this section only). We consider a market maker who is in charge of providing bid and ask quotes on d assets, whose dynamics are

$$dS^i_t = \mu^i\,dt + \sigma^i\,dW^i_t, \qquad i \in \{1, \dots, d\},$$

where $\mu^i$ is the drift of the i-th asset, $\sigma^i$ is its volatility and $(W^1_t, \dots, W^d_t)$ is a d-dimensional Brownian motion. We consider a non-singular variance-covariance matrix $\Sigma = (\rho^{i,j}\sigma^i\sigma^j)_{i,j\in\{1,\dots,d\}}$ for the vector of assets $(S^1_t, \dots, S^d_t)$. The market maker sets bid and ask prices on every asset:

$$S^{i,b}(t, z) = S^i_t - \delta^{i,b}(t, z), \qquad S^{i,a}(t, z) = S^i_t + \delta^{i,a}(t, z), \qquad z \in \mathbb{R},$$

where $\delta = (\delta^{i,a}, \delta^{i,b})_{i\in\{1,\dots,d\}}$ are the (predictable and uniformly lower bounded) bid and ask quotes around the mid-price of each asset. The volumes of transactions on the bid and ask sides are modeled by marked point processes $N^{i,b}(dt, dz)$, $N^{i,a}(dt, dz)$ of intensity $\nu^{i,b}_t(dz)$, $\nu^{i,a}_t(dz)$ defined by

$$\nu^{i,j}_t(dz) = \Lambda^{i,j}\big(\delta^{i,j}(t, z)\big)\,\eta^{i,j}(dz), \qquad i \in \{1, \dots, d\},\ j \in \{a, b\},$$

where $\Lambda^{i,j}$ is a sufficiently regular function (exponential, logistic, SU Johnson, etc.) modeling the probability to trade on the asset i, on the side j, for a given spread $\delta^{i,j}(t, z)$ and a size z. The functions $\eta^{i,j}(dz)$ are probability densities over $\mathbb{R}_+$ modeling the distribution of a trade size. The market maker manages his inventory process $q_t = (q^1_t, \dots, q^d_t)$, whose dynamics are given by

$$dq^i_t = \int_{\mathbb{R}_+} z\,N^{i,b}(dt, dz) - \int_{\mathbb{R}_+} z\,N^{i,a}(dt, dz), \qquad i \in \{1, \dots, d\}.$$

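The market maker manages his cash process, whose dynamics at time t are given by

$$dX_t = \sum_{i=1}^d \left( \int_{\mathbb{R}_+} z\,S^{i,a}(t, z)\,N^{i,a}(dt, dz) - \int_{\mathbb{R}_+} z\,S^{i,b}(t, z)\,N^{i,b}(dt, dz) \right).$$

His optimization problem is defined as

$$\sup_{\delta}\ \mathbb{E}\left[ X_T + \sum_{i=1}^d q^i_T S^i_T - \int_0^T \phi(q_t)\,dt \right],$$

where φ is a running penalty preventing too large positions and $\sum_{i=1}^d q^i_T S^i_T$ is the marked-to-market value of the market maker's portfolio at time T. The corresponding HJB equation is given by

$$0 = \partial_t v(t, q) + \sum_{i=1}^d q^i\mu^i - \phi(q) + \sum_{i=1}^d \int_{\mathbb{R}_+} z\,H^{i,b}\!\left(\frac{v(t, q) - v(t, q + z e^i)}{z}\right)\eta^{i,b}(dz) + \sum_{i=1}^d \int_{\mathbb{R}_+} z\,H^{i,a}\!\left(\frac{v(t, q) - v(t, q - z e^i)}{z}\right)\eta^{i,a}(dz),$$

with terminal condition $v(T, q) = 0$, where $H^{i,j}(p) = \sup_\delta \Lambda^{i,j}(\delta)(\delta - p)$ and $(e^1, \dots, e^d)$ is the canonical basis of $\mathbb{R}^d$.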
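To make the Hamiltonian H(p) = sup_δ Λ(δ)(δ − p) concrete, here is a minimal sketch for one asset and one side, assuming an exponential intensity Λ(δ) = A exp(−kδ). In that case the first-order condition gives δ⋆ = p + 1/k and H(p) = (A/k) exp(−kp − 1). The values of A, k and p, as well as the function names, are hypothetical; the grid search is only there to check the closed form.

```python
import math


def hamiltonian_exponential(p, A, k):
    """Closed form of sup_delta A*exp(-k*delta)*(delta - p) and its argmax."""
    delta_star = p + 1.0 / k  # from the first-order condition
    return delta_star, A * math.exp(-k * delta_star) / k


def hamiltonian_grid(p, intensity, delta_min, delta_max, n_points):
    """Brute-force approximation of the supremum on a uniform grid."""
    step = (delta_max - delta_min) / (n_points - 1)
    best_delta, best_value = delta_min, -math.inf
    for j in range(n_points):
        delta = delta_min + j * step
        value = intensity(delta) * (delta - p)
        if value > best_value:
            best_delta, best_value = delta, value
    return best_delta, best_value


A, k, p = 1.0, 2.0, 0.3  # hypothetical parameters
delta_cf, h_cf = hamiltonian_exponential(p, A, k)
delta_grid, h_grid = hamiltonian_grid(
    p, lambda d: A * math.exp(-k * d), -1.0, 5.0, 60001
)
print(delta_cf, h_cf, delta_grid, h_grid)
```

For other intensity shapes (logistic, SU Johnson) no such closed form is generally available, and a numerical maximization of this kind is one possible way to tabulate H before solving the HJB equation.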


Appendix B.2 Bayesian update for OTC market makers

Usually, the functions $\Lambda^{i,j}$ are of the form

$$\Lambda^{i,j}\big(\delta^{i,j}(t, z)\big) = \lambda^{i,j}_{RFQ}\, f\big(\delta^{i,j}(t, z)\big),$$

where $\lambda^{i,j}_{RFQ}$ is the constant intensity of arrival of requests for quote, and $f\big(\delta^{i,j}(t, z)\big)$ gives the probability that a request will result in a transaction given the quote δ proposed by the market maker. The estimation of the quantity $\lambda^{i,j}_{RFQ}$ is of particular importance for the market maker, so that he can adjust his quotes depending on his view on the number of requests for a certain asset and a certain side. In the same spirit as in Section 3.1.1, we assume the following prior distribution:

$$\lambda^{i,j}_{RFQ} \sim \Gamma(\alpha^{i,j}, \beta^{i,j}), \qquad (\alpha^{i,j}, \beta^{i,j}) > 0.$$

For an asset i ∈ {1, . . . , d} on the side j ∈ {a, b}, this corresponds to an average intensity of $\alpha^{i,j}/\beta^{i,j}$, with variance equal to $\alpha^{i,j}/(\beta^{i,j})^2$. If the market maker is confident in his estimation of the intensity $\lambda^{i,j}_{RFQ}$, he can choose a large $\beta^{i,j}$ so that the variance of his Bayesian estimator is small. Given all the information accumulated up to time t, his best estimation of the quantity $\lambda^{i,j}_{RFQ}$ is given by

$$\mathbb{E}\big[\lambda^{i,j}_{RFQ} \,\big|\, N(t, dz)\big] = \frac{\alpha^{i,j} + \int_{\mathbb{R}_+} N(t, dz)}{\beta^{i,j} + \int_{\mathbb{R}_+}\int_0^t f\big(\delta^{i,j}(s, z)\big)\,ds\,\eta^{i,j}(dz)}. \qquad \text{(Appendix B.1)}$$

By the law of large numbers, when the market maker has accumulated a sufficiently large number of observations, his best estimation of $\lambda^{i,j}_{RFQ}$ converges to the "real" intensity of the market. As time passes, the prior parameters $(\alpha^{i,j}, \beta^{i,j})$ of the market maker become less important, as the estimation relies mostly on the observations.
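The estimator (Appendix B.1) can be illustrated with a minimal sketch for one asset and one side, under simplifying assumptions that are not in the text: the quote δ is held constant and does not depend on the size z, so the denominator reduces to β + f(δ)·t, and the acceptance probability is taken as f(δ) = exp(−kδ). The true intensity, the prior parameters and the function names are hypothetical.

```python
import math
import random


def rfq_intensity_estimate(alpha, beta, n_trades, exposure):
    """Posterior mean (alpha + N) / (beta + exposure) of the RFQ intensity.

    `exposure` stands for the integral of the acceptance probability f over
    the observation window.
    """
    return (alpha + n_trades) / (beta + exposure)


def acceptance_probability(delta, k=1.5):
    """Hypothetical acceptance probability, decreasing in the quote delta."""
    return math.exp(-k * delta)


rng = random.Random(42)
true_lambda, delta, horizon = 30.0, 0.2, 200.0  # hypothetical values
f = acceptance_probability(delta)

# Trades arrive as a Poisson process of intensity true_lambda * f.
t, n_trades = 0.0, 0
while True:
    t += rng.expovariate(true_lambda * f)
    if t > horizon:
        break
    n_trades += 1

# Prior mean alpha / beta = 10, deliberately far from the true intensity.
estimate = rfq_intensity_estimate(10.0, 1.0, n_trades, f * horizon)
print(n_trades, estimate)
```

As the observation window grows, the weight of (α, β) in the ratio vanishes, which is the convergence described above.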

Another important parameter of the model is the size of transactions, which impacts the quotes of the market maker as well as his inventory risk. In [10], the authors choose in their numerical experiments a $\Gamma(a^{i,j}, b^{i,j})$ distribution for $\eta^{i,j}$. The trader can choose between several Bayesian updates (revise only $a^{i,j}$, only $b^{i,j}$, or both), depending on his confidence in the parameter estimates. If he is confident about the shape parameter $a^{i,j}$, that is, he knows approximately the average size of a request but not the standard deviation, he sets $b^{i,j} \sim \Gamma(a^{i,j}_0, b^{i,j}_0)$. Given $n$ observations of sizes $z_1, \dots, z_n$, the best Bayesian estimate of $b^{i,j}$ (the rate parameter of the request size distribution, in the shape-rate convention used above) is

$$\mathbb{E}\left[b^{i,j} \,\middle|\, (z_1, \dots, z_n)\right] = \frac{a^{i,j}_0 + n\, a^{i,j}}{b^{i,j}_0 + \sum_{k=1}^n z_k}. \tag{Appendix B.2}$$

Other prior distributions, accounting for uncertainty on the shape parameter $a^{i,j}$ (if $b^{i,j}$ is known) or on both $(a^{i,j}, b^{i,j})$, can be handled in the same way.
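The update (Appendix B.2) can be checked numerically with a short Python sketch. It assumes that request sizes are Gamma distributed with known shape and unknown rate $b$, which is the setting in which the Gamma prior on $b$ is conjugate. The function names and numerical values are hypothetical.

```python
import random


def posterior_rate(a0, b0, shape, sizes):
    """Posterior (shape, rate) of b when sizes are Gamma(shape, rate=b)
    with known `shape` and the prior on b is Gamma(a0, b0)."""
    post_shape = a0 + len(sizes) * shape
    post_rate = b0 + sum(sizes)
    return post_shape, post_rate


def posterior_mean_rate(a0, b0, shape, sizes):
    """Posterior mean of b, i.e. the right-hand side of (Appendix B.2)."""
    post_shape, post_rate = posterior_rate(a0, b0, shape, sizes)
    return post_shape / post_rate


rng = random.Random(0)
shape, true_b = 2.0, 0.5                 # hypothetical values
# random.gammavariate takes (shape, scale), and scale = 1 / rate.
sizes = [rng.gammavariate(shape, 1.0 / true_b) for _ in range(5000)]
print(posterior_mean_rate(a0=1.0, b0=1.0, shape=shape, sizes=sizes))
```

With a few thousand simulated sizes, the printed estimate should be close to the rate used in the simulation, whatever reasonable prior $(a_0, b_0)$ is chosen.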

Another sensitive parameter, especially for multi-asset market making, is the variance-covariance matrix $\Sigma$. This quantity is usually estimated over a long period, but its parameters can change abruptly. For example, let us assume that the market maker is in charge of $d$ assets in two different sectors (for instance, technology and aerospace). Following the factorial approach, the dimension of the market making problem is reduced from $d$ to 3. The three factors mainly correspond to the three highest eigenvalues of the variance-covariance matrix $\Sigma$, and drive the quotes of the market maker. However, in case of a sector-wide tail event, for example the bankruptcy of one of the companies of the tech sector, it is likely that all the correlations between the assets of this sector will rise to one. This will impact the eigenvalue related to the technology sector and change the quotes of the market maker, as he has to avoid long inventory positions in assets whose values are decreasing. To design an adaptive market making strategy based on a Bayesian update of the correlation matrix and the drift of the assets, we define the Normal-Inverse-Wishart prior $(\mu, \Sigma) \sim \mathrm{NIW}(\mu_0, \kappa_0, \nu_0, \psi)$, where $(\mu_0, \kappa_0, \nu_0, \psi) \in \mathbb{R}^d \times \mathbb{R}^\star_+ \times (d-1, +\infty) \times \mathcal{M}_d(\mathbb{R})$. This distribution is built as follows:

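$$\mu \mid (\mu_0, \kappa_0, \Sigma) \sim \mathcal{N}\Big(\mu_0, \frac{1}{\kappa_0}\Sigma\Big), \quad \Sigma \mid (\psi, \nu_0) \sim \mathcal{W}^{-1}(\psi, \nu_0), \quad \text{then } (\mu, \Sigma) \sim \mathrm{NIW}(\mu_0, \kappa_0, \nu_0, \psi),$$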

where $\mathcal{W}^{-1}$ is the standard inverse Wishart distribution. In other words, conditionally on $\Sigma$, the drift vector $\mu$ of the assets follows a multivariate Gaussian distribution, whereas the variance-covariance matrix $\Sigma$ follows a standard inverse Wishart distribution. At time $t$, denoting by $S_t = (S^1_t, \dots, S^d_t)$ the prices observed up to time $t$, the Bayesian update of $(\mu, \Sigma)$ is

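$$(\mu, \Sigma \mid S_t - S_0) \sim \mathrm{NIW}\left( \frac{\kappa_0 \mu_0 + (S_t - S_0)}{\kappa_0 + t},\ \kappa_0 + t,\ \nu_0 + t,\ \psi + \Big(S_t - \frac{S_t}{t}\Big)\Big(S_t - \frac{S_t}{t}\Big)^T + \frac{\kappa_0 t}{\kappa_0 + t}\Big(\mu_0 - \frac{S_t}{t}\Big)\Big(\mu_0 - \frac{S_t}{t}\Big)^T \right).$$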

By the law of large numbers, as $t \to +\infty$ the market maker accumulates more information and his estimates converge toward the drift and variance-covariance matrix of the assets in his portfolio. He can therefore recompute the factors derived from the updated variance-covariance matrix and adjust his quotes.
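To make the mechanics of a Normal-Inverse-Wishart update concrete, the following Python sketch implements the textbook conjugate update from $n$ i.i.d. Gaussian observations, taken here to be price increments over unit time steps. This discrete-time form is an assumption of the sketch and need not coincide term by term with the continuous-time expression above. The function names are hypothetical.

```python
def outer(u, v):
    """Outer product of two vectors given as lists."""
    return [[ui * vj for vj in v] for ui in u]


def niw_update(mu0, kappa0, nu0, psi, increments):
    """Textbook NIW update from n Gaussian observations (lists of floats)."""
    n, d = len(increments), len(mu0)
    mean = [sum(x[k] for x in increments) / n for k in range(d)]

    # Scatter matrix of the observations around their sample mean.
    scatter = [[0.0] * d for _ in range(d)]
    for x in increments:
        dev = [x[k] - mean[k] for k in range(d)]
        for r, row in enumerate(outer(dev, dev)):
            for c, val in enumerate(row):
                scatter[r][c] += val

    kappa_n, nu_n = kappa0 + n, nu0 + n
    mu_n = [(kappa0 * mu0[k] + n * mean[k]) / kappa_n for k in range(d)]

    # Correction term for the gap between sample mean and prior mean.
    gap = [mean[k] - mu0[k] for k in range(d)]
    weight = kappa0 * n / kappa_n
    shrink = outer(gap, gap)
    psi_n = [[psi[r][c] + scatter[r][c] + weight * shrink[r][c]
              for c in range(d)] for r in range(d)]
    return mu_n, kappa_n, nu_n, psi_n


def posterior_mean_cov(psi_n, nu_n):
    """E[Sigma | data] = psi_n / (nu_n - d - 1), valid when nu_n > d + 1."""
    d = len(psi_n)
    return [[v / (nu_n - d - 1) for v in row] for row in psi_n]


# Two assets, four hypothetical unit-time price increments.
increments = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
mu_n, kappa_n, nu_n, psi_n = niw_update([0.0, 0.0], 1.0, 4.0, identity, increments)
print(mu_n, kappa_n, nu_n)
print(posterior_mean_cov(psi_n, nu_n))
```

The posterior mean of the covariance matrix is the natural input for recomputing the eigenvalues and factors that drive the quotes.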

This extension deserves several remarks. First, the problems encountered by an OTC market maker are quite different from those of a high- or mid-frequency trader in an order book. The model is more parsimonious, especially for the intensity functions. Therefore, the convergence toward the “true” market parameters will be faster than in an order book model. The objective of the Bayesian update on the quantities $\lambda^{i,j}_{RFQ}$ is to determine the average behavior of the counterparts of the market maker. If he observes a large number of requests on the ask (resp. bid) side of the asset $i$, the Bayesian update (Appendix B.1) enables the market maker to adjust his quotes and set a higher ask (resp. bid) price for this asset. If the market maker observes a higher discrepancy than expected in the transaction sizes, the Bayesian update (Appendix B.2) helps him adjust his quotes. Finally, Bayesian learning of the drift and covariance of the assets makes it possible to update the factors from which the market maker chooses his quotes.

References

[1] R. Almgren and B. Harts. A dynamic algorithm for smart order routing. White paper StreamBase, 2008.

[2] R. Almgren, C. Thum, E. Hauptmann, and H. Li. Equity market impact. Risk, July, pages 58–62, 2005.

[3] H. Alsayed and F. McGroarty. Arbitrage and the law of one price in the market for American depository receipts. Journal of International Financial Markets, Institutions and Money, 22(5):1258–1276, 2012.

[4] M. Avellaneda, J. Reed, and S. Stoikov. Forecasting prices from level-I quotes in the presence of hidden liquidity. Algorithmic Finance, 1(1):35–43, 2011.

[5] M. Avellaneda and S. Stoikov. High-frequency trading in a limit order book. Quantitative Finance, 8(3):217–224, 2008.

[6] A. Bachouch, C. Huré, N. Langrené, and H. Pham. Deep neural networks algorithms for stochastic control problems on finite horizon, part 2: Numerical applications, 2018.

[7] B. Baldacci, P. Bergault, and O. Guéant. Algorithmic market making: the case of equity derivatives. arXiv preprint arXiv:1907.12433, 2019.

[8] B. Baldacci, I. Manziuk, T. Mastrolia, and M. Rosenbaum. Market making and incentives design in the presence of a dark pool: a deep reinforcement learning approach. arXiv preprint arXiv:1912.01129, 2019.

[9] G. Barles and C. Imbert. Second-order elliptic integro-differential equations: viscosity solutions’ theory revisited. In Annales de l’Institut Henri Poincaré/Analyse non linéaire, volume 3, pages 567–585, 2008.

[10] P. Bergault and O. Guéant. Size matters for OTC market makers: viscosity approach and dimensionality reduction technique. arXiv preprint arXiv:1907.01225, 2019.

[11] B. Bouchard. Introduction to stochastic control of mixed diffusion processes, viscosity solutions and applications in finance and insurance. Lecture Notes Preprint, 2007.

[12] Á. Cartea, S. Jaimungal, and J. Penalva. Algorithmic and high-frequency trading. Cambridge University Press, 2015.

[13] R. Cont and A. Kukanov. Optimal order placement in limit order markets. Quantitative Finance, 17(1):21–39, 2017.

[14] M. G. Crandall, H. Ishii, and P.-L. Lions. User’s guide to viscosity solutions of second order partial differential equations. Bulletin of the American Mathematical Society, 27(1):1–67, 1992.

[15] J. Gatheral. No-dynamic-arbitrage and market impact. Quantitative Finance, 10(7):749–759, 2010.

[16] O. Guéant. The Financial Mathematics of Market Liquidity: From optimal execution to market making, volume 33. CRC Press, 2016.

[17] O. Guéant, C.-A. Lehalle, and J. Fernandez-Tapia. Dealing with the inventory risk: a solution to the market making problem. Mathematics and Financial Economics, 7(4):477–507, 2013.

[18] O. Guéant and I. Manziuk. Deep reinforcement learning for market making in corporate bonds: beating the curse of dimensionality. arXiv preprint arXiv:1910.13205, 2019.

[19] F. Guilbaud and H. Pham. Optimal high-frequency trading with limit and market orders. Quantitative Finance, 13(1):79–94, 2013.

[20] C. Huré, H. Pham, A. Bachouch, and N. Langrené. Deep neural networks algorithms for stochastic control problems on finite horizon, part I: convergence analysis. arXiv preprint arXiv:1812.04300, 2018.

[21] A. Jain and C. Jain. Hidden liquidity on the US stock exchanges. The Journal of Trading, 12(3):30–36, 2017.

[22] S. Laruelle, C.-A. Lehalle, and G. Pagès. Optimal split of orders across liquidity pools: a stochastic algorithm approach. SIAM Journal on Financial Mathematics, 2(1):1042–1076, 2011.

[23] S. Laruelle, C.-A. Lehalle, and G. Pagès. Optimal posting price of limit orders: learning by trading. Mathematics and Financial Economics, 7(3):359–403, 2013.

[24] R. Rabinovitch, A. C. Silva, and R. Susmel. Returns on ADRs and arbitrage in emerging markets. Emerging Markets Review, 4(3):225–247, 2003.

[25] S. Stoikov and M. Sağlam. Option market making under inventory risk. Review of Derivatives Research, 12(1):55–79, 2009.

[26] I. M. Werner and A. W. Kleidon. UK and US trading of British cross-listed stocks: An intraday analysis of market integration. The Review of Financial Studies, 9(2):619–664, 1996.
