
Mean Field Control: Selected Topics and Applications

Minyi Huang

School of Mathematics and Statistics, Carleton University, Ottawa, Canada

University of Michigan, Ann Arbor, Feb 2015




Outline of talk

I Background and motivation

I Mean field game (MFG) model: N players (N is large)

I “N interacting particle system” modeling; complexity issues

I Mean field limit ideas

I Caines, Huang, and Malhame (03, 06, 07, ...); P.E. Caines, IEEE Control Systems Society Bode Lecture, 2009; Lasry and Lions (06, 07, ...)

I An overview by Bensoussan et al. (2013); Buckdahn et al. (2011); a survey by Gomes and Saude (2014)

I Other modeling issues (major players, common noises, unknown model components, ...); applications

I Remarks and references




Example 1: recharging control

I The aggregate recharging behavior of all Plug-in Electric Vehicles (PEVs) impacts the electricity price $p_t$.

I PEV $i$'s optimization deals with $(u_t^i, p_t)$, where $u_t^i$ is its own recharging rate.

I For more technical details, see (Ma, Callaway, Hiskens, IEEE Trans. CST 2013).




Example 2: flu vaccination game

I The population vaccination coverage is $p_t$. The chance of an outbreak decreases with $p_t$.

I An individual plays with respect to $p_t$ (leading to a mean field model).

I Trade-off between infection risk and side effects, effort costs.




Example 3: relative performance

From (Espinosa and Touzi, 2013)

The performance of agent (manager) $i$ ($i = 1, 2, \ldots, N$):

$$E\,U\big[(1-\lambda)X_T^i + \lambda\big(X_T^i - X_T^{(-i)}\big)\big], \qquad 0 < \lambda < 1$$

I 1 non-risky asset

I $S_t = (S_t^1, \ldots, S_t^d)$: $d$-dim risky asset (described as a diffusion)

I $X^i$ depends on $S$ and the portfolio strategy $\pi^i$ of agent $i$.

I The mean field coupling term $X_T^{(-i)} = \frac{1}{N-1}\sum_{j \neq i} X_T^j$ occurs in the utility.

I This relative performance is related to human psychology in satisfaction.




Example 4: production optimization

[Figure: agents i, j, and k, each interacting with the "Market"]

I In a stochastic growth model with many competing producers, let the individual capital stock level be $u_i(t)$.

I The efficiency of production is impacted by $u^{(N)}(t) = \frac{1}{N}\sum_{i=1}^N u_i(t)$. (For instance, a congestion effect due to competitive use of resources.)

I Think of $u^{(N)}$ as a quantity measured by a macroscopic unit.





We are motivated to develop a general theory for mean field decision problems.

To formalize mathematically, we consider stochastic differential games with mean field interactions.

For instance, we may consider dynamics and costs:

$$dx_i = \frac{1}{N}\sum_{j=1}^N f_{a_i}(x_i, u_i, x_j)\,dt + \sigma\,dw_i, \qquad 1 \le i \le N,\ t \ge 0,$$

$$J_i(u_i, u_{-i}) = E\int_0^T \Big[\frac{1}{N}\sum_{j=1}^N L(x_i, u_i, x_j)\Big]\,dt, \qquad T < \infty.$$
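As a rough illustration of the interacting dynamics above, the following is a minimal Euler-Maruyama sketch in plain Python. The pairwise drift `drift(xi, ui, xj) = (xj - xi) + ui` is a hypothetical choice made only for this example (the talk does not specify $f_{a_i}$), and the function and parameter names are likewise assumptions.

```python
import math
import random

def drift(xi, ui, xj):
    # Hypothetical pairwise term f(x_i, u_i, x_j): attraction toward x_j plus the control.
    return (xj - xi) + ui

def simulate(x0, T=1.0, steps=100, sigma=0.0, control=lambda t, x: 0.0, seed=0):
    """Euler-Maruyama for dx_i = (1/N) sum_j f(x_i, u_i, x_j) dt + sigma dw_i."""
    rng = random.Random(seed)
    dt = T / steps
    x = list(x0)
    n = len(x)
    for k in range(steps):
        t = k * dt
        new_x = []
        for i in range(n):
            ui = control(t, x[i])                       # decentralized: depends on own state only
            avg = sum(drift(x[i], ui, xj) for xj in x) / n  # mean field averaging over j
            dw = rng.gauss(0.0, math.sqrt(dt))          # Brownian increment of agent i
            new_x.append(x[i] + avg * dt + sigma * dw)
        x = new_x
    return x

# Without noise and control, this particular drift preserves the population
# average while the states contract toward it.
final_states = simulate([0.0, 1.0, 2.0, 3.0])
```

Each step costs $O(N^2)$ operations because of the pairwise sum, which hints at the complexity issue that motivates the mean field limit.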





The traditional approach: infeasibility

I Rewrite vector mean field dynamics (controlled diffusion):

$$dx(t) = f(x(t), u_1(t), \ldots, u_N(t))\,dt + \sigma_N\,dW(t).$$

I Cost of agent $i \in \{1, \ldots, N\}$: $J_i(u_i, u_{-i}) = E\int_0^T l_i(x(t), u_i, u_{-i})\,dt$, where $u_{-i}$ is the set of controls of all other agents.

I Dynamic programming ($N$ coupled HJB equations):

$$\begin{cases} 0 = \dfrac{\partial v_i}{\partial t} + \min_{u_i}\Big[f^T \dfrac{\partial v_i}{\partial x} + \dfrac{1}{2}\mathrm{Tr}\Big(\dfrac{\partial^2 v_i}{\partial x^2}\sigma_N\sigma_N^T\Big) + l_i\Big], \\ v_i(T, x) = 0, \qquad 1 \le i \le N. \end{cases}$$

I Need too much information since the HJBs give an individual strategy of the form $u_i(t, x_1, \ldots, x_N)$.

I Computation is heavy, or impossible in nonlinear systems.

I Need a new methodology: mean field stochastic control theory!





von Neumann and Morgenstern (1944, p. 12)

Their vision on games with a large number of players:

“... When the number of participants becomes really great, some hope emerges that the influence of every particular participant will become negligible, and that the above difficulties may recede and a more conventional theory becomes possible.”

“... In all fairness to the traditional point of view this much ought to be said: It is a well known phenomenon in many branches of the exact and physical sciences that very great numbers are often easier to handle than those of medium size. An almost exact theory of a gas, containing about $10^{25}$ freely moving particles, is incomparably easier than that of the solar system, made up of 9 major bodies ... This is, of course, due to the excellent possibility of applying the laws of statistics and probability in the first case.”





The basic framework of MFGs

P0: Game with N players. Example:

$$dx_i = f(x_i, u_i, \delta_x^{(N)})\,dt + \sigma(\cdots)\,dw_i$$

$$J_i(u_i, u_{-i}) = E\int_0^T l(x_i, u_i, \delta_x^{(N)})\,dt$$

$\delta_x^{(N)}$: empirical distribution of $(x_j)_{j=1}^N$

→ (solution) HJBs coupled via densities $p_{i,t}^N$, $1 \le i \le N$, plus $N$ Fokker-Planck-Kolmogorov equations. $u_i$ is adapted to $\sigma(w_i(s), s \le t)$ (i.e., restrict to decentralized info for N players); so giving $u_i^N(t, x_i)$.

[Diagram arrows between the two levels: ↓ construct; ↖ performance?; (subseq. convergence) ↓ $N \to \infty$]

P∞: Limiting problem, 1 player.

$$dx_i = f(x_i, u_i, \mu_t)\,dt + \sigma(\cdots)\,dw_i$$

$$J_i(u_i) = E\int_0^T l(x_i, u_i, \mu_t)\,dt$$

Freeze $\mu_t$, as an approximation of $\delta_x^{(N)}$.

→ (solution) $u_i(t, x_i)$: optimal response.

HJB ($v(T, \cdot)$ given):

$$-v_t = \inf_{u_i}\Big(f^T v_{x_i} + l + \frac{1}{2}\mathrm{Tr}[\sigma\sigma^T v_{x_i x_i}]\Big)$$

Fokker-Planck-Kolmogorov:

$$p_t = -\mathrm{div}(fp) + \sum_{j,k}\Big(\Big(\frac{\sigma\sigma^T}{2}\Big)_{jk}\,p\Big)_{x_i^j x_i^k}$$

Coupled via $\mu_t$ (with density $p_t$; $p_0$ given).

I The consistency based approach (red) is more popular; related to ideas in statistical physics (McKean-Vlasov eqn); FPK can be replaced by an MV-SDE

I When a major player or common noise appears, new tools (stochastic mean field dynamics, master equation, etc.) are needed





The main ideas

I The procedure indicated by the red path is based on (i) freezing the mean field, (ii) optimal response, (iii) consistency.

I The procedure indicated by the blue path gives the limiting equation system (see the notes of Cardaliaguet, 2012).

I Without the decentralized information restriction, the problem is much harder since the control would then be $u_i(t, x_1, \ldots, x_N)$, which makes deriving the limiting equation system very difficult.





The mean field LQG game

I Individual dynamics:

$$dz_i = (a_i z_i + b u_i)\,dt + \alpha z^{(N)}\,dt + \sigma_i\,dw_i, \qquad 1 \le i \le N.$$

I Individual costs:

$$J_i = E\int_0^\infty e^{-\rho t}\big[(z_i - \Phi(z^{(N)}))^2 + r u_i^2\big]\,dt.$$

I $z_i$: state of agent $i$; $u_i$: control; $w_i$: noise; $a_i$: dynamic parameter; $r > 0$; $N$: population size. For simplicity, take the same control gain $b$ for all agents.

I $z^{(N)} = (1/N)\sum_{i=1}^N z_i$; $\Phi$: nonlinear function

I We use this simple scalar model (CDC’03, 04) to illustrate the key idea; generalizations to vector states are obvious





The methodology of consistent mean field approximation

[Figure: agent $i$, with state $z_i$ and control $u_i$, plays against the mass; the mass influences the agent through the mean field $m(t)$]

Consistent mean field approximation:

I In the infinite population limit, individual strategies are optimal responses to the mean field $m(t)$;

I Closed-loop behaviour of all agents further replicates the same $m(t)$





The limiting optimal control problem

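I Recall

$$dz_i = (a_i z_i + b u_i)\,dt + \alpha z^{(N)}\,dt + \sigma_i\,dw_i$$

$$J_i = E\int_0^\infty e^{-\rho t}\big[(z_i - \Phi(z^{(N)}))^2 + r u_i^2\big]\,dt$$

I Take $f, z^* \in C_b[0, \infty)$ (bounded continuous) and construct

$$dz_i = a_i z_i\,dt + b u_i\,dt + \alpha f\,dt + \sigma_i\,dw_i$$

$$J_i(u_i, z^*) = E\int_0^\infty e^{-\rho t}\big[(z_i - z^*)^2 + r u_i^2\big]\,dt$$

Riccati equation: $\rho\Pi_i = 2a_i\Pi_i - (b^2/r)\Pi_i^2 + 1$, $\Pi_i > 0$.

I Optimal control: $u_i = -\frac{b}{r}(\Pi_i z_i + s_i)$, where

$$\rho s_i = \frac{ds_i}{dt} + a_i s_i - \frac{b^2}{r}\Pi_i s_i + \alpha\Pi_i f - z^*.$$

I How to determine $z^*$?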
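The scalar Riccati equation above is a quadratic in $\Pi_i$, namely $(b^2/r)\Pi_i^2 - (2a_i - \rho)\Pi_i - 1 = 0$, so its positive root is available in closed form. The sketch below computes that root and evaluates the feedback law for a given offset $s_i$. The function names are illustrative assumptions, and $s_i$ is treated as an already computed input.

```python
import math

def riccati_gain(a, b, r, rho):
    """Positive root of rho*Pi = 2*a*Pi - (b**2/r)*Pi**2 + 1."""
    k = b * b / r          # coefficient of Pi**2
    c = 2.0 * a - rho      # coefficient of Pi after moving rho*Pi to the right-hand side
    return (c + math.sqrt(c * c + 4.0 * k)) / (2.0 * k)

def feedback(a, b, r, rho, z, s):
    """Control u = -(b/r) * (Pi*z + s), with s supplied by the caller."""
    return -(b / r) * (riccati_gain(a, b, r, rho) * z + s)

# Example: a = b = r = 1, rho = 0.5 gives Pi = 2.0.
gain = riccati_gain(1.0, 1.0, 1.0, 0.5)
```

Since the product of the two roots is $-r/b^2 < 0$, exactly one root is positive, which is why the `+` branch of the quadratic formula is taken.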





The mean field solution system

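Let $\Pi_a = \Pi_i|_{a_i = a}$. Assume (i) $Ez_i(0) = 0$, $i \ge 1$; (ii) the dynamic parameters $\{a_i, i \ge 1\} \subset A$ have limit empirical distribution $F(a)$.

Optimal control and consistent mean field approximations (Nash certainty equivalence) ⟹

$$\rho s_a = \frac{ds_a}{dt} + a s_a - \frac{b^2}{r}\Pi_a s_a + \alpha\Pi_a \bar z - z^*,$$

$$\frac{d\bar z_a}{dt} = \Big(a - \frac{b^2}{r}\Pi_a\Big)\bar z_a - \frac{b^2}{r}s_a + \alpha\bar z,$$

$$\bar z = \int_A \bar z_a\,dF(a),$$

$$z^* = \Phi(\bar z). \qquad \text{(replicating step)}$$

In a system of $N$ agents, agent $i$ uses its own parameter $a_i$ to determine

$$u_i = -\frac{b}{r}(\Pi_{a_i} z_i + s_{a_i}), \qquad 1 \le i \le N, \qquad \text{decentralized!}$$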





Main results: existence, and ε-Nash equilibrium

Theorem (Existence and Uniqueness) Under mild assumptions, the mean field solution system has a unique bounded solution $(\bar z_a, s_a)$, $a \in A$.

Let $s_i = s_{a_i}$ be pre-computed from the NCE equation system and

$$u_i^0 = -\frac{b}{r}(\Pi_i z_i + s_i), \qquad 1 \le i \le N.$$

Theorem (Nash equilibria, CDC’03, TAC’07) The set of strategies $\{u_i^0, 1 \le i \le N\}$ results in an ε-Nash equilibrium w.r.t. costs $J_i(u_i, u_{-i})$, $1 \le i \le N$, i.e. (diminishing value of centralized information),

$$J_i(u_i^0, u_{-i}^0) - \varepsilon \le \inf_{u_i} J_i(u_i, u_{-i}^0) \le J_i(u_i^0, u_{-i}^0),$$

where $0 < \varepsilon \to 0$ as $N \to \infty$, and $u_i$ depends on $(t, z_1, \ldots, z_N)$.





The nonlinear case

For the nonlinear diffusion model (CIS’06):

I HJB equation:

$$-\frac{\partial V}{\partial t} = \inf_{u \in U}\Big\{f[x, u, \mu_t]\frac{\partial V}{\partial x} + L[x, u, \mu_t]\Big\} + \frac{\sigma^2}{2}\frac{\partial^2 V}{\partial x^2},$$

$$V(T, x) = 0, \qquad (t, x) \in [0, T) \times \mathbb{R}.$$

⇓

Optimal control: $u_t = \varphi(t, x \mid \mu_\cdot)$, $(t, x) \in [0, T] \times \mathbb{R}$.

I Closed-loop McK-V equation (which can be written as a Fokker-Planck equation):

$$dx_t = f[x_t, \varphi(t, x \mid \mu_\cdot), \mu_t]\,dt + \sigma\,dw_t, \qquad 0 \le t \le T.$$

The NCE methodology amounts to finding a solution $(x_t, \mu_t)$ in the McK-V sense. Extension by V. Kolokoltsov, W. Yang, J. Li (Preprint’11).





E1: The model

I Individual dynamics ($N$ agents):

$$dx_i = A(\theta_i)x_i\,dt + Bu_i\,dt + D\,dW_i, \qquad 1 \le i \le N.$$

I Individual costs:

$$J_i = E\int_0^\infty e^{-\rho t}\big\{|x_i - \Phi(x^{(N)})|_Q^2 + u_i^T R u_i\big\}\,dt,$$

where $\Phi(x^{(N)}) = \Gamma x^{(N)} + \eta$.

I Specification

I $\theta_i$: dynamic parameter, $u_i$: control, $W_i$: noise

I $x^{(N)} = (1/N)\sum_{i=1}^N x_i$: mean field coupling term

I The social cost: $J_{soc}^{(N)} = \sum_{i=1}^N J_i$.

I The objective: minimize $J_{soc}^{(N)}$ ⟹ Pareto optima.





The SCE equation system

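I The Social Certainty Equivalence (SCE) equation system:

$$\rho s_\theta = \frac{ds_\theta}{dt} + (A_\theta^T - \Pi_\theta B R^{-1} B^T)s_\theta - \big[(\Gamma^T Q + Q\Gamma - \Gamma^T Q\Gamma)\bar x + (I - \Gamma^T)Q\eta\big],$$

$$\frac{d\bar x_\theta}{dt} = A_\theta\bar x_\theta - B R^{-1} B^T(\Pi_\theta\bar x_\theta + s_\theta),$$

$$\bar x = \int \bar x_\theta\,dF(\theta),$$

where $\bar x_\theta(0) = m_0$ and $s_\theta$ is sought within $C_{\rho/2}([0, \infty), \mathbb{R}^n)$.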





The social optimality theorem

Theorem Under some technical conditions, the set of SCE based control laws

$$u_i = -R^{-1}B^T(\Pi_{\theta_i} x_i + s_{\theta_i}), \qquad 1 \le i \le N$$

has asymptotic social optimality, i.e., for $u = (u_1, \ldots, u_N)$,

$$\Big|(1/N)J_{soc}^{(N)}(u) - \inf_{u \in U_o}(1/N)J_{soc}^{(N)}(u)\Big| = O(1/\sqrt{N} + \epsilon_N),$$

where $\lim_{N \to \infty}\epsilon_N = 0$ and $U_o$ is defined as a set of centralized information based controls.





Cost comparison (mean field game vs. social optimum)

[Figure: three panels plotted against γ from 0 to 1.5: social cost per agent, NCE based cost, and cost difference]





E2: Dynamics with a major player

The LQG game with mean field coupling:

$$dx_0(t) = \big[A_0 x_0(t) + B_0 u_0(t) + F_0 x^{(N)}(t)\big]\,dt + D_0\,dW_0(t), \qquad t \ge 0,$$

$$dx_i(t) = \big[A(\theta_i)x_i(t) + Bu_i(t) + Fx^{(N)}(t) + Gx_0(t)\big]\,dt + D\,dW_i(t),$$

$x^{(N)} = \frac{1}{N}\sum_{i=1}^N x_i$: mean field term (average state of minor players).

I Major player $\mathcal{A}_0$ with state $x_0(t)$, minor player $\mathcal{A}_i$ with state $x_i(t)$.

I $W_0, W_i$ are independent standard Brownian motions, $1 \le i \le N$.

We introduce the following assumption:

(A1) $\theta_i$ takes its value from a finite set $\Theta = \{1, \ldots, K\}$ with an empirical distribution $F^{(N)}$, which converges when $N \to \infty$.





Individual costs

The cost for $\mathcal{A}_0$:

$$J_0(u_0, \ldots, u_N) = E\int_0^\infty e^{-\rho t}\big\{|x_0 - \Phi(x^{(N)})|_{Q_0}^2 + u_0^T R_0 u_0\big\}\,dt,$$

$\Phi(x^{(N)}) = H_0 x^{(N)} + \eta_0$: cost coupling term.

The cost for $\mathcal{A}_i$, $1 \le i \le N$:

$$J_i(u_0, \ldots, u_N) = E\int_0^\infty e^{-\rho t}\big\{|x_i - \Psi(x_0, x^{(N)})|_Q^2 + u_i^T R u_i\big\}\,dt,$$

$\Psi(x_0, x^{(N)}) = Hx_0 + Hx^{(N)} + \eta$: cost coupling term.

I The presence of $x_0$ in the dynamics and cost of $\mathcal{A}_i$ shows the strong influence of the major player $\mathcal{A}_0$.





A matter of “sufficient statistics”

One might conjecture asymptotic Nash equilibrium strategies of the form:

I $x_0(t)$ would be a sufficient statistic for $\mathcal{A}_0$’s decision ⟹ $u_0(t, x_0(t))$;

I $(x_0(t), x_i(t))$ would be sufficient statistics for $\mathcal{A}_i$’s decision ⟹ $u_i(t, x_0(t), x_i(t))$.

Fact: The above conjecture fails!

Theorem (ε-Nash equilibrium) Under some technical conditions, a set of decentralized strategies of the form $(u_0[t, x_0(t), z(t)],\ u_i[t, x_0(t), z(t), x_i(t)])$ is an ε-Nash equilibrium as $N \to \infty$ (see Huang, SICON’10 for detail).

For the case of $\theta_i$ from a continuum, see (Nguyen and Huang, 12): random Gaussian approximation with a kernel function.





E3: Robustness: local/global unknown disturbance

I Tembine, Basar, et al. (2012): local disturbance as an adversarial player: embed a saddle point solution of the local players into the MFG.

I J. Huang and M. Huang (preprint, 2015): a common unknown disturbance, worst case optimization.





E3: Robustness (ctn)

Consider $N$ players, $1 \le i \le N$:

$$dx_i(t) = \big(Ax_i(t) + Bu_i(t) + Gx^{(N)}(t) + f(t)\big)\,dt + D\,dW_i(t),$$

$$J_i(u_i, u_{-i}, f) = E\Big[\int_0^T\Big(|x_i - (\Gamma x^{(N)} + \eta)|_Q^2 + u_i^T R u_i - \frac{1}{\gamma}|f(t)|^2\Big)\,dt + x_i^T(T)Hx_i(T)\Big],$$

where $x^{(N)} = \frac{1}{N}\sum_{j=1}^N x_j$; $f$ is an unknown $L^2(0, T; \mathbb{R}^n)$ signal.

The worst case cost:

$$J_i^{wo}(u_i, u_{-i}) = \sup_{f \in L^2(0, T; \mathbb{R}^n)} J_i(u_i, u_{-i}, f).$$





Robustness (ctn)

Main result:

I Under some conditions, we may construct $(\hat u_1, \ldots, \hat u_N)$, where each $\hat u_i$ is determined from solving a limiting robust control (minimax control) problem. $\hat u_i$ is determined from a forward-backward SDE system driven by $W_i$.

I The robust $\varepsilon_N$-Nash equilibrium for the $N$ players, i.e.,

$$J_i^{wo}(\hat u_i, \hat u_{-i}) - \varepsilon_N \le \inf_{u_i \in U} J_i^{wo}(u_i, \hat u_{-i}) \le J_i^{wo}(\hat u_i, \hat u_{-i}),$$

where $\varepsilon_N \to 0$ as $N \to \infty$. In the infimum, $u_i$ depends on $(W_1, \ldots, W_N)$.





Mean field capital accumulation game: dynamics

I $X_t^i$: output (or wealth) of agent $i$, $1 \le i \le N$

I $u_t^i \in [0, X_t^i]$: capital stock (so no borrowing)

I $c_t^i = X_t^i - u_t^i$: amount for consumption

I $u_t^{(N)} = (1/N)\sum_{j=1}^N u_t^j$: aggregate capital stock level

The next stage output, measured by the unit of capital, is

$$X_{t+1}^i = G(u_t^{(N)}, W_t^i)\,u_t^i, \qquad t \ge 0. \tag{3.1}$$

Regard $u_t^{(N)}$ as being measured according to a macroscopic unit.

See Olson and Roy (2006) for a survey on stochastic growth theory.





The utility functional

The utility functional is

$$J_i(u^i, u^{-i}) = E\sum_{t=0}^T \rho^t v(X_t^i - u_t^i),$$

I $\rho \in (0, 1]$: the discount factor

I $c_t^i = X_t^i - u_t^i$: consumption; $u^{-i} = (\cdots, u^{i-1}, u^{i+1}, \cdots)$

I We take the HARA utility $v(z) = \frac{1}{\gamma}z^\gamma$, $z \ge 0$, $\gamma \in (0, 1)$.

Main results: (i) The mean field game equation system has a solution (proved by a fixed point theorem); (ii) the set of decentralized strategies obtained is an ε-Nash equilibrium (for more detail, see Huang, DGAA’13).
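To make the recursion (3.1) and the discounted HARA utility concrete, here is a small simulation sketch. It is not the equilibrium construction from the talk: the growth function `growth`, the shock distribution, and the fixed saving fraction `save_frac` are all hypothetical choices made only so that the example runs, and the function returns realized utilities along one sample path rather than expectations.

```python
import random

def hara(z, gamma):
    """HARA utility v(z) = z**gamma / gamma for z >= 0 and 0 < gamma < 1."""
    return z ** gamma / gamma

def growth(u_bar, w):
    # Hypothetical G(u_bar, w): the return per unit of capital falls as the
    # aggregate capital u_bar rises (a congestion effect); w is a random shock.
    return w * 2.0 / (1.0 + u_bar)

def simulate_utilities(n_agents, horizon, rho, gamma, save_frac, seed=0):
    """Realized discounted utility of each agent along one sample path."""
    rng = random.Random(seed)
    x = [1.0] * n_agents                      # initial outputs X_0^i (assumed equal)
    total = [0.0] * n_agents
    for t in range(horizon + 1):
        u = [save_frac * xi for xi in x]      # capital stock u_t^i in [0, X_t^i]
        u_bar = sum(u) / n_agents             # aggregate level u_t^(N)
        for i in range(n_agents):
            total[i] += rho ** t * hara(x[i] - u[i], gamma)   # utility of consumption
        # next-stage output X_{t+1}^i = G(u_t^(N), W_t^i) * u_t^i
        x = [growth(u_bar, rng.uniform(0.8, 1.2)) * ui for ui in u]
    return total

utilities = simulate_utilities(n_agents=50, horizon=10, rho=0.95, gamma=0.5, save_frac=0.5)
```

Averaging `utilities` over many seeds would give a Monte Carlo estimate of $J_i$ for this particular (non-optimized) strategy profile.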





Mean field dynamics with infinite horizon: nonlinear phenomena

$p_{t+1} = Q_{mf}(p_t)$. The blue curve is $Q_{mf}$.

[Figure: three panels showing $Q_{mf}$ plotted against $p$: (a) stable equilibrium, (b) limit cycle, (c) chaos]

I Look for a stationary solution for the infinite horizon mean field capital accumulation game

I Check stability of the mean field induced from the stationary solution.
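The stability check can be explored numerically by iterating the one-dimensional map and inspecting the tail of the orbit. The sketch below does this for a generic map. Since $Q_{mf}$ is not given in closed form here, the logistic map is used purely as a hypothetical stand-in that is known to exhibit a stable fixed point, cycles, and chaos for different parameter values.

```python
def iterate_map(q, p0, n_transient=1000, n_keep=4):
    """Iterate p_{t+1} = q(p_t), discard a transient, and return the next n_keep values."""
    p = p0
    for _ in range(n_transient):
        p = q(p)
    tail = []
    for _ in range(n_keep):
        p = q(p)
        tail.append(p)
    return tail

def classify(tail, tol=1e-6):
    """Crude classification of the long-run behaviour from the orbit tail."""
    if abs(tail[1] - tail[0]) < tol:
        return "stable equilibrium"
    if abs(tail[2] - tail[0]) < tol:
        return "period-2 cycle"
    return "longer cycle or irregular"

def logistic(r):
    # Hypothetical stand-in for Q_mf; NOT the map from the talk.
    return lambda p: r * p * (1.0 - p)

regime_a = classify(iterate_map(logistic(2.5), 0.3))   # fixed point 1 - 1/r = 0.6 is attracting
regime_b = classify(iterate_map(logistic(3.2), 0.3))   # attracting cycle of period 2
```

Replacing `logistic(r)` with a numerically computed $Q_{mf}$ would apply the same check to the mean field dynamics; larger values such as `r = 3.9` typically fall into the third category.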





Continuous time modeling: Cobb-Douglas with HARA

The dynamics:

$$dX_t = A(m_t)X_t^\alpha\,dt - \delta X_t\,dt - C_t\,dt - \sigma X_t\,dW_t, \tag{3.2}$$

The utility functional:

$$J = \frac{1}{\gamma}E\Big[\int_0^T e^{-\rho t}C_t^\gamma\,dt + e^{-\rho T}\eta\lambda(m_T)X_T^\gamma\Big]. \tag{3.3}$$

I $F(m, x) = A(m)x^\alpha$ is a mean field version of the Cobb-Douglas production function with capital $x$ and a constant labor size.

I The function $\lambda > 0$ is continuous and decreasing on $[0, \infty)$.

I Take the standard choice $\gamma = 1 - \alpha$ (equalizing the coefficient of the relative risk aversion to capital share).





Continuous time modeling: Cobb-Douglas with HARA

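The solution equation system of the mean field game reduces to

$$\dot p(t) = \Big[\rho + \frac{\sigma^2\gamma(1-\gamma)}{2} + \delta\gamma\Big]p(t) - (1-\gamma)\,p^{\frac{\gamma}{\gamma-1}}(t),$$

$$\dot h(t) = \rho h(t) - A(m_t)\gamma p(t),$$

$$dZ_t = \Big\{\gamma A(m_t) - \Big[\gamma\delta - \gamma\varphi^{-1}(t) - \frac{\sigma^2\gamma(1-\gamma)}{2}\Big]Z_t\Big\}\,dt - \gamma\sigma Z_t\,dW_t,$$

$$m_t = EZ_t^{\frac{1}{\gamma}} \quad (= EX_t),$$

where $p(T) = \lambda(m_T)\eta$ and $h(T) = 0$. $\varphi(t)$ can be explicitly determined by $\lambda(m_T)$ and other constant parameters.

I Existence = fixed point problem. Fix $m_t$; uniquely solve $p$, $h$; further solve $Z_t(m(\cdot))$. Then $m_t = EZ_t^{\frac{1}{\gamma}}(m(\cdot))$.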




Mean field game theory via “interacting particles” has evolved into a major research area with many applications.

It adopts ideas from statistical physics.




Related literature: peer models (i.e., comparably small; only a partial list)

I J.M. Lasry and P.L. Lions (2006a,b, JJM’07): Mean field equilibrium; O. Gueant (JMPA’09); GLL’11 (Springer): Human capital optimization

I G.Y. Weintraub et al. (NIPS’05, Econometrica’08): Oblivious equilibria for Markov perfect industry dynamics; S. Adlakha, R. Johari, G. Weintraub, A. Goldsmith (CDC’08): further generalizations with OEs

I M. Huang, P.E. Caines and R.P. Malhame (CDC’03, 04, CIS’06, TAC’07): Decentralized ε-Nash equilibrium in mean field dynamic games; M. Nourian, Caines, et al. (TAC’12): collective motion and adaptation; A. Kizilkale and P.E. Caines (Preprint’12): adaptive mean field LQG games

I T. Li and J.-F. Zhang (IEEE TAC’08): Mean field LQG games with long run average cost; M. Bardi (Net. Heter. Media’12): LQG

I H. Tembine et al. (GameNets’09): Mean field MDP and team; H. Tembine, Q. Zhu, T. Basar (IFAC’11): Risk sensitive mean field games




Related literature (ctn)

I A. Bensoussan et al. (2011, 2012, Preprints): Mean field LQG games (and nonlinear diffusion models).

I H. Yin, P.G. Mehta, S.P. Meyn, U.V. Shanbhag (IEEE TAC’12): Nonlinear oscillator games and phase transition; Yang et al. (ACC’11); Pequito, Aguiar, Sinopoli, Gomes (NetGCOOP’11): application to filtering/estimation

I D. Gomes, J. Mohr, Q. Souza (JMPA’10): Finite state space models

I V. Kolokoltsov, W. Yang, J. Li (preprint’11): Nonlinear Markov processes and mean field games

I Xu and Hajek (2012): mean field supermarket games (cost results from sampling and waiting)

I Z. Ma, D. Callaway, I. Hiskens (IEEE CST’13): recharging control of large populations of electric vehicles

I Y. Achdou and I. Capuzzo-Dolcetta (SIAM Numer.’11): Numerical solutions to mean field game equations (coupled PDEs)




Related literature (ctn)

I R. Buckdahn, P. Cardaliaguet, M. Quincampoix (DGA’11): Survey

I R. Carmona and F. Delarue (Preprint’12): McKean-Vlasov dynamics for players, and probabilistic approach

I R. E. Lucas Jr and B. Moll (Preprint’11): Economic growth (a trade-off for individuals to allocate time for producing and acquiring new knowledge)

I M. Balandat and C. J. Tomlin (ACC’13): Efficiency of MFG

I Rome University Mean Field Game Workshop, May 2011

I A. Bensoussan et al. (DGAA’13): time consistency strategies in LQG MFGs; B. Djehiche and M. Huang (Preprint’13): time consistency and SMP in nonlinear case.

I Padova University MFG Workshop, Aug. 2013

I Huang, Caines, Malhame (2006); Sen and Caines (2014); Kizilkale and Caines (2015): partial information models.




Related literature (ctn): major player models

I Huang (SICON’10): LQG models with minor players parameterized by a finite parameter set; develop state augmentation

I B.-C. Wang and J.-F. Zhang (Preprint’11): Markovian switching models

I S. Nguyen and Huang (SICON’12): random Gaussian mean field approximation with continuum parameters

I Bensoussan et al. (2013)

I M. Nourian and P.E. Caines (SICON’13): Nonlinear diffusion models

I R. Buckdahn, J. Li and S. Peng (Preprint’13)




Related literature (ctn):

Mean field type optimal control:

I D. Andersson and B. Djehiche (AMO’11): Stochastic maximum principle

I J. Yong (Preprint’11): control of mean field Volterra integral equations

I T. Meyer-Brandis, B. Oksendal and X. Y. Zhou (2012): SMP.

I R. Elliott, X. Li, and Y.-H. Ni (Automatica’13): discrete time LQG and Riccati equations.

There is only a single decision maker. It affects the mean of the underlying state process.




In particular, there are applications of MFGs to economic growth and finance.

I Gueant, Lasry and Lions (2011): human capital optimization

I Lucas and Moll (2011): Knowledge growth and allocation of time (JPE in press)

I Carmona and Lacker (2013): Investment of n brokers

I Huang (2013): capital accumulation with congestion effect

I Lachapelle et al. (2013): price formation

I Espinosa and Touzi (2013): Optimal investment with relative performance concern (depending on $\frac{1}{N-1}\sum_{j \neq i} X_j$)

I ...




Thank you!

