Transcript of: adityam/talks/queens-2011.pdf
Separation result for delayed-sharing information structures
Aditya Mahajan
McGill University
Joint work with: Ashutosh Nayyar and Demos Teneketzis, University of Michigan
Queen's University, July 12, 2011
Common theme: multi-stage, multi-agent decision making under uncertainty
Outline of this talk
What is the conceptual difficulty with multi-agent decision making? How to resolve it?
Modeling decision making under uncertainty
Overview of single-agent decision making
Delayed-sharing information structure: a “simple” model for multi-agent decision making
History of the problem · Our approach · Main results
Conclusion
Modeling multi-stage decision making under uncertainty

Model of uncertainty
Stochastic dynamics: X_{t+1} = f(X_t, U_t, W_t)
Noisy observations: Y_t = h(X_t, V_t)
The state disturbance {W_t} and observation noise {V_t} are i.i.d. stochastic processes with known distributions.
The system dynamics f and the observation function h are known.

Model of information
A single decision maker (DM) with perfect recall:
U_t = g_t(Y_{1:t}, U_{1:t-1})

Model of objective
Cost at time t: c(X_t, U_t).
Objective: minimize the expected total cost E[ Σ_{t=1}^T c(X_t, U_t) ].
More general setups

Model of uncertainty: non-i.i.d. dynamics (Markov, ergodic, etc.); unknown distribution; unknown model, unknown cost, etc.
Model of information: fixed memory/complexity at the decision maker; more than one decision maker.
Model of objective: worst-case performance (instead of expected performance); minimize regret (instead of minimizing total cost); remain in a desirable set (rather than minimize total cost).
Outline of this talk
What is the conceptual difficulty with multi-agent decision making? How to resolve it?
Modeling decision making under uncertainty
Overview of single-agent decision making
Delayed-sharing information structure: a “simple” model for multi-agent decision making
History of the problem · Our approach · Main results
Conclusion
Single-agent decision making

Design difficulties
U_t = g_t(Y_{1:t}, U_{1:t-1}): the domain of the control laws increases with time.
min_{g_1,...,g_T} E[ Σ_{t=1}^T c(X_t, U_t) ]: the search for an optimal control policy is a functional optimization problem.

Structural results (estimation)
Define the information state π_t = P(X_t | Y_{1:t}, U_{1:t-1}).
Then there is no loss of optimality in restricting attention to control laws of the form U_t = g_t(π_t).

Dynamic programming (control)
The following recursive equations provide an optimal control policy:
V_t(π_t) = min_{u_t} E[ c(X_t, u_t) + V_{t+1}(π_{t+1}) | π_t, u_t ]

π_t is policy independent. Each step of the DP is a parameter optimization.
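As a concrete illustration of the single-agent result, the sketch below implements the information-state update π_t = P(X_t | Y_{1:t}, U_{1:t-1}) and the dynamic program for a tiny finite POMDP. All numbers (transition matrix, observation likelihoods, costs, horizon) are illustrative placeholders, not from the talk.

```python
import numpy as np

# Illustrative finite POMDP: 2 states, 2 actions, 2 observations, horizon 3.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[u, x, x']: transition under action u=0
              [[0.5, 0.5], [0.4, 0.6]]])  # and under action u=1
H = np.array([[0.8, 0.2], [0.3, 0.7]])    # H[x', y]: observation likelihood
c = np.array([[0.0, 2.0], [1.0, 1.0]])    # c[x, u]: per-stage cost
T = 3                                     # horizon

def belief_update(pi, u, y):
    """Information-state recursion: pi'(x') ∝ H[x', y] · Σ_x pi(x) P[u, x, x']."""
    pred = pi @ P[u]                      # predicted state distribution
    post = H[:, y] * pred                 # Bayes correction by the observation
    return post / post.sum()

def V(pi, t):
    """DP recursion: V_t(pi) = min_u E[ c(X_t, u) + V_{t+1}(pi') | pi, u ]."""
    if t == T:
        return 0.0
    best = np.inf
    for u in range(2):
        stage = pi @ c[:, u]              # expected current cost
        py = (pi @ P[u]) @ H              # distribution of the next observation
        future = sum(py[y] * V(belief_update(pi, u, y), t + 1) for y in range(2))
        best = min(best, stage + future)
    return best

pi0 = np.array([0.5, 0.5])
print(V(pi0, 0))                          # optimal expected total cost from pi0
```

Each evaluation of V is a minimization over the finite action set, i.e., a parameter optimization, exactly as the structural result promises.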
(One-way) separation between estimation and control

In single-agent decision making, estimation is separated from control. This separation is critical for decomposing the search for an optimal control policy into a sequence of parameter optimization problems.

Does this separation extend to multi-agent decision making?
Outline of this talk
What is the conceptual difficulty with multi-agent decision making? How to resolve it?
Modeling decision making under uncertainty
Overview of single-agent decision making
Delayed-sharing information structure: a “simple” model for multi-agent decision making
History of the problem · Our approach · Main results
Conclusion
Delayed-sharing information structure

Model of uncertainty
Stochastic dynamics: X_{t+1} = f(X_t, U_t^{1:K}, W_t)
Noisy observations: Y_t^k = h^k(X_t, V_t^k)

Model of information
Each DM has perfect recall.
Each DM observes the n-step delayed information (observations and actions) of the other DMs:
U_t^k = g_t^k( Y_{1:t}^k, U_{1:t-1}^k, Y_{1:t-n}^{-k}, U_{1:t-n}^{-k} )

Model of objective
Cost at time t: c(X_t, U_t^{1:K}).
Objective: minimize E[ Σ_{t=1}^T c(X_t, U_t^{1:K}) ].
Delayed-sharing information structure: some notation

U_t^k = g_t^k( Y_{1:t}^k, U_{1:t-1}^k, Y_{1:t-n}^{-k}, U_{1:t-n}^{-k} )

Thus, U_t^k = g_t^k( C_t, L_t^k ), where
Common info: C_t = ( Y_{1:t-n}^{1:K}, U_{1:t-n}^{1:K} )
Local info: L_t^k = ( Y_{t-n+1:t}^k, U_{t-n+1:t-1}^k )

Same design difficulties as the single-agent case.
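A minimal sketch of how the common information C_t and local information L_t^k are assembled from the histories, assuming two controllers; the observations and the policy below are random stand-ins, not part of the model.

```python
import random

# Placeholder simulation of the n-step delayed-sharing information structure
# for two controllers; observations and the policy are random stand-ins.
n, T = 2, 6
random.seed(0)

Y = {1: [], 2: []}   # Y[k][t-1] = observation of controller k at time t
U = {1: [], 2: []}   # U[k][t-1] = action of controller k at time t

def common_info(t):
    """C_t = (Y^{1:2}_{1:t-n}, U^{1:2}_{1:t-n}): everything already shared."""
    return {k: (Y[k][:max(0, t - n)], U[k][:max(0, t - n)]) for k in (1, 2)}

def local_info(k, t):
    """L_t^k = (Y^k_{t-n+1:t}, U^k_{t-n+1:t-1}): k's data not yet shared."""
    return (Y[k][max(0, t - n):t], U[k][max(0, t - n):t - 1])

for t in range(1, T + 1):
    for k in (1, 2):
        Y[k].append(random.randint(0, 1))   # stand-in for Y_t^k = h^k(X_t, V_t^k)
    for k in (1, 2):
        # U_t^k = g_t^k(C_t, L_t^k); here g is a trivial placeholder policy
        C, L = common_info(t), local_info(k, t)
        U[k].append(L[0][-1])               # act on the latest local observation

print(common_info(T), local_info(1, T))
```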
Literature overview

(Witsenhausen, 1971): Proposed the delayed-sharing information structure. Asserted a structure of the optimal control law (without proof).

(Varaiya and Walrand, 1978): Proved Witsenhausen's assertion for n = 1. Showed via a counterexample that the assertion is false for delay n > 1.

(Nayyar, Mahajan, and Teneketzis, 2011): Prove two alternative structures of the optimal control law. Also obtain a recursive algorithm to find optimal control laws; at each step, a functional optimization problem must be solved.
Structure of the optimal control law

Original setup: U_t^k = g_t^k( C_t, L_t^k )

W71 assertion: U_t^k = g_t^k( P(X_{t-n} | C_t), L_t^k )

NMT11 first result: U_t^k = g_t^k( P(X_{t-n}, L_t^{1:K} | C_t), L_t^k )

NMT11 second result: U_t^k = g_t^k( P(X_{t-n} | C_t), L_t^k, g_{t-n+1:t-1}^{1:K} )

Contrast the dependence on the policy across the different results.
Importance of the problem

Applications (of one-step delayed sharing)
Power systems: Altman et al., 2009
Queueing theory: Kuri and Kumar, 1995
Communication networks: Grizzle et al., 1982
Stochastic games: Papavassilopoulos, 1982; Chang and Cruz, 1983
Economics: Li and Wu, 1991

Conceptual significance
Understanding the design of networked control systems
Bridge between centralized and decentralized systems
Insights for the design of general decentralized systems
Orthogonal search does not work due to the presence of signaling: how does controller 1 figure out how controller 2 will interpret his (controller 1's) actions?
Proof outline
1. Construct a coordinated system.
2. Show that any policy of the coordinated system is implementable in the original system, and vice versa. Hence, the two systems are equivalent.
3. Optimal design of the coordinated system is a single-agent multi-stage decision problem. Find a solution for the coordinated system.
4. Translate this solution back to the original system.
Step 1: The coordinated system

Define the partially evaluated control law: γ_t^k = g_t^k( C_t, · ).

A coordinator observes the common information and prescribes ( γ_t^1, γ_t^2 ) to the controllers:
( γ_t^1, γ_t^2 ) = ψ_t( C_t, γ_{1:t-1}^1, γ_{1:t-1}^2 )
Controller k then applies U_t^k = γ_t^k( L_t^k ).
Step 2: Equivalence

For any policy ( g_1, ..., g_T ) of the original system, we can construct a policy ( ψ_1, ..., ψ_T ) of the coordinated system such that the system variables { X_t, Y_t^{1:2}, U_t^{1:2}, t = 1, ..., T } have the same realization along all sample paths in both cases:
ψ_t = ( γ_t^1, γ_t^2 ) = ( g_t^1( C_t, · ), g_t^2( C_t, · ) )

Conversely, for any policy ( ψ_1, ..., ψ_T ) of the coordinated system, we can construct a policy ( g_1, ..., g_T ) of the original system such that the system variables have the same realization along all sample paths:
At time 1, both controllers know C_1. Choose g_1^k( C_1, L_1^k ) = ( ψ_1( C_1 ) )^k ( L_1^k ).
At time 2, both controllers know C_2, γ_1^1, and γ_1^2. Choose g_2^k( C_2, L_2^k ) = ( ψ_2( C_2, γ_1^1, γ_1^2 ) )^k ( L_2^k ).
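The equivalence is by construction: prescribing γ_t^k = g_t^k(C_t, ·) and then applying γ_t^k(L_t^k) reproduces U_t^k = g_t^k(C_t, L_t^k) step by step. A toy check of this, with placeholder dynamics f, observations h, and control law g (none of them from the talk), delay n = 1, and two controllers:

```python
import random

# Toy check of Step 2: the coordinated system (prescriptions gamma_t^k =
# g_t^k(C_t, .)) reproduces the original system's trajectory exactly.
n, T = 1, 5

def f(x, u1, u2, w):                     # placeholder dynamics
    return (x + u1 + u2 + w) % 3

def h(k, x, v):                          # placeholder observation function h^k
    return (x + v + k) % 2

def g(k, t, C, L):                       # placeholder control law g_t^k(C_t, L_t^k)
    return (sum(L[0]) + t + k) % 2

def run(system):
    random.seed(7)                       # identical noise in both systems
    x, traj = 0, []
    Y = {1: [], 2: []}; U = {1: [], 2: []}
    for t in range(1, T + 1):
        for k in (1, 2):
            Y[k].append(h(k, x, random.randint(0, 1)))
        C = [(Y[k][:t - n], U[k][:t - n]) for k in (1, 2)]   # common info C_t
        u = {}
        for k in (1, 2):
            L = (Y[k][t - n:t], U[k][t - n:t - 1])           # local info L_t^k
            if system == "original":
                u[k] = g(k, t, C, L)                         # U_t^k = g_t^k(C_t, L_t^k)
            else:
                gamma = lambda Lk, k=k, t=t, C=C: g(k, t, C, Lk)  # gamma_t^k = g_t^k(C_t, .)
                u[k] = gamma(L)                              # U_t^k = gamma_t^k(L_t^k)
        for k in (1, 2):
            U[k].append(u[k])
        traj.append((x, u[1], u[2]))
        x = f(x, u[1], u[2], random.randint(0, 1))
    return traj

assert run("original") == run("coordinated")
```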
Step 3: Solve the coordinated system

By construction, the coordinated system has a single decision maker with perfect recall.

Use the result for single-agent decision making. Define:
π_t = P( “current state” | past history )
Then there is no loss of optimality in restricting attention to control laws of the form:
control action = function of π_t

What is the state (for the input/output mapping) of the system?
State for the coordinated system

[Figure: the coordinated system, with the coordinator prescribing ( γ_t^1, γ_t^2 ) and controller k applying U_t^k = γ_t^k( L_t^k ).]

State: ( X_{t-n}, L_t^1, L_t^2 )
Structure of the optimal control law

Define π_t = P( X_{t-n}, L_t^1, L_t^2 | C_t, γ_{1:t-1}^1, γ_{1:t-1}^2 ).
Then there is no loss of optimality in restricting attention to coordination laws of the form
( γ_t^1, γ_t^2 ) = ψ_t( π_t )

The following recursive equations provide an optimal coordination policy:
V_t( π_t ) = min_{γ_t^1, γ_t^2} E[ c( X_t, U_t^{1:2} ) + V_{t+1}( π_{t+1} ) | π_t, γ_t^1, γ_t^2 ]
Step 4: Translate the solution

For a system with the delayed-sharing information structure, there is no loss of optimality in restricting attention to control laws of the form
U_t^k = g_t^k( π_t, L_t^k )
Optimal control laws can be obtained from the solution of the following recursive equations:
V_t( π_t ) = min_{ g_t^1( π_t, · ), g_t^2( π_t, · ) } E[ c( X_t, U_t^{1:2} ) + V_{t+1}( π_{t+1} ) | π_t, g_t^1( π_t, · ), g_t^2( π_t, · ) ]
Features of the solution

The space of realizations of π_t = P( X_{t-n}, L_t^1, L_t^2 | C_t, γ_{1:t-1}^1, γ_{1:t-1}^2 ) is time-invariant. Thus, the domain of the control laws g_t^k( π_t, L_t^k ) is time-invariant.

π_t is not policy independent! Estimation is not separated from control. This is always the case when signaling is present.

In each step of the dynamic program, we choose the partially evaluated control laws g_t^1( π_t, · ), g_t^2( π_t, · ). Choosing partially evaluated functions (instead of values) allows us to write a dynamic program even in the presence of signaling.
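A sketch of one such DP step, assuming small finite state, local-observation, and action spaces (all sizes and numbers below are illustrative): the coordinator enumerates pairs of prescriptions γ^k : local observation → action and minimizes the expected stage cost under a belief Π on ( X_{t-n}, L^1, L^2 ).

```python
import numpy as np
from itertools import product

# One DP step of the coordinator: minimize over pairs of prescriptions
# gamma^k : local observation -> action. Sizes and numbers are illustrative.
X, O, A = 2, 2, 2                        # states, local observations, actions
rng = np.random.default_rng(1)
Pi = rng.dirichlet(np.ones(X * O * O)).reshape(X, O, O)  # belief on (X_{t-n}, L^1, L^2)
cost = rng.random((X, A, A))             # stage cost c(x, u^1, u^2)

prescriptions = list(product(range(A), repeat=O))  # all maps {local obs} -> action

def expected_cost(g1, g2):
    """E[ c(X, gamma^1(L^1), gamma^2(L^2)) ] under the belief Pi."""
    return sum(Pi[x, l1, l2] * cost[x, g1[l1], g2[l2]]
               for x in range(X) for l1 in range(O) for l2 in range(O))

best = min(product(prescriptions, prescriptions),
           key=lambda gg: expected_cost(*gg))
print(best, expected_cost(*best))
```

The optimization variable is a pair of functions, not a pair of action values, which is what makes the DP valid in the presence of signaling.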
Outline of this talk
What is the conceptual difficulty with multi-agent decision making? How to resolve it?
Modeling decision making under uncertainty
Overview of single-agent decision making
Delayed-sharing information structure: a “simple” model for multi-agent decision making
History of the problem · Our approach · Main results
Conclusion
Summary

A simple methodology to resolve a 40-year-old open question:

Find the common information at each time.
Look at the problem from the point of view of a coordinator that observes this common information and chooses partially evaluated functions.
Find an information state for the problem at the coordinator:
P( state for input-output mapping | common information )
= ( P( past state | common information ), past partial control laws )

This methodology is also applicable to systems with more general information structures (Mahajan, Nayyar, Teneketzis, 2008).
Salient Features
The size of the information state is time-invariant
The methodology is also applicable to infinite horizon problems
Each step of DP is a functional optimization problem
Form of the DP is similar to that of POMDP
Can borrow from the POMDP literature for numerical approaches