Theory of Repeated Games
Lecture Notes on Central Results
Yosuke YASUDA
Osaka University, Department of Economics
Last-Update: May 21, 2015
1 / 36
Announcement
Course Website: You can find my course website at the link below: https://sites.google.com/site/yosukeyasuda2/home/lecture/repeated15
Textbook & Survey: MS is a comprehensive textbook on repeated games; K and P are highly readable survey articles that complement MS.
MS Mailath and Samuelson, Repeated Games and Reputations: Long-run Relationships. 2006.
K Kandori, 2008.
P Pearce, 1992.
Symbols that we use in lectures: Ex: Example, Fg: Figure, Q: Question, Rm: Remark.
2 / 36
Finitely Repeated Games (1)
A repeated game, a specific class of dynamic game, is a suitable framework for studying the interaction between immediate gains and long-term incentives, and for understanding how a reputation mechanism can support cooperation.
Let $G = \{A_1, \dots, A_n; u_1, \dots, u_n\}$ denote a static game in which players 1 through $n$ simultaneously choose actions $a_1$ through $a_n$ from the action spaces $A_1$ through $A_n$, and the corresponding payoffs are $u_1(a_1, \dots, a_n)$ through $u_n(a_1, \dots, a_n)$.
Definition 1
The game $G$ is called the stage game of the repeated game. Given a stage game $G$, let $G(T)$ denote the finitely repeated game in which $G$ is played $T$ times, with the outcomes of all preceding plays observed before the next play begins.
Assume that the payoff for $G(T)$ is simply the sum of the payoffs from the $T$ stage games (future payoffs are not discounted).
3 / 36
Finitely Repeated Games (2)
Theorem 2
If the stage game $G$ has a unique Nash equilibrium, then, for any finite $T$, the repeated game $G(T)$ has a unique subgame perfect Nash equilibrium: the Nash equilibrium of $G$ is played in every stage irrespective of the past history of play.
Proof.
We can solve the game by backward induction, that is, starting from the smallest subgames and going backward through the game.
In stage $T$, players choose the unique Nash equilibrium of $G$.
Given that, in stage $T-1$, players again end up choosing the same Nash equilibrium outcome, since no matter what they play in $T-1$, the last-stage outcome will be unchanged.
This argument carries over backward through stage 1, which shows that the unique Nash equilibrium outcome is played in every stage (irrespective of the past history).
4 / 36
Finitely Repeated Games (3)
When there is more than one Nash equilibrium in the stage game, multiple subgame perfect Nash equilibria may exist.
Furthermore, an action profile which does not constitute a stage game Nash equilibrium may be sustained (for any period $t < T$) in a subgame perfect Nash equilibrium.
Q: The following stage game will be played twice. Can players support the non-equilibrium outcome $(M_1, M_2)$ in the first period?
1 \ 2    L2       M2       R2
L1       1, 1     5, 0     0, 0
M1       0, 5     4, 4     0, 0
R1       0, 0     0, 0     3, 3
Rm: Note that there are two pure-strategy Nash equilibria in the stage game, $(L_1, L_2)$ and $(R_1, R_2)$: what players choose in the first period may result in different outcomes (equilibria) in the second period.
5 / 36
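The question about the twice-played 3x3 game can be checked mechanically. The sketch below is only an illustration under one assumed continuation rule, which is not spelled out in the notes: play $(R_1, R_2)$ in the second period if $(M_1, M_2)$ was played first, and $(L_1, L_2)$ otherwise. It adds these continuation payoffs to the first-period payoffs and lists the pure-strategy Nash equilibria of the resulting one-shot game. The helper names `pure_nash` and `first_period_game` are hypothetical.

```python
from itertools import product

# Stage-game payoffs from the table: rows L1, M1, R1; columns L2, M2, R2.
ROWS = ["L1", "M1", "R1"]
COLS = ["L2", "M2", "R2"]
U = {
    ("L1", "L2"): (1, 1), ("L1", "M2"): (5, 0), ("L1", "R2"): (0, 0),
    ("M1", "L2"): (0, 5), ("M1", "M2"): (4, 4), ("M1", "R2"): (0, 0),
    ("R1", "L2"): (0, 0), ("R1", "M2"): (0, 0), ("R1", "R2"): (3, 3),
}

def pure_nash(payoffs):
    """Return all pure-strategy Nash equilibria of a two-player game."""
    equilibria = []
    for r, c in product(ROWS, COLS):
        best_row = all(payoffs[(r, c)][0] >= payoffs[(r2, c)][0] for r2 in ROWS)
        best_col = all(payoffs[(r, c)][1] >= payoffs[(r, c2)][1] for c2 in COLS)
        if best_row and best_col:
            equilibria.append((r, c))
    return equilibria

def first_period_game(target, reward, punishment):
    """Add the (assumed) second-period equilibrium payoffs to first-period payoffs."""
    game = {}
    for profile, (u1, u2) in U.items():
        cont = U[reward] if profile == target else U[punishment]
        game[profile] = (u1 + cont[0], u2 + cont[1])
    return game

stage_ne = pure_nash(U)
augmented = first_period_game(("M1", "M2"), ("R1", "R2"), ("L1", "L2"))
augmented_ne = pure_nash(augmented)
print(stage_ne)      # pure equilibria of the stage game
print(augmented_ne)  # pure equilibria once continuation payoffs are added
```

Under this rule, $(M_1, M_2)$ yields 4 + 3 = 7 to each player, while the best first-period deviation yields 5 + 1 = 6, so $(M_1, M_2)$ appears among the equilibria of the augmented game.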
Infinitely Repeated Games (1)
Even if the stage game has a unique Nash equilibrium, there may be subgame perfect outcomes of the infinitely repeated game in which no stage’s outcome is a Nash equilibrium of $G$.
Let $G(\infty, \delta)$ denote the infinitely repeated game in which $G$ is repeated forever and the players share the discount factor $\delta$.
For each $t$, the outcomes of the $t-1$ preceding plays of the stage game are observed before the $t$-th stage begins.
Each player’s payoff in $G(\infty, \delta)$ is the average payoff defined as follows.
Definition 3
Given the discount factor $\delta$, the average payoff of the infinite sequence of payoffs $u_1, u_2, \dots$ is
\[ (1-\delta)(u_1 + \delta u_2 + \delta^2 u_3 + \cdots) = (1-\delta)\sum_{t=1}^{\infty} \delta^{t-1} u_t. \]
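To see why the factor $(1-\delta)$ is called a normalization, the following minimal sketch evaluates the average payoff on long finite truncations of a payoff stream. The function names and the choice of $\delta = 0.9$ are illustrative assumptions, and the truncation only approximates the infinite sum.

```python
def average_payoff(payoffs, delta):
    """(1 - delta) * sum over t of delta**(t-1) * u_t for a finite payoff list."""
    # enumerate starts at 0, which matches the exponent t - 1 for t = 1, 2, ...
    return (1 - delta) * sum(delta ** t * u for t, u in enumerate(payoffs))

def constant_stream_average(u, delta, horizon):
    """Average payoff of receiving u in each of `horizon` periods."""
    return average_payoff([u] * horizon, delta)

# A constant stream of 2 has an average payoff of (approximately) 2.
print(constant_stream_average(2, 0.9, 500))
# A one-time payoff of 3 followed by zeros is worth (1 - delta) * 3 on average.
print(average_payoff([3] + [0] * 50, 0.9))
```

The normalization puts repeated-game payoffs on the same scale as stage-game payoffs, which is what allows them to be compared directly with the feasible set of the stage game later in the notes.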
6 / 36
Infinitely Repeated Games (2)
There are a few important remarks:
The history of play through stage $t$ is the record of the players’ choices in stages 1 through $t$.
The players might have chosen $(a^s_1, \dots, a^s_n)$ in stage $s$, where for each player $i$ the action $a^s_i$ belongs to $A_i$.
In the finitely repeated game $G(T)$ or the infinitely repeated game $G(\infty, \delta)$, a player’s strategy specifies the action that she will take in each stage, for every possible history of play.
In the infinitely repeated game $G(\infty, \delta)$, each subgame beginning at any stage is identical to the original game.
In $G(T)$, a subgame beginning at stage $t+1$ is the repeated game in which $G$ is played $T-t$ times, denoted by $G(T-t)$.
In a repeated game, a Nash equilibrium is subgame perfect if the players’ strategies constitute a Nash equilibrium in every subgame, i.e., after every possible history of play.
7 / 36
Unimprovability (1)
Definition 4
A strategy $\sigma_i$ is called a perfect best response to the other players’ strategies when player $i$ has no incentive to deviate following any history.
Consider the following requirement that, at first glance, looks much weaker than the perfect best response condition.
Definition 5
A strategy for $i$ is unimprovable against a vector of strategies of her opponents if there is no $(t-1)$-period history (for any $t$) such that $i$ could profit by deviating from her strategy in period $t$ only and conforming thereafter (i.e., switching back to the original strategy).
To verify the unimprovability of a strategy, one needs to check only “one-shot” deviations from the strategy, rather than arbitrarily complex deviations.
8 / 36
Unimprovability (2)
The following result simplifies the analysis of SPNE immensely.
It is the exact counterpart of a well-known result from dynamic programming due to Howard (1960), and was first emphasized in the context of self-enforcing cooperation by Abreu (1988).
Theorem 6
Let the payoffs of $G$ be bounded. In the repeated game $G(T)$ or $G(\infty, \delta)$, strategy $\sigma_i$ is a perfect best response to a profile of strategies $\sigma$ if and only if $\sigma_i$ is unimprovable against that profile.
The proof is simple, and generalizes easily to a wide variety of dynamic and stochastic games with discounting and bounded payoffs.
9 / 36
Unimprovability (3)
Proof of ⇐ (note that ⇒ is trivial).
We show only “⇐” (unimprovable implies perfect best response), since “⇒” is trivial. Consider the contrapositive, i.e., not a perfect best response ⇒ not unimprovable.
1. If $\sigma_i$ is not a perfect best response, there must be a history after which it is profitable to deviate to some other strategy $\sigma_i'$.
2. Then, because of discounting and boundedness of payoffs, there must exist a profitable deviation that involves defection in only finitely many periods (and conforms to $\sigma_i$ thereafter).
If the deviation $\sigma_i'$ involves defection at infinitely many nodes, then for sufficiently large $T$, the strategy $\sigma_i''$ that agrees with $\sigma_i'$ until time $T$ and conforms to $\sigma_i$ thereafter is also a profitable deviation (because of discounting and boundedness of payoffs).
3. Consider a profitable deviation involving defection in the smallest possible number of periods, denoted by $T$.
4. In such a profitable deviation, $\sigma_i$ must be improvable (not unimprovable) after the player has deviated for $T-1$ periods: otherwise, dropping the last defection would give a profitable deviation with fewer than $T$ defections.
10 / 36
Repeated Prisoner’s Dilemma (1)
Q: The following prisoner’s dilemma will be played infinitely many times. Under what conditions on $\delta$ can an SPNE support cooperation $(C_1, C_2)$?
1 \ 2    C2       D2
C1       2, 2     -1, 3
D1       3, -1    0, 0
Suppose that player $i$ plays $C_i$ in the first stage. In the $t$-th stage, if the outcome of all $t-1$ preceding stages has been $(C_1, C_2)$, then play $C_i$; otherwise, play $D_i$ (thereafter).
This strategy is called a trigger strategy, because player $i$ cooperates until someone fails to cooperate, which triggers a switch to noncooperation forever after.
If both players adopt this trigger strategy, then the outcome of the infinitely repeated game will be $(C_1, C_2)$ in every stage.
11 / 36
Repeated Prisoner’s Dilemma (2)
To show that the trigger strategies form an SPNE, we must verify that they constitute a Nash equilibrium in every possible subgame that could be generated in the infinitely repeated game.
Rm: Since every subgame of an infinitely repeated game is identical to the game as a whole (thanks to its recursive structure), we have to consider only two types of subgames: (i) subgames in which all the outcomes of earlier stages have been $(C_1, C_2)$, and (ii) subgames in which the outcome of at least one earlier stage differs from $(C_1, C_2)$.
By unimprovability, it is sufficient to show that there is no profitable one-shot deviation after any history, that is, in either type of subgame.
Players have no incentive to deviate in (ii), since the trigger strategy prescribes repeated play of the one-shot NE $(D_1, D_2)$ regardless of current play.
12 / 36
Repeated Prisoner’s Dilemma (3)
The following condition guarantees that there will be no profitable (one-shot) deviation in (i):
\[ 2 + \delta \times 2 + \delta^2 \times 2 + \cdots \ge 3 + \delta \times 0 + \delta^2 \times 0 + \cdots \iff 2(\delta + \delta^2 + \cdots) \ge 1 \iff \frac{2\delta}{1-\delta} \ge 1 \iff \delta \ge \frac{1}{3}. \]
Mutual cooperation $(C_1, C_2)$ can be sustained as an SPNE outcome by using the trigger strategy when players are sufficiently patient ($\delta \ge 1/3$).
The trigger strategy (in the repeated prisoner’s dilemma) is the severest punishment, since each player receives her minmax payoff (in every period) after a deviation occurs.
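The threshold $\delta \ge 1/3$ can be reproduced with exact arithmetic. The sketch below restates the one-shot deviation check for subgames of type (i) in average-payoff terms; the constant and function names are hypothetical and the code is only a numerical restatement of the inequality above.

```python
from fractions import Fraction

# Stage payoffs to player i in the prisoner's dilemma of these notes.
COOPERATE = Fraction(2)   # payoff at (C, C)
TEMPTATION = Fraction(3)  # payoff from playing D against C
PUNISH = Fraction(0)      # payoff at (D, D)

def on_path_value(delta):
    """Average payoff from cooperating forever: (1 - delta) * 2 / (1 - delta)."""
    return COOPERATE

def deviation_value(delta):
    """Average payoff from defecting once, followed by (D, D) forever."""
    return (1 - delta) * TEMPTATION + delta * PUNISH

def no_profitable_one_shot_deviation(delta):
    """One-shot deviation check in subgames of type (i)."""
    return on_path_value(delta) >= deviation_value(delta)

# Solving 2 >= (1 - delta) * 3 + delta * 0 for delta gives the critical value.
critical_delta = (TEMPTATION - COOPERATE) / (TEMPTATION - PUNISH)
print(critical_delta)
print(no_profitable_one_shot_deviation(Fraction(1, 3)))
print(no_profitable_one_shot_deviation(Fraction(1, 4)))
```

Subgames of type (ii) need no separate computation here, because the prescribed play there is the stage-game Nash equilibrium in every period.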
13 / 36
Folk Theorem: Preparation (1)
Rm: The following exposition follows Fudenberg and Maskin (1986).
For each $j$, choose $M^j = (M^j_1, \dots, M^j_n)$ so that
\[ (M^j_1, \dots, M^j_{j-1}, M^j_{j+1}, \dots, M^j_n) \in \arg\min_{a_{-j}} \max_{a_j} u_j(a_j, a_{-j}), \]
and player $j$’s reservation value is defined by
\[ v_j^* := \max_{a_j} u_j(a_j, M^j_{-j}) = u_j(M^j). \]
The strategies $(M^j_1, \dots, M^j_{j-1}, M^j_{j+1}, \dots, M^j_n)$ are minimax strategies (which may not be unique) against player $j$, and $v_j^*$ is the smallest payoff that the other players can keep player $j$ below.
We refer to $(v_1^*, \dots, v_n^*)$ as the minimax point.
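As a concrete instance of the definition, the sketch below computes the minimax value for the prisoner's dilemma used earlier in these notes. It is restricted to pure actions, which is an assumption made for simplicity (in general the punishers may need mixed actions to reach the minimax value), and the helper names are hypothetical.

```python
# Prisoner's dilemma payoffs (row player, column player), as in the notes.
ACTIONS = ["C", "D"]
U = {("C", "C"): (2, 2), ("C", "D"): (-1, 3),
     ("D", "C"): (3, -1), ("D", "D"): (0, 0)}

def payoff(player, own, other):
    """Payoff to `player` (0 = row, 1 = column) from her own and the rival's action."""
    profile = (own, other) if player == 0 else (other, own)
    return U[profile][player]

def pure_minimax(player):
    """Return (v_star, rival's punishing action), minimizing over pure actions only."""
    best = None
    for other in ACTIONS:
        # Value of the punished player's best reply to this punishing action.
        best_reply_value = max(payoff(player, own, other) for own in ACTIONS)
        if best is None or best_reply_value < best[0]:
            best = (best_reply_value, other)
    return best

minimax_point = tuple(pure_minimax(i)[0] for i in (0, 1))
print(pure_minimax(0), pure_minimax(1))
print(minimax_point)
```

In this game the punishing action is D for both players, so the minimax point coincides with the payoff of the stage-game Nash equilibrium $(D_1, D_2)$.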
14 / 36
Folk Theorem: Preparation (2)
Definition 7
Let $V$ be the set of feasible payoffs, i.e., the convex hull of the payoff vectors $u$ yielded by (pure) action profiles, and let $V^* (\subset V)$ be the set of feasible payoffs that Pareto dominate the minimax point:
\[ V^* = \{(v_1, \dots, v_n) \in V \mid v_i > v_i^* \text{ for all } i\}, \]
which reads $v_i > 0$ for all $i$ when payoffs are normalized so that the minimax point is $(0, \dots, 0)$.
$V^*$ is called the set of individually rational payoffs.
There are several versions of the folk theorem.
The name comes from the fact that the statement (relying on NE rather than SPNE) was widely known among game theorists in the 1950s, even though no one had published it.
15 / 36
Folk Theorem (1)
Theorem 8 (Theorem A)
For any $(v_1, \dots, v_n) \in V^*$, if players discount the future sufficiently little, there exists a Nash equilibrium of the infinitely repeated game where, for all $i$, player $i$’s average payoff is $v_i$.
If a player deviates, it may not be in others’ interest to go through with the punishment of minimaxing him forever. However, Aumann and Shapley (1976) and Rubinstein (1979) showed that, when there is no discounting, the counterpart of Theorem A holds for SPNE.
Theorem 9 (Theorem B)
For any $(v_1, \dots, v_n) \in V^*$ there exists a subgame perfect equilibrium in the infinitely repeated game with no discounting, where, for all $i$, player $i$’s expected payoff each period is $v_i$.
16 / 36
Folk Theorem (2)
One well-known case that admits both discounting and simple strategies is where the point to be sustained Pareto dominates the payoffs of a Nash equilibrium of the constituent game $G$.
Theorem 10 (Theorem C)
Suppose $(v_1, \dots, v_n) \in V^*$ Pareto dominates the payoffs $(y_1, \dots, y_n)$ of a (one-shot) Nash equilibrium $(e_1, \dots, e_n)$ of $G$. If players discount the future sufficiently little, there exists a subgame perfect equilibrium of the infinitely repeated game where, for all $i$, player $i$’s average payoff is $v_i$.
Because the punishments used in Theorem C are less severe than those in Theorems A and B, its conclusion is weaker.
For example, Theorem C does not allow us to conclude that a Stackelberg outcome can be supported as an equilibrium in an infinitely repeated quantity-setting duopoly.
17 / 36
General Folk Theorem: Two Players
Abreu (1988) shows that there is no loss in restricting attention to simple punishments when players discount the future. Indeed, simple punishments are employed in the proof of the following result.
Theorem 11 (Theorem 1)
For any $(v_1, v_2) \in V^*$ there exists $\underline{\delta} \in (0, 1)$ such that, for all $\delta \in (\underline{\delta}, 1)$, there exists a subgame perfect equilibrium of the infinitely repeated game in which player $i$’s average payoff is $v_i$ when players have discount factor $\delta$.
After a deviation by either player, the players (mutually) minimax each other for a certain number of periods, after which they return to the original path.
If a further deviation occurs during the punishment phase, the phase is begun again.
18 / 36
General Folk Theorem: Three or More Players
The method we used to establish Theorem 1 (“mutual minimaxing”) does not extend to three or more players.
Theorem 12 (Theorem 2)
Assume that the dimensionality of $V^*$ equals $n$, the number of players, i.e., that the interior of $V$ (relative to $n$-dimensional space) is nonempty. Then, for any $(v_1, \dots, v_n)$ in $V^*$, there exists $\underline{\delta} \in (0, 1)$ such that for all $\delta \in (\underline{\delta}, 1)$ there exists a subgame perfect equilibrium of the infinitely repeated game with discount factor $\delta$ in which player $i$’s average payoff is $v_i$.
If a player deviates, he is minimaxed by the other players long enough to wipe out any gain from his deviation.
To induce the other players to go through with minimaxing him, they are ultimately given a “reward” in the form of an additional $\varepsilon$ in their average payoff.
The possibility of providing such a reward relies on the full dimensionality of the payoff set.
19 / 36
Imperfect Monitoring (1)
Perfect Monitoring: Players can fully observe the history of their past play. There is no monitoring difficulty or imperfection.
Bounded/Imperfect Recall: Players forget (part of) the history of their past play, especially that of the distant past, as time goes by.
Imperfect Monitoring: Players cannot directly observe the (full) history of their past play, but instead observe signals that depend on actions taken in the previous period.
Public Monitoring: Players publicly observe a common signal.
Private Monitoring: Players privately receive different signals.
20 / 36
Imperfect Monitoring (2)
Punishment necessarily becomes indirectly linked with deviation.
Players can punish the deviator only in reaction to the common signals, since they cannot observe the deviation itself.
Even if no one has deviated, punishment is triggered whenever a bad signal is realized (which happens with positive probability).
⇒ Constructing (efficient) punishment becomes dramatically more difficult.
21 / 36
Example | Prisoner’s Dilemma (1)
Consider the following Prisoner’s Dilemma as a stage game, in which each player cannot observe the rival’s past actions.
Table: Ex ante Payoffs $u_i(a_i, a_{-i})$
1 \ 2    C        D
C        2, 2     -1, 3
D        3, -1    0, 0
Q: Can each player deduce the rival’s action from the realized payoff (and her own action)?
If this were the case, then monitoring would not be imperfect...
22 / 36
Example | Prisoner’s Dilemma (2)
Player $i$’s payoff in each period depends only on her own action, $a_i \in \{C, D\}$, and the public signal, $y \in \{g, b\}$, i.e., $u_i^*(y, a_i)$.
Table: Ex post Payoffs $u_i^*(y, a_i)$
i \ y    g                       b
C        (3 − p − 2q)/(p − q)    −(p + 2q)/(p − q)
D        3(1 − r)/(q − r)        −3r/(q − r)
$p, q, r$ ($0 < q, r < p < 1$) are conditional probabilities that $g$ is realized:
$p = \Pr\{g \mid CC\}$, $q = \Pr\{g \mid DC\} = \Pr\{g \mid CD\}$, $r = \Pr\{g \mid DD\}$.
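The ex post payoffs are constructed so that, in expectation over the signal, they reproduce the ex ante prisoner's dilemma payoffs. The sketch below checks this with exact fractions for one hypothetical choice of $(p, q, r)$ satisfying $0 < q, r < p < 1$ and $q \ne r$; the numerical values and function names are assumptions for illustration only.

```python
from fractions import Fraction

def ex_post(y, a, p, q, r):
    """Ex post payoff u_i^*(y, a_i) taken from the table above."""
    if a == "C":
        return (3 - p - 2 * q) / (p - q) if y == "g" else -(p + 2 * q) / (p - q)
    return 3 * (1 - r) / (q - r) if y == "g" else -3 * r / (q - r)

def prob_good(a_own, a_other, p, q, r):
    """Pr{g | action profile}: p if nobody defects, q if one does, r if both do."""
    n_defect = [a_own, a_other].count("D")
    return (p, q, r)[n_defect]

def ex_ante(a_own, a_other, p, q, r):
    """Expected ex post payoff, which should equal the ex ante payoff u_i(a)."""
    g = prob_good(a_own, a_other, p, q, r)
    return g * ex_post("g", a_own, p, q, r) + (1 - g) * ex_post("b", a_own, p, q, r)

p, q, r = Fraction(9, 10), Fraction(1, 2), Fraction(1, 5)  # hypothetical values
table = {(a, b): ex_ante(a, b, p, q, r) for a in "CD" for b in "CD"}
print(table)
```

The resulting table gives 2 for (C, C), −1 for (C, D), 3 for (D, C), and 0 for (D, D) from the point of view of the player choosing the first action, which matches the ex ante payoff matrix.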
23 / 36
Example | Prisoner’s Dilemma (3)
To achieve cooperation, consider the (modified) trigger strategies:
Play (C, C) in the first period.
Continue to play (C, C) as long as $g$ continues to be realized.
Play (D, D) forever once $b$ is realized.
The above trigger strategies constitute an SPNE if and only if the following condition is satisfied:
\[ \delta(3p - 2q) \ge 1 \iff \delta \ge \frac{1}{3p - 2q} \quad \text{(7.2.4 in MS)} \]
Then, the symmetric equilibrium (average) payoff becomes $\frac{2(1-\delta)}{1-\delta p}$, which converges to 0 as $\delta$ goes to 1.
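The condition and the equilibrium payoff can be derived in two steps from the ex ante payoffs (2 from mutual cooperation, 3 from a unilateral defection, 0 in the punishment phase). The derivation below is a sketch in which $v$ denotes the symmetric average payoff in the cooperative phase; the symbol $v$ is introduced here for illustration.

```latex
% Value of the cooperative phase: cooperation continues with probability p.
v = (1-\delta)\cdot 2 + \delta\,[\,p\,v + (1-p)\cdot 0\,]
  \;\Longrightarrow\;
  v = \frac{2(1-\delta)}{1-\delta p}.

% One-shot deviation to D: payoff 3 today, cooperation continues with probability q.
(1-\delta)\cdot 3 + \delta\,q\,v \le v
  \iff 3(1-\delta) \le (1-\delta q)\,\frac{2(1-\delta)}{1-\delta p}
  \iff 3(1-\delta p) \le 2(1-\delta q)
  \iff \delta(3p-2q) \ge 1.
```

Since $p < 1$, the numerator of $v$ vanishes as $\delta \to 1$ while the denominator tends to $1 - p > 0$, which is why the payoff converges to 0: with patient players, the punishment phase is eventually triggered and dominates the average.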
24 / 36
General Model (1)
$n$ (long-lived) players engage in an infinitely repeated game with a discrete time horizon ($t = 0, 1, \dots$) whose stage game is defined as follows:
$a_i \in A_i$: Player $i$’s action ($A_i$ is assumed finite)
$y \in Y$: Public signal realized at the end of each period ($Y$ is finite)
$\rho(y|a)$: Conditional probability function (assuming full support)
$\rho(y|\alpha)$: Extension to mixed action profile $\alpha \in \prod_{i=1}^{n} \Delta(A_i)$
$\Pi_i(\alpha_{-i}) := \rho(\cdot|\cdot, \alpha_{-i})$: $|A_i| \times |Y|$ matrix
$u_i^*(y, a_i)$: Player $i$’s ex post payoff
$u_i(a)$: Player $i$’s ex ante payoff, expressed by
\[ u_i(a) = \sum_{y \in Y} u_i^*(y, a_i)\,\rho(y|a) \quad \text{(7.1.1 in MS)} \]
$V(\delta)$: Set of equilibrium (PPE, defined later) payoffs under $\delta$
25 / 36
General Model (2)
In the repeated game (of imperfect public monitoring), the only public information available in period $t$ is the $t$-period history of public signals:
\[ h^t := (y^0, y^1, \dots, y^{t-1}). \]
The set of public histories is (with $Y^0$ containing only the empty history $h^0$):
\[ H := \bigcup_{t=0}^{\infty} Y^t. \]
A history for player $i$ includes both the public history and the history of actions that $i$ has taken:
\[ h_i^t := (y^0, a_i^0;\ y^1, a_i^1;\ \dots;\ y^{t-1}, a_i^{t-1}). \]
The set of histories for player $i$ is (with $(A_i \times Y)^0$ containing only the empty history):
\[ H_i := \bigcup_{t=0}^{\infty} (A_i \times Y)^t. \]
26 / 36
Perfect Public Equilibrium (1)
A pure strategy for player $i$ is a mapping from all possible histories into the set of pure actions,
\[ \sigma_i : H_i \to A_i. \]
A mixed strategy is a mixture over pure strategies.
A behavior strategy is a mapping
\[ \sigma_i : H_i \to \Delta(A_i). \]
Definition 13 (Def 7.1.1)
A behavior strategy $\sigma_i$ is public if, in every period $t$, it depends only on the public history $h^t \in Y^t$ and not on $i$’s private history. That is, for all $h_i^t, \hat{h}_i^t \in H_i$ satisfying $y^\tau = \hat{y}^\tau$ for all $\tau \le t-1$,
\[ \sigma_i(h_i^t) = \sigma_i(\hat{h}_i^t). \]
A behavior strategy $\sigma_i$ is private if it is not public.
27 / 36
Perfect Public Equilibrium (2)
Definition 14 (Def 7.1.2)
Suppose $A_i = A_j$ for all $i$ and $j$. A public profile $\sigma$ is strongly symmetric if, for all public histories $h^t$, $\sigma_i(h^t) = \sigma_j(h^t)$ for all $i$ and $j$.
Definition 15 (Def 7.1.3)
A perfect public equilibrium (PPE) is a profile of public strategies $\sigma$ that, for any public history $h^t$, specifies a Nash equilibrium for the repeated game. A PPE is strict if each player strictly prefers his equilibrium strategy to every other public strategy.
Lemma 16 (Lemma 7.1.1)
If all players other than $i$ are playing a public strategy, then player $i$ has a public strategy as a best reply.
Therefore, every PPE is a sequential equilibrium.
28 / 36
Dynamic Programming Approach
1 Decomposition
Transforming a dynamic game into a static game. In so doing, recursive structure and unimprovability play key roles.
2 Self-Generation
A useful property for characterizing the set of equilibrium (PPE) payoffs. Without (explicitly) solving the game, the set of equilibrium payoffs can be fully characterized and computed.
29 / 36
Decomposition — Perfect Monitoring
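A continuation payoff can be decomposed into a current-period payoff and future payoffs of the repeated game starting from the next period:
\[ v_i = (1-\delta)\,u_i(a) + \delta\,\gamma_i(a) \tag{1} \]
where $\gamma : A \to V(\delta)$ $(\subset \mathbb{R}^n)$ assigns an equilibrium payoff vector to each action profile and $\gamma_i$ is its $i$-th element ($i$’s assigned payoff).
Theorem 17
$v$ is supported (as an average payoff) by an SPNE if and only if there exist a mixed action profile $\alpha \in \Delta(A)$ and $\gamma : A \to V(\delta)$ (extended to mixed profiles by taking expectations) such that
\[ \forall i\ \forall a_i' \in A_i: \quad v_i = (1-\delta)\,u_i(\alpha) + \delta\,\gamma_i(\alpha) \ \ge\ (1-\delta)\,u_i(a_i', \alpha_{-i}) + \delta\,\gamma_i(a_i', \alpha_{-i}). \]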
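For the repeated prisoner's dilemma with perfect monitoring, the trigger-strategy payoff $(2, 2)$ can be decomposed in exactly this way. The sketch below assumes pure actions and the continuation rule $\gamma(C, C) = (2, 2)$, $\gamma(a) = (0, 0)$ otherwise; both continuation values are SPNE payoffs when $\delta \ge 1/3$. The function names are hypothetical.

```python
from fractions import Fraction

ACTIONS = ["C", "D"]
U = {("C", "C"): (2, 2), ("C", "D"): (-1, 3),
     ("D", "C"): (3, -1), ("D", "D"): (0, 0)}

def gamma(profile):
    """Continuation payoffs of the trigger profile: reward (C, C), punish otherwise."""
    if profile == ("C", "C"):
        return (Fraction(2), Fraction(2))
    return (Fraction(0), Fraction(0))

def value(profile, i, delta):
    """Right-hand side of equation (1) for player i."""
    return (1 - delta) * U[profile][i] + delta * gamma(profile)[i]

def deviate(profile, i, action):
    """Profile obtained when player i unilaterally switches to `action`."""
    changed = list(profile)
    changed[i] = action
    return tuple(changed)

def is_enforced(profile, delta):
    """Check the incentive inequality of Theorem 17 for a pure action profile."""
    return all(value(profile, i, delta) >= value(deviate(profile, i, a), i, delta)
               for i in (0, 1) for a in ACTIONS)

print(value(("C", "C"), 0, Fraction(1, 2)))
print(is_enforced(("C", "C"), Fraction(1, 2)))
print(is_enforced(("C", "C"), Fraction(1, 4)))
```

The check turns the dynamic incentive problem into a one-shot game whose payoffs are the weighted sums in equation (1), which is the sense in which decomposition "transforms a dynamic game into a static game".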
30 / 36
Decomposition — Imperfect Monitoring
A continuation payoff can be decomposed into a current-period payoff and future payoffs of the repeated game starting from the next period:
\[ v_i = (1-\delta)\,u_i(a) + \delta \sum_{y \in Y} \gamma_i(y)\,\rho(y|a) \tag{2} \]
where $\gamma : Y \to V(\delta)$ $(\subset \mathbb{R}^n)$ assigns an equilibrium (PPE) payoff vector to each public signal and $\gamma_i$ is its $i$-th element ($i$’s assigned payoff).
Theorem 18
$v$ is supported (as an average payoff) by a PPE if and only if there exist a mixed action profile $\alpha \in \Delta(A)$ and $\gamma : Y \to V(\delta)$ such that
\[ \forall i\ \forall a_i' \in A_i: \quad v_i = (1-\delta)\,u_i(\alpha) + \delta \sum_{y \in Y} \gamma_i(y)\,\rho(y|\alpha) \ \ge\ (1-\delta)\,u_i(a_i', \alpha_{-i}) + \delta \sum_{y \in Y} \gamma_i(y)\,\rho(y|a_i', \alpha_{-i}). \]
31 / 36
Self-Generation (1)
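What happens if the range of the mapping $\gamma$, $V(\delta)$, is replaced with an arbitrary set $W$ $(\subset \mathbb{R}^n)$?
Definition 19
Let $B(W)$ be the set of vectors $w = (w_1, \dots, w_n)$ for which there exist a mixed action profile $\alpha \in \Delta(A)$ and $\gamma : Y \to W$ such that
\[ \forall i\ \forall a_i' \in A_i: \quad w_i = (1-\delta)\,u_i(\alpha) + \delta \sum_{y \in Y} \gamma_i(y)\,\rho(y|\alpha) \ \ge\ (1-\delta)\,u_i(a_i', \alpha_{-i}) + \delta \sum_{y \in Y} \gamma_i(y)\,\rho(y|a_i', \alpha_{-i}). \]
$W$ is called self-generating (or self-enforceable) if $W \subseteq B(W)$.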
32 / 36
Self-Generation (2)
Theorem 20
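The set of average payoffs in PPE is a fixed point of the mapping $B(\cdot)$, i.e., $B(V(\delta)) = V(\delta)$.
Theorem 21
If $W \subseteq W'$, then $B(W) \subseteq B(W')$ must be satisfied.
Theorem 22
If $W$ is self-generating, then the following holds:
\[ W \subseteq \bigcup_{t=1}^{\infty} B^t(W) \subseteq V(\delta) \tag{3} \]
If $W$ is bounded and $V(\delta) \subset W$, then
\[ \bigcap_{t=1}^{\infty} B^t(W) = V(\delta) \tag{4} \]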
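Self-generation can be illustrated on a small finite set. The sketch below makes two simplifying assumptions that are not in the notes: only pure action profiles are considered, and monitoring is perfect (the signal is the action profile itself, which violates the full-support assumption of the general model). It tests by brute force whether each $w \in W$ lies in $B(W)$ for the prisoner's dilemma; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ["C", "D"]
PROFILES = list(product(ACTIONS, ACTIONS))
U = {("C", "C"): (2, 2), ("C", "D"): (-1, 3),
     ("D", "C"): (3, -1), ("D", "D"): (0, 0)}

def deviations(profile, i):
    """All profiles reachable when player i alone changes (or keeps) her action."""
    for a in ACTIONS:
        changed = list(profile)
        changed[i] = a
        yield tuple(changed)

def decomposable(w, W, delta):
    """True if w is in B(W): some pure profile and some gamma: A -> W generate w."""
    for profile in PROFILES:
        # Enumerate every assignment of a continuation payoff in W to each profile.
        for choice in product(W, repeat=len(PROFILES)):
            gamma = dict(zip(PROFILES, choice))

            def total(a, i):
                return (1 - delta) * U[a][i] + delta * gamma[a][i]

            promised = all(total(profile, i) == w[i] for i in (0, 1))
            enforced = all(total(profile, i) >= total(d, i)
                           for i in (0, 1) for d in deviations(profile, i))
            if promised and enforced:
                return True
    return False

def is_self_generating(W, delta):
    """Check W ⊆ B(W) for a finite set W of payoff vectors."""
    return all(decomposable(w, W, delta) for w in W)

print(is_self_generating([(0, 0), (2, 2)], Fraction(1, 2)))
print(is_self_generating([(2, 2)], Fraction(1, 2)))
print(is_self_generating([(0, 0), (2, 2)], Fraction(1, 4)))
```

The set $\{(0,0), (2,2)\}$ passes the test for $\delta = 1/2$, so by (3) both points are equilibrium payoffs, while $\{(2,2)\}$ alone fails because it contains no continuation payoff that could serve as a punishment.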
33 / 36
Folk Theorem by FLM (1994) (1)
Definition 23
The profile $\alpha$ has individual full rank for player $i$ if $\Pi_i(\alpha_{-i})$ has rank equal to $|A_i|$, that is, the $|A_i|$ vectors $\{\rho(\cdot|a_i, \alpha_{-i})\}_{a_i \in A_i}$ are linearly independent. If this is so for every player $i$, $\alpha$ has individual full rank.
Note that if $\alpha$ has individual full rank, the number of observable outcomes $|Y|$ must be at least $\max_i |A_i|$.
Definition 24
Profile $\alpha$ is pairwise-identifiable for players $i$ and $j$ if the rank of the matrix $\Pi_{ij}(\alpha)$, obtained by stacking $\Pi_i(\alpha_{-i})$ on top of $\Pi_j(\alpha_{-j})$, equals $\operatorname{rank} \Pi_i(\alpha_{-i}) + \operatorname{rank} \Pi_j(\alpha_{-j}) - 1$.
Definition 25
Profile $\alpha$ has pairwise full rank for players $i$ and $j$ if the matrix $\Pi_{ij}(\alpha)$ has rank $|A_i| + |A_j| - 1$.
34 / 36
Folk Theorem by FLM (1994) (2)
Pairwise full rank on $\alpha$ (for players $i$ and $j$) is actually the conjunction of two weaker conditions, individual full rank and pairwise identifiability (on $\alpha$ for $i$ and $j$).
1 Pairwise full rank obviously implies individual full rank: incentives can be designed to induce a player to choose a given action.
2 It also ensures pairwise identifiability: deviations by players $i$ and $j$ are distinct in the sense that they induce different probability distributions over public outcomes.
3 Thus, player $i$’s incentives can be designed without interfering with those of player $j$.
35 / 36
Folk Theorem by FLM (1994) (3)
Theorem 26
Suppose that every pure action profile $a$ has individual full rank and that either (i) for all pairs $i$ and $j$, there exists a mixed action profile $\alpha$ that has pairwise full rank for that pair, or (ii) every pure-action, Pareto-efficient profile is pairwise-identifiable for all pairs of players. Let $W$ be a smooth subset in the interior of $V^*$. Then there exists $\underline{\delta} < 1$ such that, for all $\delta > \underline{\delta}$, $W \subseteq E(\delta)$, i.e., each point in $W$ corresponds to a perfect public equilibrium payoff with discount factor $\delta$.
The theorem applies only to interior points and so does not pertain to payoffs on the efficient frontier.
This contrasts with the standard Folk Theorem for observable actions, in which efficient payoffs can be exactly attained.
36 / 36