
centre for microdata methods and practice

www.cemmap.ac.uk

Repeated Games

George Mailath (Pennsylvania)

17 - 18 November 2016

Institute for Fiscal Studies, 7 Ridgmount St, London WC1E 7AE


cemmap masterclass

Repeated Games

George Mailath

17 – 18 November 2016, IFS Conference Room

Programme

Day One: Thursday 17th November
10.30 – 11.00: Registration and coffee
11.00 – 12.30: The basic structure of repeated games under perfect monitoring
12.30 – 13.30: Lunch
13.30 – 15.00: Repeated games with imperfect public monitoring
15.00 – 15.15: Coffee
15.15 – 16.45: Repeated games with private monitoring

Day Two: Friday 18th November
08.30 – 09.00: Coffee
09.00 – 10.30: The basic reputation argument
10.30 – 10.45: Coffee
10.45 – 12.15: The Canonical Reputation Model and Reputation Effects
12.15: Lunch and Close


Repeated Games (George Mailath), 17 – 18 November, Institute for Fiscal Studies

Participants:

Jihyun Bae University College London [email protected]

Guo Bai University College London [email protected]

Giulia Conte University College London [email protected]

Alfred Duncan University of Kent [email protected]

Carlo Galli University College London [email protected]

Luke Garrod Loughborough University [email protected]

Amir Habibi University College London [email protected]

Jan Jozwik University of Birmingham [email protected]

Gavin Kader University College London [email protected]

Maia King Queen Mary - University of London [email protected]

Pei Kuang University of Birmingham [email protected]

Krittanai Laohakunakorn London School of Economics and Political Science [email protected]

Konrad Mierendorff University College London [email protected]

Zahra Murad University of Surrey [email protected]

Mateusz Mysliwski University College London [email protected]

Lars Nesheim University College London [email protected]

Masato Nishiwaki Waseda University [email protected]

Matthew Olczak Aston University [email protected]

Ramakanta Patra Cardiff Metropolitan University [email protected]

Ruben Poblete University College London [email protected]

Manzur Rashid University College London [email protected]

Pablo Slon-Montero University of Kent [email protected]

Carolyn St Aubyn Birkbeck - University of London [email protected]

Alessia Testa University of Portsmouth [email protected]

Yiming Xia University College London [email protected]


Lecture Notes


Economics 703: Microeconomics II
Modelling Strategic Behavior1

George J. Mailath

Department of Economics

University of Pennsylvania

November 8, 2016

1Copyright November 8, 2016 by George J. Mailath.


Chapter 7

Repeated Games

7.1 Basic Structure

Stage game G ≡ (Ai, ui): Action space for i is Ai, with typical action ai ∈ Ai. An action profile is denoted a = (a1, . . . , an). Discount factor δ ∈ (0,1). Play G at each date t = 0, 1, . . . . At the end of each period, all players observe the action profile a chosen. Actions of every player are perfectly monitored by all other players.

History up to date t: ht ≡ (a0, . . . , at−1) ∈ At ≡ Ht; H0 ≡ {∅}. Set of all possible histories: H ≡ ∪∞t=0 Ht. Strategy for player i is denoted si : H → Ai. Often written si = (s0i, s1i, s2i, . . .), where sti : Ht → Ai. Since H0 = {∅}, we have s0 ∈ A, and so can write a0 for s0. The set of all strategies for player i is Si.

Note the distinction between

• actions ai ∈ Ai and

• strategies si : H → Ai.

Given strategy profile s ≡ (s1, s2, . . . , sn), outcome path induced by


s is a(s) = (a0(s), a1(s), a2(s), . . .), where

a0(s) = (s1(∅), s2(∅), . . . , sn(∅)),
a1(s) = (s1(a0(s)), s2(a0(s)), . . . , sn(a0(s))),
a2(s) = (s1(a0(s), a1(s)), s2(a0(s), a1(s)), . . . , sn(a0(s), a1(s))),
...

Payoffs of Gδ(∞) are

Uδi(s) = (1 − δ) ∑∞t=0 δt ui(at(s)).

We have now described a normal form game: Gδ(∞) = (Si, Uδi)ni=1. Note that the payoffs of this normal form game are convex combinations of the stage game payoffs, and so the set of payoffs of this game trivially strictly contains u(A) := {v ∈ Rn : ∃a ∈ A, v = u(a)} and is a subset of the convex hull of u(A), conv u(A), i.e.,

u(A) ⊊ Uδ(S) ⊆ conv u(A).

Moreover, for δ sufficiently close to 1, every payoff in conv u(A) can be achieved as a payoff in Gδ(∞). More specifically, if δ > 1 − 1/|A|, then for all v ∈ conv u(A) there exists s ∈ S such that v = Uδ(s) (see Lemma 1 in Fudenberg and Maskin, 1991).

Definition 7.1.1. The set of feasible payoffs for the infinitely repeated game is given by

conv u(A) = conv{v ∈ Rn : ∃a ∈ A, v = u(a)}.

While the superscript δ is sometimes omitted from Gδ(∞) and from Uδi, this should not cause confusion, as long as the role of the discount factor in determining payoffs is not forgotten.

Definition 7.1.2. Player i's pure strategy minmax utility is

¯vpi = mina−i maxai ui(ai, a−i).


       E        S
E     2, 2     −1, 3
S     3, −1     0, 0

[Figure 7.1.1: A prisoners' dilemma on the left, and the set of payoffs on the right (plotted in (u1, u2) space). Both players' minmax payoff is 0. The set of feasible payoffs is the union of the two lightly shaded regions and Fp∗, the darkly shaded region.]

The profile â−i ∈ arg mina−i maxai ui(ai, a−i) minmaxes player i. The set of (pure strategy) strictly individually rational payoffs is {v ∈ Rn : vi > ¯vpi}. Define Fp∗ ≡ {v ∈ Rn : vi > ¯vpi} ∩ conv{v ∈ Rn : ∃a ∈ A, v = u(a)}. The set is illustrated in Figure 7.1.1.
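The minmax is a finite min–max computation and is easy to verify directly. The following is a minimal Python sketch (my own illustration, not part of the notes), computing the pure strategy minmax for player 1 in the prisoners' dilemma of Figure 7.1.1:

# Pure strategy minmax for player 1 in the PD of Figure 7.1.1 (a sketch).
u1 = {("E", "E"): 2, ("E", "S"): -1, ("S", "E"): 3, ("S", "S"): 0}
A1 = A2 = ("E", "S")

# v1^p = min over a2 of max over a1 of u1(a1, a2)
minmax1 = min(max(u1[(a1, a2)] for a1 in A1) for a2 in A2)
print(minmax1)  # 0: player 2 minmaxes player 1 by playing S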

Definition 7.1.3. The strategy profile s is a Nash equilibrium of Gδ(∞) if, for all i and all s̃i : H → Ai,

Uδi(si, s−i) ≥ Uδi(s̃i, s−i).

Theorem 7.1.1. Suppose s∗ is a pure strategy Nash equilibrium. Then,

Uδi(s∗) ≥ ¯vpi.

Proof. Let ŝi be a strategy satisfying

ŝi(ht) ∈ arg maxai ui(ai, s∗−i(ht)), ∀ht ∈ Ht

(if the arg max is unique for some history ht, ŝi(ht) is uniquely determined; otherwise make a selection from the arg max). Since s∗ is a Nash equilibrium,

Uδi(s∗) ≥ Uδi(ŝi, s∗−i),


and since in every period ¯vpi is a lower bound for the flow payoff received under the profile (ŝi, s∗−i), we have

Uδi(s∗) ≥ Uδi(ŝi, s∗−i) ≥ (1 − δ) ∑∞t=0 δt ¯vpi = ¯vpi.

Remark 7.1.1. In some settings it is necessary to allow players to randomize. For example, in matching pennies, the set of pure strategy feasible and individually rational payoffs is empty.

Definition 7.1.4. Player i's mixed strategy minmax utility is

¯vi = minα−i∈∏j≠i ∆(Aj) maxαi∈∆(Ai) ui(αi, α−i).

The profile α̂−i ∈ arg minα−i maxαi ui(αi, α−i) minmaxes player i. The set of (mixed strategy) strictly individually rational payoffs is {v ∈ Rn : vi > ¯vi}. Define F∗ ≡ {v ∈ Rn : vi > ¯vi} ∩ conv{v ∈ Rn : ∃a ∈ A, v = u(a)}.

The Minmax Theorem (Problem 4.3.2) implies that ¯vi is i's security level (Definition 2.4.2). A proof essentially identical to that proving Theorem 7.1.1 (applied to the behavior strategy profile realization equivalent to σ∗) proves the following:

Theorem 7.1.2. Suppose σ∗ is a (possibly mixed) Nash equilibrium. Then,

Uδi(σ∗) ≥ ¯vi.

Since ¯vi ≤ ¯vpi (with a strict inequality in some games, such as matching pennies), lower payoffs often can be enforced using mixed strategies. The possibility of enforcing lower payoffs allows higher payoffs to be enforced in subgame perfect equilibria.

Given ht = (a0, . . . , at−1) ∈ Ht and ĥτ = (â0, . . . , âτ−1) ∈ Hτ, the history (a0, . . . , at−1, â0, . . . , âτ−1) ∈ Ht+τ is the concatenation of ht followed by ĥτ, denoted by (ht, ĥτ). Given si, define si|ht : H → Ai as follows:

si|ht(ĥτ) = si(ht, ĥτ).

Note that for all histories ht, si|ht ∈ Si.

Remark 7.1.2. In particular, the subgame reached by any history ht is an infinitely repeated game that is strategically equivalent to the original infinitely repeated game, Gδ(∞).

Definition 7.1.5. The strategy profile s is a subgame perfect equilibrium of Gδ(∞) if, for all histories ht ∈ Ht, s|ht = (s1|ht, . . . , sn|ht) is a Nash equilibrium of Gδ(∞).

Example 7.1.1 (Grim trigger in the repeated PD). Consider the prisoners' dilemma in Figure 7.1.1.

A grim trigger strategy profile is a profile where a deviation triggers Nash reversion (hence trigger) and the Nash equilibrium minmaxes the players (hence grim). For the prisoners' dilemma, grim trigger can be described as follows: player i's strategy is given by

si(∅) = E,

and for t ≥ 1,

si(a0, . . . , at−1) =
    E, if at′ = EE for all t′ = 0, 1, . . . , t − 1,
    S, otherwise.

Payoff to 1 from (s1, s2) is (1 − δ) ∑∞t=0 2δt = 2.

Consider a deviation by player 1 to another strategy ŝ1. In response to the first play of S, player 2 responds with S in every subsequent period, so player 1 can do no better than always play S after the first play of S. The maximum payoff from deviating in period t = 0 (the most profitable deviation) is (1 − δ)3. The profile is Nash if

2 ≥ 3(1 − δ) ⇐⇒ δ ≥ 1/3.
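This comparison is easy to check numerically. A minimal Python sketch (my own illustration; the truncation horizon T is an arbitrary choice):

def grim_payoffs(delta, T=10_000):
    # Cooperation path: flow payoff 2 in every period
    coop = (1 - delta) * sum(2 * delta**t for t in range(T))
    # Most profitable deviation: S in period 0 (flow 3), then SS forever (flow 0)
    dev = (1 - delta) * 3
    return coop, dev

for delta in (0.2, 1/3, 0.5):
    coop, dev = grim_payoffs(delta)
    print(f"delta={delta:.3f}: cooperate={coop:.3f}, deviate={dev:.3f}, Nash: {coop + 1e-9 >= dev}")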


The strategy profile is subgame perfect: Note first that the profile only induces two different strategy profiles after any history. Denote by s† = (s†1, s†2) the profile in which each player plays S for all histories, s†i(ht) = S for all ht ∈ H. Then,2

si|(a0,...,at−1) =
    si, if at′ = EE for all t′ = 0, 1, . . . , t − 1,
    s†i, otherwise.

We have already verified that s is a Nash equilibrium of G(∞), and it is immediate that s† is Nash. «

Grim trigger is an example of a strongly symmetric strategy profile (the deviator is not treated differently than the other player(s)):

Definition 7.1.6. Suppose Ai = Aj for all i and j. A strategy profile is strongly symmetric if

si(ht) = sj(ht) ∀ht ∈ H, ∀i, j.

Represent strategy profiles by automata, (W, w0, f, τ), where

• W is the set of states,

• w0 is the initial state,

• f : W → A is the output function (decision rule),3 and

• τ : W × A → W is the transition function.

Any automaton (W, w0, f, τ) induces a pure strategy profile as follows: First, extend the transition function from the domain W × A to the domain W × (H \ {∅}) by recursively defining

τ(w, ht) = τ(τ(w, ht−1), at−1). (7.1.1)

2This is a statement about the strategies as functions, i.e., for all hτ ∈ H,

si|(a0,...,at−1)(hτ) =
    si(hτ), if at′ = EE for all t′ = 0, 1, . . . , t − 1,
    s†i(hτ), otherwise.

3A profile of behavior strategies (b1, . . . , bn), bi : H → ∆(Ai), can also be represented by an automaton. The output function now maps into profiles of mixtures over action profiles, i.e., f : W → ∏i ∆(Ai).


[Figure 7.1.2: The automaton from Example 7.1.2: states wEE (initial) and wSS; wEE transitions to itself on EE and to wSS on ES, SE, SS; wSS transitions to itself on every action profile.]

With this definition, the strategy s induced by the automaton is given by s(∅) = f(w0) and

s(ht) = f(τ(w0, ht)), ∀ht ∈ H \ {∅}.

Conversely, it is straightforward that any strategy profile can be represented by an automaton. Take the set of histories H as the set of states, the null history ∅ as the initial state, f(ht) = s(ht), and τ(ht, a) = ht+1, where ht+1 ≡ (ht, a) is the concatenation of the history ht with the action profile a.

A strongly symmetric automaton has fi(w) = fj(w) for all w and all i and j.

This representation leaves us in the position of working with the full set of histories H. However, strategy profiles can often be represented by automata with finite sets W. The set W is then a partition on H, grouping together those histories that prompt identical continuation strategies.

The advantage of the automaton representation is clearest when W can be chosen finite, but the representation also has conceptual advantages.

Example 7.1.2 (Grim trigger in the repeated PD, cont.). The grim trigger profile has the automaton representation (W, w0, f, τ), with W = {wEE, wSS}, w0 = wEE, f(wEE) = EE and f(wSS) = SS, and

τ(w, a) =
    wEE, if w = wEE and a = EE,
    wSS, otherwise.

The automaton is illustrated in Figure 7.1.2. Note the notation convention: the subscript on the state indicates the action profile specified by the output function (i.e., f(wa) = a); the subscript is a label, distinct from the transition function. «
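The automaton description translates directly into code. A minimal Python sketch (my own, not from the notes): grim trigger as (W, w0, f, τ), with the extension (7.1.1) of τ to histories recovering the induced strategy s(ht) = f(τ(w0, ht)).

# Grim trigger as an automaton; a sketch under the definitions above.
w0 = "wEE"
f = {"wEE": ("E", "E"), "wSS": ("S", "S")}   # output function (decision rule)

def tau(w, a):
    # Stay in wEE only as long as both players cooperated.
    return "wEE" if (w == "wEE" and a == ("E", "E")) else "wSS"

def run(history):
    """Extended transition function tau(w0, h^t) from (7.1.1)."""
    w = w0
    for a in history:
        w = tau(w, a)
    return w

def strategy(history):
    """Induced strategy profile: s(h^t) = f(tau(w0, h^t))."""
    return f[run(history)]

print(strategy([]))                          # ('E', 'E') in period 0
print(strategy([("E", "E"), ("S", "E")]))    # ('S', 'S') after any defection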

If s is represented by (W, w0, f, τ), the continuation strategy profile after a history ht, s|ht, is represented by the automaton (W, τ(w0, ht), f, τ), where τ(w0, ht) is given by (7.1.1).

Definition 7.1.7. The state w ∈ W of an automaton (W, w0, f, τ) is reachable from w0 if there exists a history ht ∈ H such that

w = τ(w0, ht).

Denote the set of states reachable from w0 by W(w0).

Lemma 7.1.1. The strategy profile with representing automaton (W, w0, f, τ) is a subgame perfect equilibrium iff for all states w ∈ W(w0), the strategy profile represented by (W, w, f, τ) is a Nash equilibrium of the repeated game.

Given an automaton (W, w0, f, τ), let Vi(w) be i's value from being in the state w ∈ W, i.e.,

Vi(w) = (1 − δ)ui(f(w)) + δVi(τ(w, f(w))).

Note that if W is finite, Vi solves a finite set of linear equations (see Problem 7.6.3).
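For finite W this linear system can be solved mechanically. A Python sketch (mine; it uses the grim trigger automaton above and player 1's payoffs from Figure 7.1.1):

import numpy as np

states = ["wEE", "wSS"]
u = {"wEE": 2.0, "wSS": 0.0}            # u1(f(w)) from Figure 7.1.1
succ = {"wEE": "wEE", "wSS": "wSS"}     # tau(w, f(w)): transitions along the play path
delta = 0.5

# Stack V(w) = (1 - delta) u1(f(w)) + delta V(tau(w, f(w))) as A V = b.
idx = {w: k for k, w in enumerate(states)}
A = np.eye(len(states))
b = np.empty(len(states))
for w in states:
    A[idx[w], idx[succ[w]]] -= delta
    b[idx[w]] = (1 - delta) * u[w]
print(dict(zip(states, np.linalg.solve(A, b))))   # {'wEE': 2.0, 'wSS': 0.0}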

It is an implication of Theorem 7.1.1 and Lemma 7.1.1 that if (W, w0, f, τ) represents a pure strategy subgame perfect equilibrium, then for all states w ∈ W, and all i,

Vi(w) ≥ ¯vpi.

Compare the following definition with Definition 5.1.3, and the proof of Theorem 7.1.3 with that of Theorem 5.3.2.

Definition 7.1.8. Player i has a profitable one-shot deviation from (W, w0, f, τ) if there is some state w ∈ W(w0) and some action ai ∈ Ai such that

Vi(w) < (1 − δ)ui(ai, f−i(w)) + δVi(τ(w, (ai, f−i(w)))).


Another instance of the one-shot deviation principle (recall Theorems 5.1.1 and 5.3.2):

Theorem 7.1.3. A strategy profile is subgame perfect iff there are no profitable one-shot deviations.

Proof. Clearly, if a strategy profile is subgame perfect, then there are no profitable one-shot deviations.

We need to argue that if a profile is not subgame perfect, then there is a profitable one-shot deviation.

Suppose (s1, . . . , sn) (with representing automaton (W, w0, f, τ)) is not subgame perfect. Then there exists some history ht′ and player i such that si|ht′ is not a best reply to s−i|ht′. That is, there exists s̃i such that

0 < Ui(s̃i, s−i|ht′) − Ui(si|ht′, s−i|ht′) ≡ ε.

For simplicity, write sj for sj|ht′. Defining M ≡ 2 maxi,a |ui(a)|, suppose T is large enough so that δTM < ε/2, and consider the strategy ŝi for i defined by

ŝi(ht) =
    s̃i(ht), t < T,
    si(ht), t ≥ T.

Then,

|Ui(ŝi, s−i) − Ui(s̃i, s−i)| ≤ δTM < ε/2,

so that

Ui(ŝi, s−i) − Ui(si, s−i) > ε/2 > 0.

Note that ŝi is a profitable “T-period” deviation from si. Let ĥT−1 be the history up to and including period T − 1 induced by (ŝi, s−i), and let w = τ(w0, ht′ĥT−1). Note that

Ui(si|ĥT−1, s−i|ĥT−1) = Vi(τ(w0, ht′ĥT−1)) = Vi(w)

and

Ui(ŝi|ĥT−1, s−i|ĥT−1) = (1 − δ)ui(ŝi(ĥT−1), f−i(w)) + δVi(τ(w, (ŝi(ĥT−1), f−i(w)))).


Hence, if Ui(ŝi|ĥT−1, s−i|ĥT−1) > Ui(si|ĥT−1, s−i|ĥT−1), then we are done, since player i has a profitable one-shot deviation from (W, w0, f, τ).

Suppose not, i.e., Ui(ŝi|ĥT−1, s−i|ĥT−1) ≤ Ui(si|ĥT−1, s−i|ĥT−1). For the strategy s̆i defined by

s̆i(ht) =
    s̃i(ht), t < T − 1,
    si(ht), t ≥ T − 1,

we have

Ui(s̆i, s−i) = (1 − δ) ∑T−2t=0 δt ui(at(ŝi, s−i)) + δT−1 Ui(si|ĥT−1, s−i|ĥT−1)
            ≥ (1 − δ) ∑T−2t=0 δt ui(at(ŝi, s−i)) + δT−1 Ui(ŝi|ĥT−1, s−i|ĥT−1)
            = Ui(ŝi, s−i) > Ui(si, s−i).

That is, the (T − 1)-period deviation s̆i is profitable. But then either the one-shot deviation in period T − 1 is profitable, or the (T − 2)-period deviation is profitable. Induction completes the argument.

Note that a strategy profile can have no profitable one-shot deviations on the path of play, and yet not be Nash; see Example 7.1.4/Problem 7.6.5 for a simple example.

See Problem 7.6.4 for an alternative (and perhaps more enlightening) proof.

Corollary 7.1.1. Suppose the strategy profile s is represented by (W, w0, f, τ). Then s is subgame perfect if, and only if, for all w ∈ W(w0), f(w) is a Nash eq of the normal form game with payoff function gw : A → Rn, where

gwi(a) = (1 − δ)ui(a) + δVi(τ(w, a)).

Example 7.1.3 (continuation of grim trigger). We clearly have V1(wEE) = 2 and V1(wSS) = 0, so that the normal form associated with wEE is

       E                           S
E     2, 2                        −(1 − δ), 3(1 − δ)
S     3(1 − δ), −(1 − δ)           0, 0


while the normal form for wSS is

       E                           S
E     2(1 − δ), 2(1 − δ)          −(1 − δ), 3(1 − δ)
S     3(1 − δ), −(1 − δ)           0, 0

As required, EE is a (but not the only!) Nash eq of the wEE normal form, while SS is a Nash eq of the wSS normal form. «
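Corollary 7.1.1 is straightforward to operationalize numerically. A sketch (my own) for the grim trigger automaton, constructing gw1 and checking the relevant one-shot conditions:

delta = 0.5
u1 = {("E", "E"): 2, ("E", "S"): -1, ("S", "E"): 3, ("S", "S"): 0}
V = {"wEE": 2.0, "wSS": 0.0}   # values from Example 7.1.3

def tau(w, a):
    return "wEE" if (w == "wEE" and a == ("E", "E")) else "wSS"

def g(w, a):
    # g^w_1(a) = (1 - delta) u1(a) + delta V1(tau(w, a))
    return (1 - delta) * u1[a] + delta * V[tau(w, a)]

# By symmetry it suffices to check player 1's unilateral deviations:
print(g("wEE", ("E", "E")) >= g("wEE", ("S", "E")))  # True iff delta >= 1/3
print(g("wSS", ("S", "S")) >= g("wSS", ("E", "S")))  # True for all delta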

Example 7.1.4. Stage game:

       A        B        C
A     4, 4     3, 2     1, 1
B     2, 3     2, 2     1, 1
C     1, 1     1, 1    −1, −1

The stage game has a unique Nash eq: AA. Suppose δ ≥ 2/3. Then there is a subgame perfect equilibrium of G(∞) with outcome path (BB)∞: (W, w0, f, τ), where W = {wBB, wCC}, w0 = wBB, fi(wa) = ai, and

τ(w, a) =
    wBB, if w = wBB and a = BB, or w = wCC and a = CC,
    wCC, otherwise.

The automaton is illustrated in Figure 7.1.3. Values of the states are

Vi(wBB) = (1 − δ)2 + δVi(wBB),
and Vi(wCC) = (1 − δ)(−1) + δVi(wBB).

Solving,

Vi(wBB) = 2,
and Vi(wCC) = 3δ − 1.


[Figure 7.1.3: The automaton for Example 7.1.4: states wBB (initial) and wCC; wBB stays at wBB on BB and moves to wCC on ¬BB; wCC returns to wBB on CC and stays at wCC on ¬CC.]

Player 1's payoffs in the normal form associated with wBB are

       A                         B                         C
A     4(1 − δ) + δ(3δ − 1)      3(1 − δ) + δ(3δ − 1)      (1 − δ) + δ(3δ − 1)
B     2(1 − δ) + δ(3δ − 1)      2                         (1 − δ) + δ(3δ − 1)
C     (1 − δ) + δ(3δ − 1)       (1 − δ) + δ(3δ − 1)       −(1 − δ) + δ(3δ − 1)

and since the game is symmetric, BB is a Nash eq of this normal form only if

2 ≥ 3(1 − δ) + δ(3δ − 1),

i.e.,

0 ≥ 1 − 4δ + 3δ² ⇐⇒ 0 ≥ (1 − δ)(1 − 3δ),

or δ ≥ 1/3.

Player 1's payoffs in the normal form associated with wCC are

       A                         B                         C
A     4(1 − δ) + δ(3δ − 1)      3(1 − δ) + δ(3δ − 1)      (1 − δ) + δ(3δ − 1)
B     2(1 − δ) + δ(3δ − 1)      2(1 − δ) + δ(3δ − 1)      (1 − δ) + δ(3δ − 1)
C     (1 − δ) + δ(3δ − 1)       (1 − δ) + δ(3δ − 1)       −(1 − δ) + 2δ

and since the game is symmetric, CC is a Nash eq of this normal form only if

−(1 − δ) + 2δ ≥ (1 − δ) + δ(3δ − 1),

i.e.,

0 ≥ 2 − 5δ + 3δ² ⇐⇒ 0 ≥ (1 − δ)(2 − 3δ),


[Figure 7.2.1: The automaton for the equilibrium in Example 7.2.1: states wHc (initial) and wLs; wHc stays at wHc on Hc and Hs, and moves to the absorbing state wLs on Lc and Ls.]

or δ ≥ 2/3. This completes our verification that the profile represented by the automaton in Figure 7.1.3 is subgame perfect.

Note that even though there is no profitable deviation at wBB if δ ≥ 1/3, the profile is only subgame perfect if δ ≥ 2/3. Moreover, the profile is not Nash if δ ∈ [1/3, 1/2) (Problem 7.6.5). «
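A quick numerical companion to this example (my own sketch): compute Vi(wCC) = 3δ − 1 and evaluate the two one-shot conditions, confirming that for δ between 1/3 and 2/3 the wBB check passes while the wCC check fails.

def checks(delta):
    V_CC = 3 * delta - 1
    # BB Nash at wBB: the best deviation is to A (flow 3), which sends play to wCC
    ok_BB = 2 >= 3 * (1 - delta) + delta * V_CC
    # CC Nash at wCC: conforming returns to wBB; deviating (flow 1) stays at wCC
    ok_CC = -(1 - delta) + 2 * delta >= (1 - delta) + delta * V_CC
    return ok_BB, ok_CC

for d in (0.4, 0.5, 0.7):
    print(d, checks(d))   # (True, False), (True, False), (True, True)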

7.2 Modeling Competitive Agents (Small/Short-Lived Players)

Example 7.2.1 (Product-choice game).

       c        s
H     3, 3     0, 2
L     4, 0     2, 1

Player I (row player) is long-lived, choosing high (H) or low (L) effort; player II (the column player) is short-lived, choosing the customized (c) or standardized (s) product.

A subgame perfect equilibrium is described by the two state automaton given in Figure 7.2.1. The action profile Ls is a static Nash equilibrium, and since wLs is an absorbing state, we trivially have that Ls is a Nash equilibrium of the associated one-shot game, gwLs.

Note that V1(wHc) = 3 and V1(wLs) = 2. Since player 2 is short-lived, he must myopically optimize in each period. The one-shot game from Corollary 7.1.1 has only one player. The one-shot game


[Figure 7.2.2: The automaton for the equilibrium in Example 7.2.2: states wHc (initial) and wLs; from wHc, play stays at wHc after Hc or Hs and moves to wLs after Lc or Ls; from wLs, play stays at wLs after Hc or Hs and returns to wHc after Lc or Ls.]

gwHc associated with wHc is given by

       c
H     (1 − δ)3 + δ3
L     (1 − δ)4 + δ2

and player I finds H optimal if 3 ≥ 4 − 2δ, i.e., if δ ≥ 1/2. Thus the profile is a subgame perfect equilibrium if, and only if, δ ≥ 1/2. «

Example 7.2.2.

       c        s
H     3, 3     2, 2
L     4, 0     0, 1

The action profile Ls is no longer a static Nash equilibrium, and so Nash reversion cannot be used to discipline player I's behavior.

A subgame perfect equilibrium is described by the two state automaton in Figure 7.2.2.

Since player 2 is short-lived, he must myopically optimize in each period, and he is.

Note that V1(wHc) = 3 and V1(wLs) = (1 − δ)0 + δ3 = 3δ. There are two one-shot games we need to consider. The one-shot game


gwHc associated with wHc is given by

       c
H     (1 − δ)3 + δ3
L     (1 − δ)4 + 3δ²

and player I finds H optimal if 3 ≥ 4 − 4δ + 3δ² ⇐⇒ 0 ≥ (1 − δ)(1 − 3δ) ⇐⇒ δ ≥ 1/3.

The one-shot game gwLs associated with wLs is given by

       s
H     (1 − δ)2 + 3δ²
L     (1 − δ)0 + 3δ

and player I finds L optimal if 3δ ≥ 2 − 2δ + 3δ² ⇐⇒ 0 ≥ (1 − δ)(2 − 3δ) ⇐⇒ δ ≥ 2/3.

Thus the profile is a subgame perfect equilibrium if, and only if, δ ≥ 2/3. «

Example 7.2.3. Stage game: Seller chooses quality, “H” or “L”, and announces price.

Cost of producing H = cH = 2. Cost of producing L = cL = 1. Demand:

x(p) =
    10 − p, if H, and
    4 − p, if L.

If L, maxp (4 − p)(p − cL) ⇒ p = 5/2 ⇒ x = 3/2, πL = 9/4. If H, maxp (10 − p)(p − cH) ⇒ p = 6 ⇒ x = 4, πH = 16. Quality is only observed after purchase.

Model as a game: Strategy space for the seller is {(H, p), (L, p′) : p, p′ ∈ R+}.

Continuum of (long-lived) consumers of mass 10; each consumer buys zero or one unit of the good. Consumer i ∈ [0, 10] values one unit of the good as follows:

vi =
    i, if H, and
    max{0, i − 6}, if L.


Strategy space for consumer i is s : R+ → {0, 1}, where 1 is buy and 0 is not buy.

A strategy profile is ((Q, p), ξ), where ξ(i) is consumer i's strategy. Write ξi for ξ(i). Consumer i's payoff function is

ui((Q, p), ξ) =
    i − p, if Q = H and ξi(p) = 1,
    max{0, i − 6} − p, if Q = L and ξi(p) = 1, and
    0, if ξi(p) = 0.

The firm's payoff function is

π((Q, p), ξ) = (p − cQ)x(p, ξ)
            ≡ (p − cQ) ∫0^10 ξi(p) di
            = (p − cQ) λ{i ∈ [0, 10] : ξi(p) = 1},

where λ is Lebesgue measure. [Note that we need to assume that ξ is measurable.]

Assume the firm only observes x(p, ξ) at the end of the period, so that consumers are anonymous.

Note that x(p, ξ) is independent of Q, and that the choice (L, p) strictly dominates (H, p) whenever x(p, ξ) ≠ 0.

If consumer i believes the firm has chosen Q, then i's best response to p is ξi(p) = 1 only if ui((Q, p), ξ) ≥ 0. Let ξQi(p) denote the maximizing choice of consumer i when the consumer observes price p and believes the firm also chose quality Q. Then,

ξHi(p) =
    1, if i ≥ p, and
    0, if i < p,

so x(p, ξH) = ∫p^10 di = 10 − p. Also,

ξLi(p) =
    1, if i ≥ p + 6, and
    0, if i < p + 6,

so x(p, ξL) = ∫p+6^10 di = 10 − (p + 6) = 4 − p.

The unique subgame perfect equilibrium of the stage game is ((L, 5/2), ξL).


[Figure 7.2.3: Grim trigger in the quality game: states wH (initial) and wL; wH stays at wH after any (Hp, x) and moves to the absorbing state wL after any (Lp, x). Note that the transitions are only a function of Q.]

Why isn't the outcome path ((H, 6), ξH(6)) consistent with subgame perfection? Note that there are two distinct deviations by the firm to consider: an unobserved deviation to (L, 6), and an observed deviation involving a price different from 6. In order to deter an observed deviation, we specify that consumers believe that, in response to any price different from 6, the firm had chosen Q = L, leading to the best response ξ̂i given by

ξ̂i(p) =
    1, if p = 6 and i ≥ p, or p ≠ 6 and i ≥ p + 6,
    0, otherwise,

implying aggregate demand

x(p, ξ̂) =
    4, if p = 6,
    max{0, 4 − p}, if p ≠ 6.

Clearly, this implies that observable deviations by the firm are not profitable. Consider then the profile ((H, 6), ξ̂): the unobserved deviation to (L, 6) is profitable, since profits in this case are (10 − 6)(6 − 1) = 20 > 16. Note that for the deviation to be profitable, the firm must still charge 6 (not the best response to ξH).

An eq with high quality: buyers believe H will be produced as long as H has been produced in the past. If ever L is produced, then L is expected to always be produced in the future. See Figure 7.2.3.

It only remains to specify the decision rules:

f1(w) =
    (H, 6), if w = wH, and
    (L, 5/2), if w = wL,

and

f2(w) =
    ξ̂, if w = wH, and
    ξL, if w = wL.

Since the transitions are independent of price, the firm's price is myopically optimal in each state.

Since the consumers are small and myopically optimizing, in order to show that the profile is subgame perfect, it remains to verify that the firm is behaving optimally in each state. The firm's value in each state is V1(wQ) = πQ. Trivially, L is optimal in wL. Turning to wH, we have

(1 − δ)20 + δ(9/4) ≤ 16 ⇐⇒ δ ≥ 16/71.

There are many other equilibria. «
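The cutoff δ ≥ 16/71 follows from simple arithmetic, which can be double-checked exactly (a sketch, mine):

from fractions import Fraction

# Firm's IC at wH: (1 - d)*20 + d*(9/4) <= 16, i.e. d >= 4/(20 - 9/4)
d_min = Fraction(20 - 16) / (Fraction(20) - Fraction(9, 4))
print(d_min, float(d_min))   # 16/71, about 0.2254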

Remark 7.2.1 (Short-lived player). We can model the above as a game between one long-lived player and one short-lived player. In the stage game, the firm chooses p, and then the firm and consumer simultaneously choose quality Q ∈ {L, H} and quantity x ∈ [0, 10]. If the good is high quality, the consumer receives a utility of 10x − x²/2 from consuming x units. If the good is of low quality, his utility is reduced by 6 per unit, giving a utility of 4x − x²/2.⁴ The consumer's utility is linear in money, so his payoffs are

uc(Q, p) =
    (4 − p)x − x²/2, if Q = L, and
    (10 − p)x − x²/2, if Q = H.

Since the period t consumer is short-lived (a new consumer replaces him next period), if he expects L in period t, then his best reply is to choose x = xL(p) ≡ max{4 − p, 0}, while if he expects H, his best reply is to choose x = xH(p) ≡ max{10 − p, 0}. In other words, his behavior is just like the aggregate behavior of the continuum of consumers.

This is in general true: a short-lived player can typically represent a continuum of long-lived anonymous players.

4For x > 4, utility is declining in consumption. This can be avoided by setting his utility equal to 4x − x²/2 for x ≤ 4, and equal to 8 for all x > 4. This does not affect any of the relevant calculations.


7.3 Applications

7.3.1 Efficiency Wages I

Consider an employment relationship between a worker and a firm. Within the relationship (i.e., in the stage game), the worker (player I) decides whether to exert effort (E) for the firm (player II), or to shirk (S). Effort yields output y for sure, while shirking yields output 0 for sure. The firm chooses a wage w that period. At the end of the period, the firm observes output (equivalently, effort) and the worker observes the wage. The payoffs in the stage game are given by

uI(aI, aII) =
    w − e, if aI = E and aII = w,
    w, if aI = S and aII = w,

and

uII(aI, aII) =
    y − w, if aI = E and aII = w,
    −w, if aI = S and aII = w.

Suppose y > e.

Note that the stage game has (S, 0) as the unique Nash equilibrium, with payoffs (0, 0). This can also be interpreted as the payoffs from terminating this relationship (when both players receive a zero outside option).

Grim trigger at the wage w∗ is illustrated in Figure 7.3.1. Grim trigger is an equilibrium if

w∗ − e ≥ (1 − δ)w∗ ⇐⇒ δw∗ ≥ e (7.3.1)

and

y − w∗ ≥ (1 − δ)y ⇐⇒ δy ≥ w∗. (7.3.2)

Combining the worker (7.3.1) and firm (7.3.2) incentive constraints yields bounds on the equilibrium wage w∗:

e/δ ≤ w∗ ≤ δy. (7.3.3)
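The band (7.3.3) is nonempty iff δ² ≥ e/y, which is easy to explore numerically (a sketch; the parameter values are my own):

def wage_band(e, y, delta):
    """Wages supportable by grim trigger, from (7.3.3): [e/delta, delta*y]."""
    lo, hi = e / delta, delta * y
    return (lo, hi) if lo <= hi else None   # None: band empty, no equilibrium wage

print(wage_band(e=1.0, y=4.0, delta=0.6))   # (1.67, 2.4): nonempty band
print(wage_band(e=1.0, y=4.0, delta=0.4))   # None, since 0.4**2 < 1/4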


[Figure 7.3.1: Grim trigger for the employment relationship in Section 7.3.1: states wEw∗ (initial) and the absorbing state wS0. The transition from wEw∗ labelled Ew : w ≥ w∗ means any action profile in which the worker exerted effort and the firm paid at least w∗; the other transition from wEw∗ (to wS0) occurs if either the firm underpaid (w < w∗), or the worker shirked (S).]

Note that both firm and worker must receive a positive surplus from the relationship:

w∗ − e ≥ ((1 − δ)/δ)e > 0 (7.3.4)

and

y − w∗ ≥ (1 − δ)y > 0. (7.3.5)

Inequality (7.3.1) (equivalently, (7.3.4)) can be interpreted as indicating that w∗ is an efficiency wage: the wage is strictly higher than the disutility of effort (if workers are in excess supply, a naive market clearing model would suggest a wage of e). We return to this idea in Section 7.5.1.

Suppose now that there is a labor market where firms and workers who terminate one relationship can costlessly form an employment relationship with a new partner (perhaps there is a pool of unmatched firms and workers who costlessly match).

In particular, new matches are anonymous: it is not possible to treat partners differently on the basis of past behavior (since that is unobserved).

A specification of behavior is symmetric if all firms follow the same strategy and all workers follow the same strategy in an employment relationship. To simplify things, suppose also that firms commit to a wage strategy (sequence) at the beginning of each employment relationship.


[Figure 7.3.2: A symmetric profile for the employment relationship in Section 7.3.1: states wE0 (initial), wEw∗, and wN; wE0 moves to wEw∗ after E0 and to wN after S0; wEw∗ stays at wEw∗ after Ew∗ and moves to wN after Sw∗. The firm is committed to paying w∗ in every period except the initial period while the relationship lasts (and so transitions are not specified for irrelevant wages). The state wN means start a new relationship; there are no transitions from this state in this relationship (wN corresponds to wE0 in a new relationship).]

Grim trigger at a constant wage w∗ satisfying (7.3.3) is not a symmetric equilibrium: After a deviation by the worker, the worker has an incentive to terminate the relationship and start a new one, obtaining the surplus (7.3.4) (as if no deviation had occurred).

Consider the alternative profile illustrated in Figure 7.3.2. Note that this profile has the flavor of being “renegotiation-proof.” The firm is willing to commit at the beginning of the relationship to paying w∗ in every period (after the initial period, when no wage is paid) as long as effort is exerted if

y − δw∗ ≥ 0.

The worker has two incentive constraints. In state wE0, the value to the worker is

−(1 − δ)e + δ(w∗ − e) = δw∗ − e =: VI(wE0).

The worker is clearly willing to exert effort in wE0 if

VI(wE0) ≥ 0 × (1 − δ) + δVI(wE0),

that is,

VI(wE0) ≥ 0.

The worker is willing to exert effort in wEw∗ if

w∗ − e ≥ (1 − δ)w∗ + δVI(wE0) = (1 − δ + δ²)w∗ − δe,

which is equivalent to

δw∗ ≥ e ⇐⇒ VI(wE0) ≥ 0.

A critical feature of this profile is that the worker must “invest” in the relationship: Before the worker can receive the ongoing surplus of the employment relationship, he/she must pay an upfront cost, so the worker does not have an incentive to shirk in the current relationship and then restart with a new firm.

If firms must pay the same wage in every period including the initial period (for example, for legal reasons or social norms), then some other mechanism is needed to provide the necessary disincentive to separate. Frictions in the matching process (involuntary unemployment) are one such mechanism. For more on this and related ideas, see Shapiro and Stiglitz (1984); MacLeod and Malcomson (1989); Carmichael and MacLeod (1997).

7.3.2 Collusion Under Demand Uncertainty

Oligopoly selling perfect substitutes. The demand curve is given by

Q = ω − min{p1, p2, . . . , pn},

where ω is the state of demand and pi is firm i's price. Demand is evenly divided between the lowest pricing firms. Firms have zero constant marginal cost of production.

The stage game has a unique Nash equilibrium, in which firms price at 0, yielding each firm profits of 0, which is their minmax payoff.

In each period, the state ω is an independent and identical draw from the finite set Ω, according to the distribution q ∈ ∆(Ω).


The monopoly price is given by pm(ω) := ω/2, with associated monopoly profits of ω²/4.

We are interested in strongly symmetric equilibria in which firms maximize expected profits. A profile (s1, s2, . . . , sn) is strongly symmetric if for all histories ht, si(ht) = sj(ht), i.e., after all histories (even asymmetric ones where firms have behaved differently), firms choose the same action.

Along the equilibrium path, firms set a common price p(ω), and any deviations are punished by perpetual minmax, i.e., grim trigger.5

Let v∗ be the common expected payoff from such a strategy profile. The necessary and sufficient conditions for grim trigger to be an equilibrium are, for each state ω,

(1/n)(1 − δ)p(ω)(ω − p(ω)) + δv∗ ≥ supp<p(ω) (1 − δ)p(ω − p) = (1 − δ)p(ω)(ω − p(ω)), (7.3.6)

where

v∗ = (1/n) ∑ω′∈Ω p(ω′)(ω′ − p(ω′))q(ω′).

Inequality (7.3.6) can be written as

p(ω)(ω − p(ω)) ≤ δnv∗/((n − 1)(1 − δ)) = (δ/((n − 1)(1 − δ))) ∑ω′∈Ω p(ω′)(ω′ − p(ω′))q(ω′).

If there is no uncertainty over states (i.e., there exists ω′ such that q(ω′) = 1), this inequality is independent of states (and the price p(ω)), becoming

1 ≤ δ/((n − 1)(1 − δ)) ⇐⇒ (n − 1)/n ≤ δ. (7.3.7)

5Strictly speaking, this example is not a repeated game with perfect monitoring. An action for firm i in the stage game is a vector (pi(ω))ω. At the end of the period, firms only observe the pricing choices of all firms at the realized ω. Nonetheless, the same theory applies. Subgame perfection is equivalent to one-shot optimality: a profile is subgame perfect if, conditional on each information set (in particular, conditional on the realized ω), it is optimal to choose the specified price, given the specified continuation play.

Suppose there are two equally likely states, L < H. In order to support collusion at the monopoly price in each state, we need

L²/4 ≤ (δ/((n − 1)(1 − δ))) · (1/2)(L²/4 + H²/4) (7.3.8)

and

H²/4 ≤ (δ/((n − 1)(1 − δ))) · (1/2)(L²/4 + H²/4). (7.3.9)

Since H² > L², the constraint in the high state (7.3.9) is the relevant one, and is equivalent to

2(n − 1)H²/(L² + (2n − 1)H²) ≤ δ,

a tighter bound than (7.3.7), since the incentive to deviate is unchanged, but the threat of loss of future collusion is smaller.

Suppose

(n − 1)/n < δ < 2(n − 1)H²/(L² + (2n − 1)H²),

so that colluding on the monopoly price in each state is inconsistent with equilibrium.

Since the high state is the state in which the firms have the strongest incentive to deviate, the most collusive equilibrium sets p(L) = L/2 and p(H) to solve the following incentive constraint with equality:

p(H)(H − p(H)) ≤ (δ/((n − 1)(1 − δ))) · (1/2)[L²/4 + p(H)(H − p(H))].

In order to fix ideas, suppose n(1 − δ) > 1. Then, this inequality implies

p(H)(H − p(H)) ≤ (1/2)[L²/4 + p(H)(H − p(H))],


that is,

p(H)(H − p(H)) ≤ L²/4 = (L/2)(L − L/2).

Note that setting p(H) = L/2 violates this inequality (since H > L). Since profits are strictly concave, this inequality thus requires a lower price, that is,

p(H) < L/2.

In other words, if there are enough firms colluding (n(1 − δ) > 1), collusive pricing is counter-cyclical! This counter-cyclicality of prices also arises with more states and a less severe requirement on the number of firms (Mailath and Samuelson, 2006, §6.1.1).
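A numerical sketch of the binding constraint (my own parameter values, chosen so that n(1 − δ) > 1): solve the constraint with equality for x = p(H)(H − p(H)), then recover the largest supportable p(H) below the monopoly price.

import math

def collusive_price_H(L, H, n, delta):
    # k = delta/((n-1)(1-delta)); binding IC: x = (k/2)(L**2/4 + x)
    k = delta / ((n - 1) * (1 - delta))
    x = (k / 2) * (L**2 / 4) / (1 - k / 2)
    # Largest p below the monopoly price H/2 with p(H - p) = x
    return (H - math.sqrt(H**2 - 4 * x)) / 2

p_H = collusive_price_H(L=2, H=4, n=4, delta=0.72)
print(p_H)   # about 0.20 < L/2 = 1: pricing is counter-cyclical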

7.4 Enforceability, Decomposability, and the Folk Theorem

Definition 7.4.1. An action profile a′ ∈ A is enforced by the continuation promises γ : A → Rn if a′ is a Nash eq of the normal form game with payoff function gγ : A → Rn, where

gγi(a) = (1 − δ)ui(a) + δγi(a).

A payoff v is decomposable on a set of payoffs V if there exists an action profile a′ enforced by some continuation promises γ : A → V satisfying, for all i,

vi = (1 − δ)ui(a′) + δγi(a′).

Fix a stage game G = (Ai, ui) and discount factor δ. Let

Ep(δ) ⊂ Fp∗ be the set of pure strategy subgame perfect equilibrium payoffs. The notions of enforceability and decomposability play a central role in the construction of subgame perfect equilibria. Problem 7.6.8 asks you to prove the following result.

Theorem 7.4.1. A payoff v ∈ Rn is decomposable on Ep(δ) if, and only if, v ∈ Ep(δ).

There is a well-known “converse” to Theorem 7.4.1:


Theorem 7.4.2. Suppose every payoff v in some set V ⊂ Rn is decomposable with respect to V. Then, V ⊂ Ep(δ).

The proof of this result can be found in Mailath and Samuelson (2006) (Theorem 7.4.2 is Proposition 2.5.1). The essential steps appear in the proof of Lemma 7.4.2.

Any set of payoffs with the property described in Theorem 7.4.2 is said to be self-generating. Thus, every self-generating set of payoffs is a set of subgame perfect equilibrium payoffs. Moreover, the set of subgame perfect equilibrium payoffs is the largest self-generating set of payoffs.

The following famous result is Proposition 3.8.1 in Mailath and Samuelson (2006). Example 7.4.1 illustrates the key ideas.

Theorem 7.4.3 (The Folk Theorem). Suppose Ai is finite for all i and F∗ has nonempty interior in Rn. For all v ∈ {v ∈ F∗ : ∃v′ ∈ F∗, v′i < vi ∀i}, there exists a sufficiently large discount factor δ′, such that for all δ ≥ δ′, there is a subgame perfect equilibrium of the infinitely repeated game whose average discounted value is v.

Example 7.4.1 (Symmetric folk theorem for PD). We restrict attention to strongly symmetric strategies, i.e., for all w ∈ W, f1(w) = f2(w). When is {(v, v) : v ∈ [0, 2]} a set of equilibrium payoffs? Since we have restricted attention to strongly symmetric equilibria, we can drop player subscripts. Note that the set of strongly symmetric equilibrium payoffs cannot be any larger than [0, 2], since [0, 2] is the largest set of feasible symmetric payoffs.

Two preliminary calculations (important to note that these preliminary calculations make no assumptions about [0, 2] being a set of eq payoffs):

1. Let W EE be the set of player 1 payoffs that could be decomposed on [0, 2] using EE (i.e., W EE is the set of player 1 payoffs that could be enforceably achieved by EE followed by appropriate symmetric continuations in [0, 2]). Then v ∈ W EE iff

v = 2(1 − δ) + δγ(EE) ≥ 3(1 − δ) + δγ(SE),

for some γ(EE), γ(SE) ∈ [0, 2]. The largest value for γ(EE) is 2, while the incentive constraint implies the smallest is (1 − δ)/δ, so that W EE = [3(1 − δ), 2]. See Figure 7.4.1 for an illustration.

2. Let W SS be the set of player 1 payoffs that could be decomposed on [0, 2] using SS. Then v ∈ W SS iff

v = 0 × (1 − δ) + δγ(SS) ≥ (−1)(1 − δ) + δγ(ES),

for some γ(SS), γ(ES) ∈ [0, 2]. Since the inequality is satisfied by setting γ(SS) = γ(ES), the largest value for γ(SS) is 2, while the smallest is 0, and so W SS = [0, 2δ].

Observe that

[0, 2] ⊃ W SS ∪ W EE = [0, 2δ] ∪ [3(1 − δ), 2].

Lemma 7.4.1 (Necessity). Suppose [0, 2] is the set of strongly symmetric strategy equilibrium payoffs. Then,

[0, 2] ⊂ W SS ∪ W EE.

Proof. Suppose v is the payoff of some strongly symmetric strategy equilibrium s. Then either s0 = EE or SS. Since the continuation equilibrium payoffs must lie in [0, 2], we immediately have that if s0 = EE, then v ∈ W EE, while if s0 = SS, then v ∈ W SS. But this implies v ∈ W SS ∪ W EE. So, if [0, 2] is the set of strongly symmetric strategy equilibrium payoffs, we must have

[0, 2] ⊂ W SS ∪ W EE.

So, when is

[0, 2] ⊂ W SS ∪ W EE?

This holds iff 2δ ≥ 3(1 − δ) (i.e., δ ≥ 3/5).


[Figure 7.4.1: An illustration of the folk theorem, plotting the payoff v against the continuation γ: v = δγSS on W SS = [0, 2δ], and v = 2(1 − δ) + δγEE on W EE = [3 − 3δ, 2]. The continuations that enforce EE are labelled γEE, while those that enforce SS are labelled γSS. The value v0 is the average discounted value of the equilibrium whose current value/continuation value is described by one period of EE, followed by the cycle 1 − 2 − 3 − 1. In this cycle, play follows EE, (EE, EE, SS)∞. The figure was drawn for δ = 2/3; v0 = 98/57. Many choices of v0 will not lead to a cycle.]


Lemma 7.4.2 (Sufficiency). If

[0, 2] = W SS ∪ W EE,

then [0, 2] is the set of strongly symmetric strategy equilibrium payoffs.

Proof. Fix v ∈ [0, 2], and define a recursion as follows: set γ0 = v, and

γt+1 =
    γt/δ, if γt ∈ W SS = [0, 2δ], and
    (γt − 2(1 − δ))/δ, if γt ∈ W EE \ W SS = (2δ, 2].

Since [0, 2] ⊂ W SS ∪ W EE, this recursive definition is well defined for all t. Moreover, since δ ≥ 3/5, γt ∈ [0, 2] for all t. The recursion thus yields a bounded sequence of continuations {γt}t. Associated with this sequence of continuations is the outcome path {at}t:

at =
    EE, if γt ∈ W EE \ W SS, and
    SS, if γt ∈ W SS.

Observe that, by construction,

γt = (1 − δ)ui(at) + δγt+1.

Consider the automaton (W, w0, f, τ) where

• W = [0, 2];

• w0 = v;

• the output function is

f(w) =
    EE, if w ∈ W EE \ W SS, and
    SS, if w ∈ W SS; and

• the transition function is

τ(w, a) =
    (w − 2(1 − δ))/δ, if w ∈ W EE \ W SS and a = f(w),
    w/δ, if w ∈ W SS and a = f(w), and
    0, if a ≠ f(w).


The outcome path implied by this strategy profile is {at}t. Moreover,

v = γ0 = (1 − δ)ui(a0) + δγ1
       = (1 − δ)ui(a0) + δ[(1 − δ)ui(a1) + δγ2]
       = (1 − δ) ∑T−1t=0 δt ui(at) + δT γT
       = (1 − δ) ∑∞t=0 δt ui(at)

(where the last equality is an implication of δ < 1 and the sequence {γT}T being bounded). Thus, the payoff of this outcome path is exactly v, that is, v is the payoff of the strategy profile described by the automaton (W, w0, f, τ) with initial state w0 = v.

Moreover, there is no profitable one-shot deviation from this automaton (this is guaranteed by the constructions of W SS and W EE). Consequently the associated strategy profile is subgame perfect.
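The recursion in this proof is constructive and can be run directly. A Python sketch (mine), with δ = 2/3 and the initial value v0 = 98/57 from Figure 7.4.1:

from fractions import Fraction

delta = Fraction(2, 3)

def step(gamma):
    """One step of the recursion in Lemma 7.4.2: returns (a^t, gamma^{t+1})."""
    if gamma <= 2 * delta:                            # W^SS = [0, 2*delta]
        return "SS", gamma / delta
    return "EE", (gamma - 2 * (1 - delta)) / delta    # W^EE \ W^SS = (2*delta, 2]

gamma, path = Fraction(98, 57), []
for _ in range(7):
    a, gamma = step(gamma)
    path.append(a)
print(path)   # ['EE', 'EE', 'EE', 'SS', 'EE', 'EE', 'SS']: one EE, then the cycle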

See Mailath and Samuelson (2006, §2.5) for much more on this. «

7.5 Imperfect Public Monitoring

7.5.1 Efficiency Wages II

A slight modification of the example from Section 7.3.1.⁶ As before, in the stage game, the worker decides whether to exert effort (E) or to shirk (S) for the firm (player II). Effort has a disutility of e and yields output y for sure, while shirking yields output y with probability p, and output 0 with probability 1 − p. The firm chooses a wage w ∈ R+. At the end of the period, the firm does not observe effort, but does observe output.

Suppose y − e > py, so it is efficient for the worker to exert effort. The payoffs are described in Figure 7.5.1. Consider the profile described by the automaton illustrated in Figure 7.5.2.

6This is also similar to the example in Gibbons (1992, Section 2.3.D), but here the firm also faces an intertemporal trade-off.


[Figure 7.5.1: The extensive form and payoffs of the stage game for the game in Section 7.5.1. Player I chooses E or S; after E, the wage choice w ∈ [0, y] yields payoffs (w − e, y − w); after S, Nature determines output (y with probability p, 0 with probability 1 − p), and the wage choice w ∈ [0, y] yields payoffs (w, y − w) after output y and (w, −w) after output 0.]

[Figure 7.5.2: The automaton for the strategy profile in the repeated game in Section 7.5.1: states wEw∗ (initial) and the absorbing state wS0. The transition from the state wEw∗ labelled (y, w) : w ≥ w∗ means any signal profile in which output is observed and the firm paid at least w∗; the other transition from wEw∗ (to wS0) occurs if either the firm underpaid (w < w∗), or no output is observed (0).]


The value functions are

V1(wS0) = 0, V1(wEw∗) = w∗ − e,
V2(wS0) = py, V2(wEw∗) = y − w∗.

In the absorbing state wS0, play is the unique eq of the stage game, and so incentives are trivially satisfied.

The worker does not wish to deviate in wEw∗ if

V1(wEw∗) ≥ (1 − δ)w∗ + δ[pV1(wEw∗) + (1 − p) × 0],

i.e.,

δ(1 − p)w∗ ≥ (1 − δp)e,

or

w∗ ≥ ((1 − δp)/(δ(1 − p)))e = e/δ + (p(1 − δ)/(δ(1 − p)))e.

To understand the role of the imperfect monitoring, compare this with the analogous constraint when the monitoring is perfect (7.3.1), which requires w∗ ≥ e/δ.

The firm does not wish to deviate in wEw∗ if

V2(wEw∗) ≥ (1 − δ)y + δpy,

i.e.,

y − w∗ ≥ (1 − δ)y + δpy ⇐⇒ δ(1 − p)y ≥ w∗.

So, the profile is an “equilibrium” if

δ(1 − p)y ≥ w∗ ≥ ((1 − δp)/(δ(1 − p)))e.

In fact, it is an implication of the next section that the profile is a perfect public equilibrium.
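As with (7.3.3), the resulting wage band is easy to inspect numerically (a sketch; the parameter values are my own):

def wage_band_imperfect(e, y, p, delta):
    """Band from Section 7.5.1: delta(1-p)y >= w* >= (1 - delta*p)e/(delta(1-p))."""
    lo = (1 - delta * p) * e / (delta * (1 - p))
    hi = delta * (1 - p) * y
    return (lo, hi) if lo <= hi else None

print(wage_band_imperfect(e=1.0, y=10.0, p=0.2, delta=0.6))   # nonempty band
print(wage_band_imperfect(e=1.0, y=10.0, p=0.8, delta=0.6))   # None: too noisy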

7.5.2 Basic Structure

As before, the action space for i is Ai, with typical action ai ∈ Ai. An action profile is denoted a = (a1, . . . , an).


At the end of each period, rather than observing a, all players observe a public signal y taking values in some space Y according to the distribution Pr{y | (a1, . . . , an)} ≡ ρ(y | a).

Since the signal y is a possibly noisy signal of the action profile a in that period, the actions are imperfectly monitored by the other players. Since the signal is public (and so observed by all players), the game is said to have public monitoring.

Assume Y is finite. Let u∗i : Ai × Y → R be i's ex post or realized payoff. Stage game (ex ante) payoffs:

ui(a) ≡ ∑y∈Y u∗i(ai, y)ρ(y | a).

Public histories:

H ≡ ∪∞t=0 Y t,

with ht ≡ (y0, . . . , yt−1) being a t period history of public signals (Y 0 ≡ {∅}).

Public strategies:

si : H → Ai.

Definition 7.5.1. A perfect public equilibrium is a profile of public strategies s that, after observing any public history ht, specifies a Nash equilibrium for the repeated game, i.e., for all t and all ht ∈ Y t, s|ht is a Nash equilibrium.

If ρ(y | a) > 0 for all y and a, every public history arises with positive probability, and so every Nash equilibrium in public strategies is a perfect public equilibrium.

Automaton representation of public strategies: (W, w0, f, τ), where

• W is the set of states,

• w0 is the initial state,

• f : W → A is the output function (decision rule), and

• τ : W × Y → W is the transition function.


       ȳ                          y̲
E     (3 − p − 2q)/(p − q)       −(p + 2q)/(p − q)
S     3(1 − r)/(q − r)           −3r/(q − r)

Figure 7.5.3: The ex post payoffs for the imperfect public monitoring version of the prisoners' dilemma from Figure 7.1.1.

As before, Vi(w) is i’s value of being in state w.

Lemma 7.5.1. Suppose the strategy profile s is represented by (W, w0, f, τ). Then s is a perfect public eq if, and only if, for all w ∈ W (satisfying w = τ(w0, ht) for some ht ∈ H), f(w) is a Nash eq of the normal form game with payoff function gw : A → Rn, where

gwi(a) = (1 − δ)ui(a) + δ ∑y Vi(τ(w, y))ρ(y | a).

See Problem 7.6.13 for the proof (another instance of the one-shot deviation principle).

Example 7.5.1 (PD as a partnership). Effort determines output ȳ or y̲ stochastically:

Pr{ȳ | a} ≡ ρ(ȳ | a) =
    p, if a = EE,
    q, if a = SE or ES,
    r, if a = SS,

where 0 < q < p < 1 and 0 < r < p.

The ex post payoffs (u∗i) are given in Figure 7.5.3, so that the ex ante payoffs (ui) are those given in Figure 7.1.1. «

Example 7.5.2 (One period memory). Two state automaton: W = {wEE, wSS}, w0 = wEE, f(wEE) = EE, f(wSS) = SS, and

τ(w, y) =
    wEE, if y = ȳ,
    wSS, if y = y̲.


[Diagram: states wEE (initial) and wSS; from either state, play moves to wEE after ȳ and to wSS after y̲.]

Value functions (I can drop player subscripts by symmetry):

V(wEE) = (1 − δ) · 2 + δ[pV(wEE) + (1 − p)V(wSS)]

and

V(wSS) = (1 − δ) · 0 + δ[rV(wEE) + (1 − r)V(wSS)].

This is an eq if

V(wEE) ≥ (1 − δ) · 3 + δ[qV(wEE) + (1 − q)V(wSS)]

and

V(wSS) ≥ (1 − δ) · (−1) + δ[qV(wEE) + (1 − q)V(wSS)].

Rewriting the incentive constraint at wEE,

(1 − δ) · 2 + δ[pV(wEE) + (1 − p)V(wSS)] ≥ (1 − δ) · 3 + δ[qV(wEE) + (1 − q)V(wSS)]

or

δ(p − q)[V(wEE) − V(wSS)] ≥ (1 − δ).

We can obtain an expression for V(wEE) − V(wSS) without solving for the value functions separately by differencing the value recursion equations, yielding

V(wEE) − V(wSS) = (1 − δ) · 2 + δ[pV(wEE) + (1 − p)V(wSS)] − δ[rV(wEE) + (1 − r)V(wSS)]
                = (1 − δ) · 2 + δ(p − r)[V(wEE) − V(wSS)],

so that

V(wEE) − V(wSS) = 2(1 − δ)/(1 − δ(p − r)),

and so

δ ≥ 1/(3p − 2q − r). (7.5.1)

Turning to wSS, we have

δ[rV(wEE) + (1 − r)V(wSS)] ≥ (1 − δ) · (−1) + δ[qV(wEE) + (1 − q)V(wSS)]

or

(1 − δ) ≥ δ(q − r)[V(wEE) − V(wSS)],

requiring

δ ≤ 1/(p + 2q − 3r). (7.5.2)

Note that (7.5.2) is trivially satisfied if r ≥ q (make sure you understand why this is intuitive).

The two bounds (7.5.1) and (7.5.2) on δ are consistent if

p ≥ 2q − r.

The constraint δ ∈ (0, 1) in addition requires

3r − p < 2q < 3p − r − 1.

Solving for the value functions,

(V(wEE), V(wSS))′ = (1 − δ) [ 1 − δp         −δ(1 − p)    ]⁻¹ [ 2 ]
                            [ −δr            1 − δ(1 − r) ]    [ 0 ]

= ((1 − δ)/[(1 − δp)(1 − δ(1 − r)) − δ²(1 − p)r]) [ 1 − δ(1 − r)    δ(1 − p) ] [ 2 ]
                                                  [ δr              1 − δp   ] [ 0 ]

= ((1 − δ)/[(1 − δ)(1 − δ(p − r))]) [ 2(1 − δ(1 − r)) ]
                                    [ 2δr             ]

= (1/(1 − δ(p − r))) [ 2(1 − δ(1 − r)) ]
                     [ 2δr             ].
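A numeric check of this solution and of the bounds (7.5.1)–(7.5.2) (a sketch; the parameter values are mine and satisfy both bounds):

import numpy as np

p, q, r, delta = 0.9, 0.5, 0.2, 0.7

# Solve the two value recursions for (V(wEE), V(wSS))
M = np.array([[1 - delta * p, -delta * (1 - p)],
              [-delta * r, 1 - delta * (1 - r)]])
V_EE, V_SS = np.linalg.solve(M, (1 - delta) * np.array([2.0, 0.0]))
print(V_EE, V_SS)   # equals 2(1 - d(1-r))/(1 - d(p-r)) and 2dr/(1 - d(p-r))

print(delta >= 1 / (3 * p - 2 * q - r))   # (7.5.1): IC at wEE holds
print(delta <= 1 / (p + 2 * q - 3 * r))   # (7.5.2): IC at wSS holds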


Moreover, for fixed p and r,

limδ→1 V(wEE) = limδ→1 V(wSS) = 2r/(1 − p + r),

and, for r > 0,

limp→1 limδ→1 V(wEE) = 2.

In contrast, grim trigger (where one realization of y̲ results in permanent SS) has a limiting (as δ → 1) payoff of 0 (see Problem 7.6.14(a)). Higher payoffs can be achieved by considering more forgiving versions, such as in Problem 7.6.14(b). Intuitively, as δ gets large, the degree of forgiveness should also increase. Nonetheless, all strongly symmetric public perfect equilibria have payoffs bounded above by

2 − (1 − p)/(p − q).

Problem 7.6.17 asks you to prove this (with the aid of some generous hints). «

Remark 7.5.1. The notion of PPE only imposes ex ante incentive constraints. If the stage game has a non-trivial dynamic structure, such as in Problem 7.6.16, then it is natural to impose additional incentive constraints.

7.6 Problems

7.6.1. Suppose G ≡ (Ai, ui) is an n-person normal form game and GT is its T-fold repetition (with payoffs evaluated as the average). Let A ≡ ∏i Ai. The strategy profile s is history independent if for all i and all 1 ≤ t ≤ T − 1, si(ht) is independent of ht ∈ At (i.e., si(ht) = si(ĥt) for all ht, ĥt ∈ At). Let N(1) be the set of Nash equilibria of G. Suppose s is history independent. Prove that s is a subgame perfect equilibrium if and only if s(ht) ∈ N(1) for all t, 0 ≤ t ≤ T − 1, and all ht ∈ At (s(h0) is of course simply s0). Provide examples to show that the assumption of history independence is needed in both directions.

7.6.2. Prove that the infinitely repeated game with stage game given by matching pennies does not have a pure strategy Nash equilibrium for any δ.

7.6.3. Suppose (W, w0, f, τ) is a (pure strategy representing) finite automaton with |W| = K. Label the states from 1 to K, so that W = {1, 2, . . . , K}, f : {1, 2, . . . , K} → A, and τ : {1, 2, . . . , K} × A → {1, 2, . . . , K}. Consider the function Φ : R^K → R^K given by Φ(v) = (Φ₁(v), Φ₂(v), . . . , Φ_K(v)), where

Φ_k(v) = (1 − δ)u_i(f(k)) + δ v_{τ(k,f(k))},   k = 1, . . . , K.

(a) Prove that Φ has a unique fixed point. [Hint: Show that Φ is a contraction.]

(b) Give an explicit equation for the fixed point of Φ.

(c) Interpret the fixed point.
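For intuition (an illustration, not a substitute for the proofs asked for above): because Φ is a contraction, iterating it from any starting vector converges to the fixed point. A minimal Python sketch, using an illustrative two-state grim trigger automaton (flow payoff 2 at wEE, 0 at the absorbing wSS):

```python
# Iterate Phi(v)_k = (1-delta)*u(f(k)) + delta*v[tau(k, f(k))] to its fixed point.
# Illustrative automaton: state 0 = wEE (flow 2), state 1 = wSS (flow 0, absorbing).
delta = 0.6
flow = [2.0, 0.0]            # u_i(f(k)) for k = 0, 1
next_state = [0, 1]          # tau(k, f(k)): EE stays at wEE, wSS stays at wSS

v = [0.0, 0.0]               # any starting point works: Phi is a contraction
for _ in range(200):
    v = [(1 - delta)*flow[k] + delta*v[next_state[k]] for k in range(2)]

print(v)   # converges to [2.0, 0.0]: the values of starting in wEE and wSS
```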

7.6.4. A different (and perhaps more enlightening) proof of Theorem 7.1.3 is the following:

Suppose W and Ai are finite. Let V̂i(w) be player i's payoff from the best response to (W, w, f−i, τ) (i.e., the strategy profile for the other players specified by the automaton with initial state w). (The best response is well-defined; you need not prove this.)

(a) Prove that

V̂i(w) = max_{ai∈Ai} {(1 − δ)ui(ai, f−i(w)) + δV̂i(τ(w, (ai, f−i(w))))}.

(b) Note that V̂i(w) ≥ Vi(w) for all w. Denote by wi the state that maximizes V̂i(w) − Vi(w) (if there is more than one, choose one arbitrarily).

Let a_i^w be the action solving the maximization in 7.6.4(a), and define

V†i(w) := (1 − δ)ui(a_i^w, f−i(w)) + δVi(τ(w, (a_i^w, f−i(w)))).


           E            S
E        1, 1       −ℓ, 1+g
S     1+g, −ℓ         0, 0

Figure 7.6.1: A general prisoners' dilemma, where ℓ > 0 and g > 0.

Prove that if the strategy profile with representing automaton (W, w0, f, τ) is not subgame perfect, then there exists a player i for which

V†i(wi) > Vi(wi).

Relate this inequality to Theorem 7.1.3.

(c) Extend the argument to infinite W, compact Ai, and continuous ui.

7.6.5. Prove that the profile described in Example 7.1.4 is not a Nash equilibrium if δ ∈ [1/3, 1/2). [Hint: what is the payoff from always playing A?] Prove that it is Nash if δ ∈ [1/2, 1).

7.6.6. Suppose two players play the infinitely repeated prisoners' dilemma displayed in Figure 7.6.1.

(a) For what values of the discount factor δ is grim trigger a subgame perfect equilibrium?

(b) Describe a simple automaton representation of the behavior in which player I alternates between E and S (beginning with E), player II always plays E, and any deviation results in permanent SS. For what parameter restrictions is this a subgame perfect equilibrium?

(c) For what parameter values of ℓ, g, and δ is tit-for-tat a subgame perfect equilibrium?

7.6.7. Suppose the game in Figure 7.6.2 is infinitely repeated: Let δ denote the common discount factor for both players and consider the strategy profile that induces the outcome path DL, UR, DL, UR, . . ., and that, after any unilateral deviation by the row player, specifies the outcome path DL, UR, DL, UR, . . ., and after any unilateral deviation by the column player, specifies the outcome path UR, DL, UR, DL, . . . (simultaneous deviations are ignored, i.e., are treated as if neither player had deviated).

       L        R
U    2, 2    x, 0
D    0, 5    1, 1

Figure 7.6.2: The game for Problem 7.6.7.

(a) What is the simplest automaton that represents this strategy profile?

(b) Suppose x = 5. For what values of δ is this strategy profile subgame perfect?

(c) Suppose now x = 4. How does this change your answer to part 7.6.7(b)?

(d) Suppose x = 5 again. How would the analysis in part 7.6.7(b) be changed if the column player were short-lived (lived for only one period)?

7.6.8. Fix a stage game G = (Ai, ui) and discount factor δ. Let Ep(δ) ⊂ Fp∗ be the set of pure strategy subgame perfect equilibrium payoffs.

(a) Prove that every payoff v ∈ Ep(δ) is decomposable on Ep(δ).

(b) Suppose γ : A → Ep(δ) enforces the action profile a′. Describe a subgame perfect equilibrium in which a′ is played in the first period.

(c) Prove that every payoff v ∈ Fp∗ decomposable on Ep(δ) is in Ep(δ).

7.6.9. Consider the prisoners' dilemma from Figure 7.1.1. Suppose the game is infinitely repeated with perfect monitoring. Recall that a strongly symmetric strategy profile (s1, s2) satisfies s1(h^t) = s2(h^t) for all h^t. Equivalently, its automaton representation satisfies f1(w) = f2(w) for all w. Let W = {δv, v}, v > 0 to be determined, be the set of continuation promises. Describe a strongly symmetric strategy profile (equivalently, automaton) whose continuation promises come from W which is a subgame perfect equilibrium for some values of δ. Calculate the appropriate bounds on δ and the value of v (which may or may not depend on δ).

       E        S
E    1, 2    −1, 3
S    2, −4    0, 0

Figure 7.6.3: The game for Problem 7.6.11.

       L        R
T    2, 3    0, 2
B    3, 0    1, 1

Figure 7.6.4: The game for Problem 7.6.12.

7.6.10. Describe the five-state automaton that yields v0 as a strongly symmetric equilibrium payoff with the indicated cycle in Figure 7.4.1.

7.6.11. Consider the (asymmetric) prisoners' dilemma in Figure 7.6.3. Suppose the game is infinitely repeated with perfect monitoring. Prove that for δ < 1/2, the maximum (average discounted) payoff to player 1 in any pure strategy subgame perfect equilibrium is 0, while for δ = 1/2, there are equilibria in which player 1 receives a payoff of 1. [Hint: First prove that, if δ ≤ 1/2, in any pure strategy subgame perfect equilibrium, in any period, if player 2 chooses E then player 1 chooses E in that period.]

7.6.12. Consider the stage game in Figure 7.6.4, where player 1 is the row player and 2, the column player (as usual).

(a) Suppose the game is infinitely repeated, with perfect monitoring. Players 1 and 2 are both long-lived, and have the same discount factor, δ ∈ (0,1). Construct a three-state automaton that for large δ is a subgame perfect equilibrium, and yields a payoff to player 1 that is close to 2½. Prove that the automaton has the desired properties. (Hint: One state is only used off the path-of-play.)

(b) Now suppose that player 2 is short-lived (but maintain the assumption of perfect monitoring, so that the short-lived player in period t knows the entire history of actions up to t). Prove that player 1's payoff in any pure strategy subgame perfect equilibrium is no greater than 2 (the restriction to pure strategy is not needed; can you prove the result without that restriction?). For which values of δ is there a pure strategy subgame perfect equilibrium in which player 1 receives a payoff of precisely 2?

7.6.13. Fix a repeated finite game of imperfect public monitoring (as usual, assume Y is finite). Say that a player has a profitable one-shot deviation from the public strategy (W, w0, f, τ) if there is some history h^t ∈ Y^t and some action ai ∈ Ai such that (where w = τ(w0, h^t))

Vi(w) < (1−δ)ui(ai, f−i(w)) + δ ∑_y Vi(τ(w, y)) ρ(y | (ai, f−i(w))).

(a) Prove that a public strategy profile is a perfect public equilibrium if and only if there are no profitable one-shot deviations. (This is yet another instance of the one-shot deviation principle.)

(b) Prove Lemma 7.5.1.

7.6.14. Consider the prisoners' dilemma game in Example 7.5.1.

(a) The grim trigger profile is described by the automaton (W, w0, f, τ), where W = {wEE, wSS}, w0 = wEE, f(wa) = a, and

τ(w, y) = wEE,  if w = wEE and y = ȳ,
          wSS,  otherwise.

For what parameter values is the grim-trigger profile an equilibrium?

(b) An example of a forgiving grim trigger profile is described by the automaton (W, w0, f, τ), where W = {wEE, w′EE, wSS}, w0 = wEE, f(wa) = a, and

τ(w, y) = wEE,   if w = wEE or w′EE, and y = ȳ,
          w′EE,  if w = wEE and y = y̲,
          wSS,   otherwise.

For what parameter values is this forgiving grim-trigger profile an equilibrium? Compare the payoffs of grim trigger and this forgiving grim trigger when both are equilibria.

       h        ℓ
H    4, 3    0, 2
L    x, 0    3, 1

Figure 7.6.5: The game for Problem 7.6.15.

7.6.15. Player 1 (the row player) is a firm who can exert either high effort (H) or low effort (L) in the production of its output. Player 2 (the column player) is a consumer who can buy either a high-priced product, h, or a low-priced product, ℓ. The actions are chosen simultaneously, and payoffs are given in Figure 7.6.5. Player 1 is infinitely lived, discounts the future with discount factor δ, and plays the above game in every period with a different consumer (i.e., each consumer lives only one period). The game is one of public monitoring: while the actions of the consumers are public, the actions of the firm are not. Both the high-priced and low-priced products are experience goods of random quality, with the distribution over quality determined by the effort choice. The consumer learns the quality of the product after purchase (consumption). Denote by ȳ the event that the product purchased is high quality, and by y̲ the event that it is low quality (in other words, y ∈ {y̲, ȳ} is the quality signal). Assume the distribution over quality is independent of the price of the product:

Pr(ȳ | a) = p,  if a1 = H,
            q,  if a1 = L,

with 0 < q < p < 1.

(a) Describe the ex post payoffs for the consumer. Why can the ex post payoffs for the firm be taken to be the ex ante payoffs?

(b) Suppose x = 5. Describe a perfect public equilibrium in which the patient firm chooses H infinitely often with probability one, and verify that it is an equilibrium. [Hint: This can be done with one-period memory.]


(c) Suppose now x ≥ 8. Is the one-period memory strategy profile still an equilibrium? If not, can you think of an equilibrium in which H is still chosen with positive probability?

7.6.16. A financial manager undertakes an infinite sequence of trades on behalf of a client. Each trade takes one period. In each period, the manager can invest in one of a large number of risky assets. By exerting effort (a = E) in a period (at a cost of e > 0), the manager can identify the most profitable risky asset for that period, which generates a high return of R = H with probability p and a low return R = L with probability 1 − p. In the absence of effort (a = S), the manager cannot distinguish between the different risky assets. For simplicity, assume the manager then chooses the wrong asset, yielding the low return R = L with probability 1; the cost of no effort is 0. In each period, the client chooses the level of the fee x ∈ [0, x̄] to be paid to the financial manager for that period. Note that there is an exogenous upper bound x̄ on the fee that can be paid in a period. The client and financial manager are risk neutral, and so the client's payoff in a period is

uc(x, R) = R − x,

while the manager's payoff in a period is

um(x, a) = x − e,  if a = E,
           x,      if a = S.

The client and manager have a common discount factor δ. The client observes the return on the asset prior to paying the fee, but does not observe the manager's effort choice.

(a) Suppose the client cannot sign a binding contract committing him to pay a fee (contingent or not on the return). Describe the unique sequentially rational equilibrium when the client uses the manager for a single transaction. Are there any other Nash equilibria?

(b) Continue to suppose there are no binding contracts, but now consider the case of an infinite sequence of trades. For a range of values for the parameters (δ, x̄, e, p, H, and L), there is a perfect public equilibrium in which the manager exerts effort on behalf of the client in every period. Describe it and the restrictions on parameters necessary and sufficient for it to be an equilibrium.


(c) Compare the fee paid in your answer to part 7.6.16(b) to the fee that would be paid by a client for a single transaction,

i. when the client can sign a legally binding commitment to a fee schedule as a function of the return of that period, and

ii. when the client can sign a legally binding commitment to a fee schedule as a function of effort.

(d) Redo question 7.6.16(b) assuming that the client's choice of fee level and the manager's choice of effort are simultaneous, so that the fee paid in period t cannot depend on the return in period t. Compare your answer with that to question 7.6.16(b).

7.6.17. In this question, we revisit the partnership game of Example 7.5.1. Suppose 3p − 2q > 1. This question asks you to prove that for sufficiently large δ, any payoff in the interval [0, v̄] is the payoff of some strongly symmetric PPE, where

v̄ = 2 − (1−p)/(p−q),

and that no payoff larger than v̄ is the payoff of some strongly symmetric PPE. Strong symmetry implies it is enough to focus on player 1, and the player subscript will often be omitted.

(a) The action profile SS is trivially enforced by any constant continuation γ ∈ [0, γ̄] independent of y. Let W^SS be the set of values that can be obtained by SS and a constant continuation γ ∈ [0, γ̄], i.e.,

W^SS = {(1−δ)u1(SS) + δγ : γ ∈ [0, γ̄]}.

Prove that W^SS = [0, δγ̄]. [This is almost immediate.]

(b) Recalling Definition 7.4.1, say that v is decomposed by EE on [0, γ̄] if there exist γ_ȳ, γ_y̲ ∈ [0, γ̄] such that

v = (1−δ)u1(EE) + δ[pγ_ȳ + (1−p)γ_y̲]    (7.6.1)
  ≥ (1−δ)u1(SE) + δ[qγ_ȳ + (1−q)γ_y̲].   (7.6.2)

(That is, EE is enforced by the continuation promises γ_ȳ, γ_y̲ and implies the value v.) Let W^EE be the set of values that can be decomposed by EE on [0, γ̄]. It is clear that W^EE = [γ′, γ′′], for some γ′ and γ′′. Calculate γ′ by using the smallest possible choices of γ_ȳ and γ_y̲ in the interval [0, γ̄] to enforce EE. (This will involve having the inequality (7.6.2) holding with equality.)


(c) Similarly, give an expression for γ′′ (that will involve γ̄) by using the largest possible choices of γ_ȳ and γ_y̲ in the interval [0, γ̄] to enforce EE. Argue that δγ̄ < γ′′.

(d) As in Example 7.4.1, we would like all continuations in [0, γ̄] to be themselves decomposable using continuations in [0, γ̄], i.e., we would like

[0, γ̄] ⊂ W^SS ∪ W^EE.

Since δγ̄ < γ′′, we then would like γ̄ ≤ γ′′. Moreover, since we would like [0, γ̄] to be the largest such interval, we have γ̄ = γ′′. What is the relationship between γ′′ and v̄?

(e) For what values of δ do we have [0, γ̄] = W^SS ∪ W^EE?

(f) Let (W, w0, f, τ) be the automaton given by W = [0, v̄], w0 ∈ [0, v̄],

f(w) = EE,  if w ∈ W^EE,
       SS,  otherwise,

and

τ(w, y) = γ_y(w),  if w ∈ W^EE,
          w/δ,     otherwise,

where γ_y(w) solves (7.6.1)–(7.6.2) for v = w and y = ȳ, y̲. For our purposes here, assume that V(w) = w, that is, the value to a player of being in the automaton with initial state w is precisely w. (From the argument of Lemma 7.4.2, this should be intuitive.) Given this assumption, prove that the automaton describes a PPE with value w0.


Lecture Slides


Repeated Games and Reputations:
The Basic Structure

George J. Mailath

University of Pennsylvania
and
Australian National University

CEMMAP Lectures
November 17-18, 2016

The slides and associated bibliography are on my webpage:
http://economics.sas.upenn.edu/∼gmailath


Introduction

The theory of repeated games provides a central underpinning for our understanding of social, political, and economic institutions, both formal and informal. A key ingredient in understanding institutions and other long run relationships is the role of

- shared expectations about behavioral norms (cultural beliefs), and
- sanctions in ensuring that people follow the “rules.”

Repeated games allow for a clean description of both the myopic incentives that agents have to not follow the rules and, via appropriate specifications of future behavior (and so rewards and punishments), the incentives that deter such opportunistic behavior.


Examples of Long-Run Relationships and Opportunistic Behavior

Buyer-seller.
The seller selling an inferior good.

Employer and employees.
Employees shirking on the job, employer reneging on implicit terms of employment.

A government and its citizens.
Government expropriates (taxes) all profits from investments.

World Trade Organization.
Imposing tariffs to protect a domestic industry.

Cartels.
A firm exceeding its share of the monopolistic output.


Two particularly interesting examples

1. Dispute Resolution.
Ellickson (1994) presents evidence that neighbors in Shasta County, CA, resolve disputes arising from the damage created by escaped cattle in ways that both ignore legal liability and are supported by intertemporal incentives.

2. Traders selling goods on consignment.
Greif (1994) documents how the Maghribi and Genoese merchants deterred their agents from misreporting that goods were damaged in transport, and so were worth less. These two communities of merchants did this differently, and in ways consistent with the different cultural characteristics of the communities and repeated game analysis.


The Leading Example
The prisoners' dilemma as a partnership

       E        S
E    2, 2    −1, 3
S    3, −1    0, 0

[Figure: the set F∗ of feasible and individually rational payoffs in (u1, u2)-space.]

Each player can guarantee herself a payoff of 0. A payoff vector is individually rational if it gives each player at least their guarantee. F∗ is the set of feasible and individually rational payoffs.

In the static (one shot) play, each player will play S, resulting in the inefficient SS outcome.


Intertemporal Incentives

Suppose the game is repeated (once), and payoffs are added. We “know” SS will be played in the last period, so no intertemporal incentives.

Infinite horizon: the relationship never ends. The infinite stream of payoffs (u_i^0, u_i^1, u_i^2, . . .) is evaluated as the (average) discounted sum

∑_{t≥0} (1 − δ)δ^t u_i^t.

Individual i is indifferent between 0, 1, 0, . . . and δ, 0, 0, . . ..

The normalization (1 − δ) implies that repeated game payoffs are comparable to stage game payoffs. The infinite constant stream of 1 utils has a value of 1.
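A quick numeric check of the normalization (an illustrative Python sketch; the helper name and truncation length are ours):

```python
delta = 0.6

def avg_discounted(stream, delta):
    # (1 - delta) * sum_t delta^t * u_t  (average discounted value)
    return (1 - delta) * sum(delta**t * u for t, u in enumerate(stream))

T = 2000  # a long truncation approximates the infinite stream
print(avg_discounted([1.0] * T, delta))             # constant stream of 1 is worth 1
print(avg_discounted([0, 1] + [0] * T, delta))      # 0, 1, 0, 0, ...
print(avg_discounted([delta, 0] + [0] * T, delta))  # delta, 0, 0, ... (same value)
```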


A strategy σi for individual i describes how that individual behaves (at each point of time and after any possible history).

A strategy profile σ = (σ1, . . . , σn) describes how everyone behaves (at each point of . . .).

Definition
The profile σ∗ is a Nash equilibrium if for all individuals i, when everyone else is behaving according to σ∗_{−i}, then i is also willing to behave as described by σ∗_i.
The profile σ∗ is a subgame perfect equilibrium if for all histories of play, the behavior described (induced) by the profile is a Nash equilibrium.

Useful to think of social norms as equilibria: shared expectations over behavior that provide appropriate sanctions to deter deviations.


Characterizing Equilibria

Difficult problem: many possible deviations after many different histories.

But repeated games are recursive, and the one-shot deviation principle (from dynamic programming) holds.

Simple penal codes (Abreu, 1988): use i's worst eq to punish any (and all) deviation by i.


Prisoners' Dilemma
Grim Trigger

[Automaton: w0 = wEE; stay at wEE after EE; any other action profile moves play to the absorbing state wSS.]

This is an equilibrium if

2 = (1 − δ) × 2 + δ × 2 ≥ (1 − δ) × 3 + δ × 0  ⟹  δ ≥ 1/3.

Grim trigger is subgame perfect: always S is a Nash eq (because SS is an eq of the stage game and in wSS behavior is history independent).


The need for credibility of punishments
The Purchase Game

A buyer and seller:

              Buy      Don't buy
High effort   2, 3       0, 0
Low effort    3, 2       0, 0

[Figure: the set F∗ of feasible and individually rational payoffs in (u1, u2)-space.]

The seller can guarantee himself 0, while the buyer can guarantee herself 2.

There is an equilibrium in which the seller always chooses low effort and the buyer always buys.

Is there a social norm in which the buyer threatens not to buy unless the seller chooses high effort? Need to provide incentives for the buyer to do so.


Why the buyer is willing to punish

Suppose, after the seller “cheats” the buyer by choosing low effort, the buyer expects the seller to continue to choose low effort until the buyer punishes him by not buying.

        B        D
H     2, 3     0, 0
L     3, 2     0, 0

[Automaton: w0 = wHB; at wHB, play HB, stay after H, and move to wLD after L; at wLD, play LD, stay after B, and return to wHB after D.]

The seller chooses high effort as long as δ ≥ 1/2.
The buyer is willing to punish as long as δ ≥ 2/3.

This is a carrot and stick punishment (Abreu, 1986).
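A numeric check of the two thresholds (an illustrative Python sketch; the state-value recursions follow the two-state automaton above, and the names are ours):

```python
delta = 0.7                       # try values around the thresholds 1/2 and 2/3

# Two-state automaton: wHB plays HB and repeats; wLD plays LD for one period,
# then returns to wHB. Values: V(wHB) = flow(HB); V(wLD) = (1-d)*flow(LD) + d*V(wHB).
def V(flow_HB, flow_LD):
    vHB = flow_HB
    vLD = (1 - delta)*flow_LD + delta*vHB
    return vHB, vLD

sHB, sLD = V(2.0, 0.0)            # seller: 2 from HB, 0 from LD (and 3 from LB)
bHB, bLD = V(3.0, 0.0)            # buyer:  3 from HB, 0 from LD (and 2 from LB)

# Seller at wHB: deviating to L yields flow 3 but sends play to wLD.
print(sHB >= (1 - delta)*3 + delta*sLD)    # True iff delta >= 1/2
# Buyer at wLD: deviating to B yields flow 2 (the profile is LB), and play stays at wLD.
print(bLD >= (1 - delta)*2 + delta*bLD)    # True iff delta >= 2/3
```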


The Game with Perfect Monitoring

Action space for i is Ai, with typical action ai ∈ Ai.

An action profile is denoted a = (a1, . . . , an), with associated flow payoffs ui(a).

The infinite stream of payoffs (u_i^0, u_i^1, u_i^2, . . .) is evaluated as the (average) discounted sum

∑_{t≥0} (1 − δ)δ^t u_i^t,

where δ ∈ [0, 1) is the discount factor.

Perfect monitoring: at the end of each period, all players observe the action profile a chosen.

History to date t: h^t ≡ (a^0, . . . , a^{t−1}) ∈ A^t ≡ H^t; H^0 ≡ {∅}.

Set of all possible histories: H ≡ ∪_{t=0}^∞ H^t.

Strategy for player i is denoted si : H → Ai.

Set of all strategies for player i is Si.


Automaton Representation of Behavior

An automaton is the tuple (W, w0, f, τ), where

- W is the set of states,
- w0 is the initial state,
- f : W → A is the output function (decision rule), and
- τ : W × A → W is the transition function.

Any automaton (W, w0, f, τ) induces a strategy profile. Define

τ(w, h^t) := τ(τ(w, h^{t−1}), a^{t−1}).

The induced strategy s is given by s(∅) = f(w0) and

s(h^t) = f(τ(w0, h^t)), ∀h^t ∈ H\{∅}.

Every profile can be represented by an automaton (set W = H).
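A minimal Python sketch of this object (the names are illustrative, not from the slides), using grim trigger for the prisoners' dilemma above:

```python
# An automaton (W, w0, f, tau) and the strategy it induces: s(h^t) = f(tau(w0, h^t)).
W = ["wEE", "wSS"]                                   # states
w0 = "wEE"                                           # initial state
f = {"wEE": ("E", "E"), "wSS": ("S", "S")}           # output function (decision rule)

def tau(w, a):
    # transition function: wEE survives only EE; wSS is absorbing
    return "wEE" if (w == "wEE" and a == ("E", "E")) else "wSS"

def induced_strategy(history):
    # history = (a^0, ..., a^{t-1}); fold it through tau starting from w0
    w = w0
    for a in history:
        w = tau(w, a)
    return f[w]

print(induced_strategy(()))                           # ('E', 'E') at the null history
print(induced_strategy((("E", "E"), ("E", "E"))))     # ('E', 'E')
print(induced_strategy((("E", "E"), ("S", "E"))))     # ('S', 'S') after any deviation
```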


Nash Equilibrium

Definition
An automaton is a Nash equilibrium if the strategy profile s represented by the automaton is a Nash equilibrium.


Subgames and Continuation Play

Each history h^t reaches (“indexes”) a distinct subgame.

Suppose s is represented by (W, w0, f, τ). Recall that

τ(w0, h^t) := τ(τ(w0, h^{t−1}), a^{t−1}).

The continuation strategy profile after a history h^t, s|_{h^t}, is represented by the automaton (W, w^t, f, τ), where

w^t := τ(w0, h^t).

Grim Trigger after any h^t = (EE)^t: the continuation automaton has initial state wEE, i.e., is grim trigger itself.

Grim Trigger after any h^t containing an S (equivalent to always SS): the continuation automaton has initial state wSS.


Subgame Perfection

Definition
The state w ∈ W of an automaton (W, w0, f, τ) is reachable from w0 if w = τ(w0, h^t) for some history h^t ∈ H. Denote the set of states reachable from w0 by W(w0).

Definition
The automaton (W, w0, f, τ) is a subgame perfect equilibrium if for all states w ∈ W(w0), the automaton (W, w, f, τ) is a Nash equilibrium.


The automaton (W, w, f, τ) induces the sequences

w^0 := w,            a^0 := f(w^0),
w^1 := τ(w^0, a^0),  a^1 := f(w^1),
w^2 := τ(w^1, a^1),  a^2 := f(w^2),
...

Given an automaton (W, w0, f, τ), let Vi(w) be i's value from being in the state w ∈ W, i.e.,

Vi(w) = (1 − δ)ui(f(w^0)) + δVi(τ(w^0, f(w^0)))
      = (1 − δ)ui(a^0) + δ(1 − δ)ui(a^1) + δ²Vi(w^2)
      ...
      = (1 − δ) ∑_t δ^t ui(a^t).


Principle of No One-Shot Deviation

Definition
Player i has a profitable one-shot deviation from (W, w0, f, τ) if there is a state w ∈ W(w0) and some action ai ∈ Ai such that

Vi(w) < (1 − δ)ui(ai, f−i(w)) + δVi(τ(w, (ai, f−i(w)))).

Theorem
An automaton is subgame perfect iff there are no profitable one-shot deviations.

Corollary
The automaton (W, w0, f, τ) is subgame perfect iff, for all w ∈ W(w0), f(w) is a Nash eq of the normal form game with payoff function g^w : A → R^n, where

g^w_i(a) = (1 − δ)ui(a) + δVi(τ(w, a)).
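A minimal sketch of the one-shot-deviation check for grim trigger in the leading example (illustrative Python; the state values 2 and 0 are the on-path values computed earlier):

```python
# One-shot deviation check for grim trigger in the leading-example PD.
u = {("E","E"): (2, 2), ("E","S"): (-1, 3), ("S","E"): (3, -1), ("S","S"): (0, 0)}
f = {"wEE": ("E", "E"), "wSS": ("S", "S")}
V = {"wEE": 2.0, "wSS": 0.0}                     # values of the two states

def tau(w, a):
    return "wEE" if (w == "wEE" and a == ("E", "E")) else "wSS"

def no_profitable_one_shot_deviation(delta):
    for w in ("wEE", "wSS"):                     # both states are reachable
        for i in (0, 1):                         # each player
            for ai in ("E", "S"):                # each one-shot deviation
                a = list(f[w]); a[i] = ai; a = tuple(a)
                dev = (1 - delta)*u[a][i] + delta*V[tau(w, a)]
                if dev > V[w] + 1e-12:
                    return False
    return True

print(no_profitable_one_shot_deviation(0.2))     # False: delta < 1/3
print(no_profitable_one_shot_deviation(0.5))     # True:  delta >= 1/3
```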


SGP if No Profitable One-Shot Deviations
Proof I

Let V̂i(w) be player i's payoff from the best response to (W, w, f−i, τ) (i.e., the strategy profile for the other players specified by the automaton with initial state w). Then

V̂i(w) = max_{ai∈Ai} {(1 − δ)ui(ai, f−i(w)) + δV̂i(τ(w, (ai, f−i(w))))}.

Note that V̂i(w) ≥ Vi(w) for all w. Denote by wi the state that maximizes V̂i(w) − Vi(w) (if there is more than one, choose one arbitrarily).

If (W, w0, f, τ) is not SGP, then for some player i,

V̂i(wi) − Vi(wi) > 0.


SGP iff No Profitable One-Shot Deviations
Proof II

Then, for all w,

V̂i(wi) − Vi(wi) > δ[V̂i(w) − Vi(w)],

and so

V̂i(wi) − Vi(wi)
  > δ[V̂i(τ(wi, (a_i^{wi}, f−i(wi)))) − Vi(τ(wi, (a_i^{wi}, f−i(wi))))]
    + [(1 − δ)ui(a_i^{wi}, f−i(wi)) − (1 − δ)ui(a_i^{wi}, f−i(wi))]
  = V̂i(wi) − {(1 − δ)ui(a_i^{wi}, f−i(wi)) + δVi(τ(wi, (a_i^{wi}, f−i(wi))))}.

(The two flow payoff terms in brackets are identical: adding and subtracting them lets the positive terms be collected into V̂i(wi).) Thus,

(1 − δ)ui(a_i^{wi}, f−i(wi)) + δVi(τ(wi, (a_i^{wi}, f−i(wi)))) > Vi(wi),

that is, player i has a profitable one-shot deviation at wi.


Enforceability and Decomposability

Definition
An action profile a′ ∈ A is enforced by the continuation promises γ : A → R^n if a′ is a Nash eq of the normal form game with payoff function g^γ : A → R^n, where

g^γ_i(a) = (1 − δ)ui(a) + δγi(a).

Definition
A payoff v is decomposable on a set of payoffs V if there exists an action profile a′ enforced by some continuation promises γ : A → V satisfying, for all i,

vi = (1 − δ)ui(a′) + δγi(a′).


The Purchase Game 1

              Buy      Don't buy
High effort   2, 3       0, 0
Low effort    3, 2       0, 0

Only LB can be enforced by constant continuation promises, and so only (3, 2) can be decomposed on a singleton set, and that set is {(3, 2)}.


The Purchase Game 2

              Buy      Don't buy
High effort   2, 3       0, 0
Low effort    3, 2       0, 0

Suppose V = {(2δ, 3δ), (2, 3)} and δ > 2/3.

(2, 3) is decomposed on V by HB and promises

γ(a) = (2, 3),    if a1 = H,
       (2δ, 3δ),  if a1 = L.

(2δ, 3δ) is decomposed on V by LD and promises

γ(a) = (2, 3),    if a2 = D,
       (2δ, 3δ),  if a2 = B.

No one-shot deviation principle ⟹ every payoff in V is a subgame perfect eq payoff.
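A numeric check of both decompositions (illustrative Python; it verifies the adding-up conditions and the enforcement inequalities at one assumed δ > 2/3):

```python
delta = 0.75                     # any delta > 2/3
u = {("H","B"): (2, 3), ("H","D"): (0, 0), ("L","B"): (3, 2), ("L","D"): (0, 0)}

def g(a, gamma):
    # g_i(a) = (1 - delta) u_i(a) + delta gamma_i(a)
    return tuple((1 - delta)*u[a][i] + delta*gamma(a)[i] for i in (0, 1))

def gamma1(a):                   # promises decomposing (2, 3) via HB
    return (2, 3) if a[0] == "H" else (2*delta, 3*delta)

def gamma2(a):                   # promises decomposing (2*delta, 3*delta) via LD
    return (2, 3) if a[1] == "D" else (2*delta, 3*delta)

assert g(("H","B"), gamma1) == (2.0, 3.0)                    # adds up to v = (2, 3)
assert g(("H","B"), gamma1)[0] >= g(("L","B"), gamma1)[0]    # seller: H optimal
assert g(("H","B"), gamma1)[1] >= g(("H","D"), gamma1)[1]    # buyer: B optimal

assert g(("L","D"), gamma2) == (2*delta, 3*delta)            # adds up to (2d, 3d)
assert g(("L","D"), gamma2)[0] >= g(("H","D"), gamma2)[0]    # seller: L optimal
assert g(("L","D"), gamma2)[1] >= g(("L","B"), gamma2)[1]    # buyer: D optimal
print("both payoffs in V are decomposed on V")
```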


The Purchase Game 3

              Buy      Don't buy
High effort   2, 3       0, 0
Low effort    3, 2       0, 0

Suppose V = {(2δ, 3δ), (2, 3)} and δ > 2/3.

(3 − 3δ + 2δ², 2 − 2δ + 3δ²) =: v† is decomposed on V by LB and the constant promises

γ(a) = (2δ, 3δ).

So, payoffs outside V can also be decomposed on V.

No one-shot deviation principle ⟹ v† is a subgame perfect eq payoff.


Subgame Perfection redux

Let Ep(δ) ⊂ Fp∗ be the set of pure strategy subgame perfect equilibrium payoffs.

Theorem
A payoff v ∈ R^n is decomposable on Ep(δ) if, and only if, v ∈ Ep(δ).

Theorem
Suppose every payoff v in some set V ⊂ R^n is decomposable with respect to V. Then, V ⊂ Ep(δ).

Any set of payoffs with the property described above is said to be self-generating.


The Purchase Game 4

              Buy      Don't buy
High effort   2, 3       0, 0
Low effort    3, 2       0, 0

[Figure: the two-point self-generating set V and the payoff v† it decomposes, in (u1, u2)-space.]


A Folk Theorem

Intertemporal incentives allow for efficient outcomes, but also for inefficient outcomes, as well as crazy outcomes.

This is illustrated by the “Folk” Theorem, so called because results of this type have been part of game theory folklore since at least the late sixties.

The Discounted Folk Theorem (Fudenberg and Maskin 1986)
Suppose v is a feasible and strictly individually rational vector of payoffs. If the individuals are sufficiently patient (there exists δ̲ ∈ (0, 1) such that for all δ ∈ (δ̲, 1)), then there is a subgame perfect equilibrium with payoff v.


Interpretation

While efficient payoffs are consistent with equilibrium, so are many other payoffs, and associated behaviors. (Consistent with experimental evidence.) Moreover, multiple equilibria are consistent with the same payoff. The theorem does not justify restricting attention to efficient payoffs.

Nonetheless:

- In many situations, understanding the potential scope of equilibrium incentives helps us to understand possible plausible behaviors.
- Understanding what it takes to achieve efficiency gives us important insights into the nature of equilibrium incentives.
- It is sometimes argued that the punishments imposed are too severe. But this does simplify the analysis.


What we learn from perfect monitoring

Multiplicity of equilibria is to be expected. This is necessary for repeated games to serve as a building block for any theory of institutions. Selection of equilibrium can (should) be part of modelling.

In general, efficiency requires being able to reward and punish individuals independently (this is the role of the full dimensionality assumption).

Histories coordinate behavior to provide intertemporal incentives by punishing deviations. This requires monitoring (communication networks) and a future.

Intertemporal incentives require that individuals have something at stake: “Freedom's just another word for nothin' left to lose.”


Repeated Games and Reputations:
Imperfect Public Monitoring

George J. Mailath

University of Pennsylvania
and
Australian National University

CEMMAP Lectures
November 17-18, 2016

The slides and associated bibliography are on my webpage:
http://economics.sas.upenn.edu/∼gmailath


What we learned from perfect monitoring

Multiplicity of equilibria is to be expected.

In general, efficiency requires being able to reward and punish individuals independently.

Histories coordinate behavior to provide intertemporal incentives by punishing deviations. This requires monitoring (communication networks) and a future.

But suppose deviations are not observed? Suppose instead actions are only observed with noise.


Collusion in Oligopoly
Perfect Monitoring

In each period, firms i = 1, . . . , n simultaneously choose quantities qi.

Firm i's profits are

πi(q1, . . . , qn) = p qi − c(qi),

where p is the market clearing price, and c(qi) is the cost of qi.

Suppose p = P(∑_i qi) and P is a strictly decreasing function of Q := ∑_i qi.

If firms are patient, there is a subgame perfect equilibrium in which each firm sells Q^m/n, where Q^m is the monopoly output, supported by the threat that any deviation results in perpetual Cournot (static Nash) competition.


Collusion in Oligopoly
Imperfect Monitoring

The same market: in each period, firms i = 1, . . . , n simultaneously choose quantities qi, firm i's profit is πi(q1, . . . , qn) = p qi − c(qi), and the market clearing price is p = P(∑_i qi), strictly decreasing in Q := ∑_i qi.

Suppose now q1, . . . , qn are not public, but the market clearing price p still is (so each firm knows its profit).

Nothing changes! A deviation is still necessarily detected, since the market clearing price changes.


Collusion in Oligopoly
Noisy Imperfect Monitoring (Green and Porter)

The same market: in each period, firms i = 1, . . . , n simultaneously choose quantities qi, and firm i's profit is πi(q1, . . . , qn) = p qi − c(qi).

But suppose demand is random, so that the market clearing price p is a function of Q and a demand shock η. Moreover, suppose p has full support for all Q.

⟹ no deviation is detected.


Repeated Games with Noisy Imperfect Monitoring

In a setting with noisy imperfect monitoring where it is impossible to detect deviations, are there still intertemporal incentives? Yes.

If so, what is their nature?

And, how effective are these intertemporal incentives? Surprisingly strong!


Repeated Games with Imperfect Public Monitoring
Structure 1

Action space for i is Ai, with typical action ai ∈ Ai. The profile a is not observed.

All players observe a public signal y ∈ Y, |Y| < ∞, with

Pr{y | (a1, . . . , an)} =: ρ(y | a).

Since y is a possibly noisy signal of the action profile a in that period, the actions are imperfectly monitored. Since the signal is public (observed by all players), the game is said to have public monitoring. Assume Y is finite.

u∗_i : Ai × Y → R is i's ex post or realized payoff. Stage game (ex ante) payoffs:

ui(a) ≡ ∑_{y∈Y} u∗_i(ai, y) ρ(y | a).


Ex post payoffs
Oligopoly with imperfect monitoring

Ex post payoffs are given by realized profits,

u∗_i(qi, p) = p qi − c(qi),

where p is the public signal.

Ex ante payoffs are given by expected profits,

ui(q1, . . . , qn) = E[p qi − c(qi) | q1, . . . , qn] = E[p | q1, . . . , qn] qi − c(qi).


Ex post payoffs II
Prisoners' Dilemma with Noisy Monitoring

There is a noisy signal of actions (output), y ∈ {y̲, ȳ} =: Y,

Pr(ȳ | a) := ρ(ȳ | a) = p,  if a = EE,
                        q,  if a = SE or ES, and
                        r,  if a = SS.

Player i's ex post payoffs:

           ȳ                        y̲
E    (3 − p − 2q)/(p − q)    −(p + 2q)/(p − q)
S    3(1 − r)/(q − r)        −3r/(q − r)

Ex ante payoffs:

       E        S
E    2, 2    −1, 3
S    3, −1    0, 0
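A check that the ex post payoffs average back to the ex ante payoff matrix (illustrative Python; the parameter values are ours):

```python
p, q, r = 0.9, 0.5, 0.2

def rho_bar(a):            # probability of the good signal ybar given the profile
    return {("E","E"): p, ("E","S"): q, ("S","E"): q, ("S","S"): r}[a]

def ex_post(ai, y):        # player i's realized payoff u*_i(a_i, y)
    if ai == "E":
        return (3 - p - 2*q)/(p - q) if y == "ybar" else -(p + 2*q)/(p - q)
    return 3*(1 - r)/(q - r) if y == "ybar" else -3*r/(q - r)

def ex_ante(a):            # u_1(a) = sum_y u*_1(a_1, y) rho(y | a)
    pr = rho_bar(a)
    return pr*ex_post(a[0], "ybar") + (1 - pr)*ex_post(a[0], "ylow")

for a, expected in [(("E","E"), 2), (("E","S"), -1), (("S","E"), 3), (("S","S"), 0)]:
    assert abs(ex_ante(a) - expected) < 1e-12
print("ex post payoffs average to the ex ante payoff matrix")
```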


Repeated Games with Imperfect Public Monitoring
Structure 2

Public histories:

H ≡ ∪_{t=0}^∞ Y^t,

with h^t ≡ (y^0, . . . , y^{t−1}) being a t-period history of public signals (Y^0 ≡ {∅}).

Public strategies:

si : H → Ai.


Automaton Representation of Public Strategies

An automaton is the tuple (W, w0, f, τ), where

- W is the set of states,
- w0 is the initial state,
- f : W → A is the output function (decision rule), and
- τ : W × Y → W is the transition function.

The automaton is strongly symmetric if fi(w) = fj(w) for all i, j, w.

Any automaton (W, w0, f, τ) induces a strategy profile. Define

τ(w, h^t) := τ(τ(w, h^{t−1}), y^{t−1}).

The induced strategy s is given by s(∅) = f(w0) and

s(h^t) = f(τ(w0, h^t)), ∀h^t ∈ H\{∅}.

Every public profile can be represented by an automaton (set W = H).


Prisoners' Dilemma with Noisy Monitoring
Grim Trigger

[Automaton: w0 = wEE; stay at wEE after ȳ; move to the absorbing state wSS after y̲.]

This is an eq if

V = (1 − δ)2 + δ[pV + (1 − p) × 0]
  ≥ (1 − δ)3 + δ[qV + (1 − q) × 0]
⟹ 2δ(p − q)/(1 − δp) ≥ 1 ⟺ δ ≥ 1/(3p − 2q).

Note that V = 2(1 − δ)/(1 − δp), and so lim_{δ→1} V = 0.
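A numeric check of the grim trigger value and threshold (illustrative Python; the monitoring parameters are ours):

```python
p, q = 0.9, 0.5                       # illustrative monitoring parameters
for delta in (0.5, 0.6, 0.9):         # threshold here is 1/(3p - 2q) ~ 0.588
    V = 2*(1 - delta)/(1 - delta*p)   # value of grim trigger in wEE
    conform = (1 - delta)*2 + delta*p*V
    deviate = (1 - delta)*3 + delta*q*V
    print(delta, conform >= deviate, delta >= 1/(3*p - 2*q))   # columns agree
```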


Equilibrium Notion

Game has no proper subgames, so how to usefully capture sequential rationality?

A public strategy for an individual ignores that individual's private actions, so that behavior only depends on public information. Every player has a public strategy best response when all other players are playing public strategies.

Definition
The automaton (W, w0, f, τ) is a perfect public equilibrium (PPE) if for all states w ∈ W(w0), the automaton (W, w, f, τ) is a Nash equilibrium.


Principle of No One-Shot Deviation

Definition
Player i has a profitable one-shot deviation from (W, w0, f, τ) if there is a state w ∈ W(w0) and some action ai ∈ Ai such that

Vi(w) < (1 − δ)ui(ai, f−i(w)) + δ ∑_y Vi(τ(w, y)) ρ(y | (ai, f−i(w))).

Theorem
The automaton (W, w0, f, τ) is a PPE iff there are no profitable one-shot deviations, i.e., for all w ∈ W(w0), f(w) is a Nash eq of the normal form game with payoff function g^w : A → R^n, where

g^w_i(a) = (1 − δ)ui(a) + δ ∑_y Vi(τ(w, y)) ρ(y | a).


Prisoners' Dilemma with Noisy Monitoring
Bounded Recall

[Automaton: play EE after ȳ and SS after y̲, from either state: at wEE, stay after ȳ and move to wSS after y̲; at wSS, return to wEE after ȳ and stay after y̲.]

V(wEE) = (1 − δ)2 + δ[pV(wEE) + (1 − p)V(wSS)]
V(wSS) = δ[rV(wEE) + (1 − r)V(wSS)]

V(wEE) > V(wSS), but V(wEE) − V(wSS) → 0 as δ → 1.

At wEE, EE is a Nash eq of g^{wEE} if δ ≥ (3p − 2q − r)^{−1}.

At wSS, SS is a Nash eq of g^{wSS} if δ ≤ (p + 2q − 3r)^{−1}.


Characterizing PPE

A major conceptual breakthrough was to focus on continuation values in the description of equilibrium, rather than focusing on behavior directly.

This yields a more transparent description of incentives, and an informative characterization of equilibrium payoffs.

The cost is that we know little about the details of behavior underlying most of the equilibria, and so have little sense which of these equilibria are plausible descriptions of behavior.


Enforceability and Decomposability

Definition
An action profile a′ ∈ A is enforced by the continuation promises γ : Y → R^n if a′ is a Nash eq of the normal form game with payoff function g^γ : A → R^n, where

g^γ_i(a) = (1 − δ)ui(a) + δ ∑_y γi(y) ρ(y | a).

Definition
A payoff v is decomposable on a set of payoffs V if there exists an action profile a′ enforced by some continuation promises γ : Y → V satisfying, for all i,

vi = (1 − δ)ui(a′) + δ ∑_y γi(y) ρ(y | a′).


Characterizing PPE
The Role of Continuation Values

Let Ep(δ) ⊂ F∗ be the set of (pure strategy) PPE payoffs. If v ∈ Ep(δ), then there exist a′ ∈ A and γ : Y → Ep(δ) so that, for all i,

vi = (1 − δ)ui(a′) + δ ∑_y γi(y) ρ(y | a′)
   ≥ (1 − δ)ui(ai, a′−i) + δ ∑_y γi(y) ρ(y | (ai, a′−i))  ∀ai ∈ Ai.

That is, v is decomposed on Ep(δ).

Theorem (Self-generation, APS 1990)
B ⊂ Ep(δ) if and only if for all v ∈ B, there exist a′ ∈ A and γ : Y → B so that, for all i,

vi = (1 − δ)ui(a′) + δ ∑_y γi(y) ρ(y | a′)
   ≥ (1 − δ)ui(ai, a′−i) + δ ∑_y γi(y) ρ(y | (ai, a′−i))  ∀ai ∈ Ai.


Decomposability

[Figure: the payoff v ∈ E(δ) is the convex combination, with weight (1 − δ) on the flow payoff u(a) and weight δ on the expected continuation E[γ(y) | a], of promises γ(y1), γ(y2), γ(y3) drawn from E(δ).]


Impact of Increased Precision

Let R be the |A| × |Y| matrix with [R]_{ay} := ρ(y | a).

(Y, ρ′) is a garbling of (Y, ρ) if there exists a stochastic matrix Q such that

R′ = RQ.

That is, the “experiment” (Y, ρ′) is obtained from (Y, ρ) by first drawing y according to ρ, and then adding noise.

If W can be decomposed on W′ under ρ′, then W can be decomposed on the convex hull of W′ under ρ. And so the set of PPE payoffs is weakly increasing as the monitoring becomes more precise.
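A small worked garbling for the noisy prisoners' dilemma (illustrative Python; the noise level ε and parameter values are ours):

```python
import numpy as np

p, q, r = 0.9, 0.5, 0.2
# Rows indexed by action profiles EE, ES, SE, SS; columns by signals (ybar, ylow).
R = np.array([[p, 1-p], [q, 1-q], [q, 1-q], [r, 1-r]])

eps = 0.2
Q = np.array([[1-eps, eps], [eps, 1-eps]])   # stochastic matrix: flips the signal
R_prime = R @ Q                              # the garbled monitoring distribution

print(R_prime)                               # rows are still probability distributions
# the informativeness gap shrinks: p' - q' = (p - q)*(1 - 2*eps)
print(R[0,0] - R[1,0], R_prime[0,0] - R_prime[1,0])
```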

27 / 41

Page 122: Repeated Games - Institute for Fiscal Studies Bae University College London jihyun.bae.15@ucl.ac.uk Guo Bai University College London miabaiguo@gmail.com Giulia Conte University College

Bang-BangSuppose A is finite and the signals y are distributedabsolutely continuously with respect to Lebesgue measureon a subset of Rk . Every pure strategy eq payoff can beachieved by (W , w0, f , τ ) with the bang-bang property:

V (w) ∈ ext Ep(δ) ∀w 6= w0,

where ext Ep(δ) is the set of extreme points of Ep(δ).

(Green-Porter) If (W , w0, f , τ ) is strongly symmetric, thenext Ep(δ) = V , V, where V := min Ep(δ), V := max Ep(δ).

wqw0 wq

p ∈ Pp ∈ P p ∈ P

p ∈ P

28 / 41

Page 123:

Prisoners' Dilemma with Noisy Monitoring
The value of "forgiveness" I

[Automaton: w0 = w_EE; a first y̲ moves play to a second EE state, ȳ moves it back, and a second consecutive y̲ moves play to the absorbing state w_SS.]

This has a higher value than grim trigger, since permanent SS is triggered only after two consecutive y̲.

But the limiting value (as δ → 1) is still zero. As players become more patient, the future becomes more important, and smaller variations in continuation values suffice to enforce EE.

EE can be enforced by more forgiving specifications as δ → 1.

29 / 41

Page 124:

Prisoners' Dilemma with Noisy Monitoring
The value of "forgiveness" II

[Automaton: w0 = w_EE and absorbing w_SS; after ȳ remain in w_EE; after y̲, remain in w_EE with probability β and move to w_SS with probability 1 − β.]

Public correlating device: β. This is an eq if

V = (1 − δ)2 + δ(p + (1 − p)β)V
  ≥ (1 − δ)3 + δ(q + (1 − q)β)V.

In the efficient eq (requires p > q and δ(3p − 2q) > 1),

β = [δ(3p − 2q) − 1] / [δ(3p − 2q − 1)]   and   V = 2 − (1 − p)/(p − q) < 2.

30 / 41
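A quick numerical check of the closed forms for β and V (the parameter values are arbitrary assumptions satisfying p > q and δ(3p − 2q) > 1):

```python
delta, p, q = 0.9, 0.9, 0.6   # assumed parameters with delta*(3p - 2q) > 1
beta = (delta*(3*p - 2*q) - 1) / (delta*(3*p - 2*q - 1))
V = 2 - (1 - p)/(p - q)
# Value recursion under EE with the public correlating device:
stay = (1 - delta)*2 + delta*(p + (1 - p)*beta)*V
# Payoff from a one-shot deviation to S:
dev = (1 - delta)*3 + delta*(q + (1 - q)*beta)*V
print(V, stay, dev)           # all equal: the incentive constraint binds
assert abs(stay - V) < 1e-9 and abs(dev - V) < 1e-9
```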

Page 125–128:

Prisoners' Dilemma with Noisy Monitoring
The value of "forgiveness" III

Public correlating device is not necessary: Every pure strategy strongly symmetric PPE has payoff no larger than

2 − (1 − p)/(p − q) =: γ̄.

But the upper bound is achieved: For sufficiently large δ, both [0, γ̄] and (0, γ̄] are self-generating.

The use of payoff 0 is Nash reversion.

Forgiving grim trigger: the set W = {0} ∪ [γ̲, γ̄], where

γ̲ := 2(1 − δ)/(1 − δp),

is, for large δ, self-generating with all payoffs > 0 decomposed using EE.

34 / 41

Page 129:

Implications

Providing intertemporal incentives requires imposing punishments on the equilibrium path.

These punishments may generate inefficiencies, and the greater the noise, the greater the inefficiency.

How to impose punishments without creating inefficiencies: transfer value rather than destroying it.

In the PD example, this is impossible, since ES cannot be distinguished from SE.

Efficiency requires that the monitoring be statistically sufficiently informative.

Other examples reveal the need for asymmetric/nonstationary behavior in symmetric stationary environments.

35 / 41

Page 130–131:

Statistically Informative Monitoring: Rank Conditions

Definition
The profile α has individual full rank for player i if the |A_i| × |Y| matrix R_i(α_{−i}), with

[R_i(α_{−i})]_{a_i y} := ρ(y | (a_i, α_{−i})),

has full row rank.
The profile α has pairwise full rank for players i and j if the (|A_i| + |A_j|) × |Y| matrix

R_{ij}(α) := [ R_i(α_{−i})
               R_j(α_{−j}) ]

has rank |A_i| + |A_j| − 1.

37 / 41
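A sketch of both rank checks with numpy, for an assumed 2-action, 3-signal monitoring structure and a pure action profile (with only two signals the pairwise condition must fail, which is the point made on the next slide):

```python
import numpy as np

# rho[(a1, a2)] = dist over 3 signals (illustrative, chosen generically).
rho = {(0, 0): [0.70, 0.20, 0.10], (0, 1): [0.30, 0.50, 0.20],
       (1, 0): [0.25, 0.25, 0.50], (1, 1): [0.10, 0.30, 0.60]}

def R_i(i, a_other):
    """R_i(alpha_{-i}) for a pure opponent action: rows indexed by a_i."""
    rows = []
    for ai in (0, 1):
        a = (ai, a_other) if i == 0 else (a_other, ai)
        rows.append(rho[a])
    return np.array(rows)

a = (0, 0)
R1, R2 = R_i(0, a[1]), R_i(1, a[0])
print("individual full rank:", np.linalg.matrix_rank(R1) == 2,
      np.linalg.matrix_rank(R2) == 2)
R12 = np.vstack([R1, R2])     # the stacked (|A1| + |A2|) x |Y| matrix
print("pairwise full rank:", np.linalg.matrix_rank(R12) == 2 + 2 - 1)
```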

Page 132–133:

Another Folk Theorem

The Public Monitoring Folk Theorem (Fudenberg, Levine, and Maskin 1994)
Suppose the set of feasible and individually rational payoffs has nonempty interior, and that all action profiles satisfy pairwise full rank for all players. Every strictly individually rational and feasible payoff is a perfect public equilibrium payoff, provided players are patient enough.

Pairwise full rank fails for our prisoners' dilemma example (it can be satisfied if there are three signals).

It also fails for the Green-Porter noisy oligopoly example, since the distribution of the market clearing price only depends on total market quantity.

The folk theorem holds under weaker assumptions.

39 / 41

Page 134–135:

Role of Patience

The monitoring can be arbitrarily noisy, as long as it remains statistically informative.

But the noisier the monitoring, the more patient the players must be.

Suppose time is continuous, and decisions are taken at points Δ, 2Δ, 3Δ, ….

If r is the continuous rate of time discounting, then δ = e^{−rΔ}. As Δ → 0, δ → 1.

For games of perfect monitoring, high δ can be interpreted as small Δ. But this is problematic for games of imperfect monitoring: as Δ → 0, the monitoring becomes increasingly precise over a fixed time interval.

41 / 41

Page 136:

Repeated Games and Reputations: Private Monitoring

George J. Mailath

University of Pennsylvania and Australian National University

CEMMAP Lectures
November 17-18, 2016

The slides and associated bibliography are on my webpage
http://economics.sas.upenn.edu/gmailath

1 / 42

Page 137:

Games with Private Monitoring

Intertemporal incentives arise when public histories coordinate continuation play.

Can intertemporal incentives be provided when the monitoring is private?

Stigler (1964) suggested that the answer is often NO, and so collusion is not likely to be a problem when monitoring problems are severe.

2 / 42

Page 138–139:

The Problem

Fix a strategy profile σ. Player i's strategy is sequentially rational if, after all private histories, the continuation strategy is a best reply to the other players' continuation strategies (which depend on their private histories).

That is, player i is best responding to the other players' behavior, given his beliefs over the private histories of the other players.

While player i knows his/her beliefs, we typically do not.

Most researchers thought this problem was intractable, until Sekiguchi, in 1997, showed:

There exists an almost efficient eq for the PD with conditionally-independent almost-perfect private monitoring.

4 / 42

Page 140:

Prisoners' Dilemma
Conditionally Independent Private Monitoring

        E        S
E     2, 2    −1, 3
S     3, −1   0, 0

[Automaton: grim trigger, w0 = w_E; remain in w_E after signal E_j, move to the absorbing state w_S after signal S_j.]

Rather than observing the other player's action for sure, player i observes a noisy signal: π_i(y_i = a_j) = 1 − ε.

Grim trigger is not an equilibrium: at the end of the first period, it is not optimal for player i to play S after observing y_i = S_j (since in eq, player j played E and so with high prob observed y_j = E_i).

Sekiguchi (1997) got around this by having players randomize (we will see how later).

5 / 42

Page 141:

Almost Public Monitoring

How robust are PPE in the game with public monitoring to the introduction of a little private monitoring?

Perturb the public signal, so that player i observes the conditionally (on y) independent signal y_i ∈ {ȳ, y̲}, with probabilities given by

π(y_1, y_2 | y) = π_1(y_1 | y)π_2(y_2 | y),

and

π_i(y_i | y) = 1 − ε if y_i = y, and ε if y_i ≠ y.

Ex post payoffs are now u_i*(a_i, y_i).

6 / 42

Page 142:

Prisoners' Dilemma with Noisy Monitoring
Bounded Recall: public monitoring

[Automaton: one-period memory, w0 = w_EE; go to w_EE after ȳ and to w_SS after y̲, from either state.]

Suppose (3p − 2q − r)^{−1} < δ < (p + 2q − 3r)^{−1}, so the profile is a strict PPE in the game with public monitoring.

V_i(w) is i's value from being in public state w.

7 / 42

Page 143:

Prisoners' Dilemma with Noisy Monitoring
Bounded Recall: private (almost-public) monitoring

[Automaton: one-period memory, w0 = w_E; player i goes to w_E after ȳ_i and to w_S after y̲_i, from either state.]

In period t, player i's continuation strategy after private history h_i^t = (a_i^0, y_i^0; …; a_i^{t−1}, y_i^{t−1}) is completely determined by i's private state w_i^t ∈ W.

In period t, j sees private history h_j^t, and forms belief β_j(h_j^t) ∈ Δ(W) over the period-t state of player i.

8 / 42

Page 144:

Prisoners' Dilemma with Noisy Monitoring
Bounded Recall: Best Replies

[Automaton: as on the previous slide.]

For all y, Pr(y_i ≠ y_j | y) = 2ε(1 − ε), and so

Pr(w_j^t = w_i^t(h_i^t) | h_i^{t′}) ≥ 1 − 2ε(1 − ε)   ∀t′ ≥ t.

For ε sufficiently small, the incentives from public monitoring carry over to the game with almost public monitoring, and the profile is an equilibrium.

9 / 42

Page 145:

Prisoners' Dilemma with Noisy Monitoring
Grim Trigger

Suppose (3p − 2q)^{−1} < δ < 1, so grim trigger is a strict PPE.

The strategy in the game with private monitoring is

[Automaton: w0 = w_E; remain in w_E after ȳ_i, move to the absorbing state w_S after y̲_i.]

If 1 > p > q > r > 0, the profile is not a Nash eq (for any ε > 0).
If 1 > p > r > q > 0, the profile is a Nash eq (but not sequentially rational).

10 / 42

Page 146:

Prisoners' Dilemma with Noisy Monitoring
Grim Trigger, 1 > p > q > r > 0

Consider the private history h_1^t = (E y̲_1, S ȳ_1, S ȳ_1, …, S ȳ_1).

Associated beliefs of 1 about w_2^t:

Pr(w_2^0 = w_E) = 1,
Pr(w_2^1 = w_S | E y̲_1) = Pr(y_2^0 = y̲_2 | E y̲_1, w_2^0 = w_E) ≈ 1 − ε < 1,

but

Pr(w_2^t = w_S | h_1^t)
  = Pr(w_2^t = w_S | w_2^{t−1} = w_S) Pr(w_2^{t−1} = w_S | h_1^t)
    + Pr(y_2^{t−1} = y̲ | w_2^{t−1} = w_E, h_1^t) Pr(w_2^{t−1} = w_E | h_1^t),

where the first factor of the second term is ≈ 0, and Pr(w_2^{t−1} = w_S | h_1^t) < Pr(w_2^{t−1} = w_S | h_1^{t−1}), and so

Pr(w_2^t = w_S | h_1^t) → 0 as t → ∞.

11 / 42

Page 147:

Prisoners' Dilemma with Noisy Monitoring
Grim Trigger, 1 > p > r > q > 0

Consider the private history h_1^t = (E y̲_1, S ȳ_1, S ȳ_1, …, S ȳ_1).

Associated beliefs of 1 about w_2^t:

Pr(w_2^0 = w_E) = 1,
Pr(w_2^1 = w_S | E y̲_1) = Pr(y_2^0 = y̲_2 | E y̲_1, w_2^0 = w_E) ≈ 1 − ε < 1,

but

Pr(w_2^t = w_S | h_1^t)
  = Pr(w_2^t = w_S | w_2^{t−1} = w_S) Pr(w_2^{t−1} = w_S | h_1^t)
    + Pr(y_2^{t−1} = y̲ | w_2^{t−1} = w_E, h_1^t) Pr(w_2^{t−1} = w_E | h_1^t),

where the first factor of the second term is ≈ 0, and Pr(w_2^{t−1} = w_S | h_1^t) > Pr(w_2^{t−1} = w_S | h_1^{t−1}), and so

Pr(w_2^t = w_S | h_1^t) ≈ 1 for all t.

12 / 42
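The contrast between the two orderings can be simulated. Below is a minimal sketch (all parameter values are assumptions) of player 1's posterior that player 2 is in w_S, along the history (E y̲_1, S ȳ_1, S ȳ_1, …) from this and the previous slide:

```python
def joint(y1_good, y2_bad, rho_bar, eps):
    """Pr(y1, y2 | a) = sum over public y of rho(y|a) pi1(y1|y) pi2(y2|y)."""
    pr = 0.0
    for y_good, py in ((True, rho_bar), (False, 1 - rho_bar)):
        p1 = (1 - eps) if (y1_good == y_good) else eps
        p2 = (1 - eps) if (y2_bad != y_good) else eps
        pr += py * p1 * p2
    return pr

def run(p, q, r, eps=0.01, T=30):
    # Period 0: player 1 plays E against w_E and sees the bad signal.
    num = joint(False, True, p, eps)             # ... and 2 also saw a bad signal
    den = joint(False, True, p, eps) + joint(False, False, p, eps)
    b = num / den                                # Pr(w_2^1 = w_S), roughly 1 - eps
    # Periods 1..T: player 1 plays S and sees good signals.
    for _ in range(T):
        pS = joint(True, True, r, eps) + joint(True, False, r, eps)  # y1 good | SS
        num = b * pS + (1 - b) * joint(True, True, q, eps)  # absorbed, or switches now
        den = b * pS + (1 - b) * (joint(True, True, q, eps)
                                  + joint(True, False, q, eps))
        b = num / den
    return b

print("1 > p > q > r:", run(p=0.9, q=0.7, r=0.5))   # posterior decays: not a Nash eq
print("1 > p > r > q:", run(p=0.9, q=0.5, r=0.7))   # posterior stays near 1: Nash eq
```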

Page 148:

Prisoners' Dilemma with Noisy Monitoring
Grim Trigger, 1 > p > r > q > 0

Consider the private history h_1^t = (E y̲_1, E ȳ_1, E ȳ_1, …, E ȳ_1).

Associated beliefs of 1 about w_2^t:

Pr(w_2^0 = w_E) = 1,
Pr(w_2^1 = w_S | E y̲_1) = Pr(y_2^0 = y̲_2 | E y̲_1, w_2^0 = w_E) ≈ 1 − ε < 1,

but

Pr(w_2^t = w_S | h_1^t)
  = Pr(w_2^t = w_S | w_2^{t−1} = w_S) Pr(w_2^{t−1} = w_S | h_1^t)
    + Pr(y_2^{t−1} = y̲ | w_2^{t−1} = w_E, h_1^t) Pr(w_2^{t−1} = w_E | h_1^t),

where the first factor of the second term is ≈ 0, and Pr(w_2^{t−1} = w_S | h_1^t) < Pr(w_2^{t−1} = w_S | h_1^{t−1}), and so

Pr(w_2^t = w_S | h_1^t) → 0 as t → ∞.

13 / 42

Page 149:

Automaton Representation of Strategies

An automaton is the tuple (W_i, w_i^0, f_i, τ_i), where

W_i is the set of states,
w_i^0 is the initial state,
f_i : W_i → A_i is the output function (decision rule), and
τ_i : W_i × A_i × Y_i → W_i is the transition function.

Any automaton (W_i, w_i^0, f_i, τ_i) induces a strategy for i. Define

τ_i(w_i, h_i^t) := τ_i(τ_i(w_i, h_i^{t−1}), a_i^{t−1}, y_i^{t−1}).

The induced strategy s_i is given by s_i(∅) = f_i(w_i^0) and

s_i(h_i^t) = f_i(τ_i(w_i^0, h_i^t)),   ∀h_i^t.

Every strategy can be represented by an automaton.

14 / 42
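A minimal sketch of this representation in Python, instantiated with grim trigger on private signals (the labels and example run are illustrative):

```python
class Automaton:
    """(W_i, w_i^0, f_i, tau_i): initial state, decision rule, transition."""
    def __init__(self, w0, f, tau):
        self.w0, self.f, self.tau = w0, f, tau

    def strategy(self, history):
        """The induced strategy s_i: replay the private history through tau, apply f."""
        w = self.w0
        for a_i, y_i in history:
            w = self.tau(w, a_i, y_i)
        return self.f(w)

# Grim trigger: switch to the absorbing state wS after the first bad signal.
grim = Automaton(
    w0="wE",
    f=lambda w: "E" if w == "wE" else "S",
    tau=lambda w, a, y: "wS" if (w == "wS" or y == "bad") else "wE",
)

print(grim.strategy([]))                              # E: s_i(null) = f_i(w_i^0)
print(grim.strategy([("E", "good"), ("E", "bad")]))   # S
```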

Page 150:

Almost Public Monitoring Games

Fix a game with imperfect full support public monitoring, so that for all y ∈ Y and a ∈ A, ρ(y | a) > 0.

Rather than observing the public signal directly, each player i observes a private signal y_i ∈ Y.

The game with private monitoring is ε-close to the game with public monitoring if the joint distribution π on the private signal profile (y_1, …, y_n) satisfies

|π((y, y, …, y) | a) − ρ(y | a)| < ε.

Such a game has almost public monitoring.

Any automaton in the game with public monitoring describes a strategy profile in all ε-close almost public monitoring games.

15 / 42

Page 151:

Behavioral Robustness

Definition
An eq of a game with public monitoring is behaviorally robust if the same automaton is an eq in all games ε-close to the game with public monitoring, for ε sufficiently small.

Definition
A public automaton (W, w0, f, τ) has bounded recall if there exists L such that after any history of length at least L, continuation play only depends on the last L periods of the public history (i.e., τ(w, h^L) = τ(w′, h^L) for all w, w′ ∈ W).

16 / 42

Page 152–153:

Behavioral Robustness

An eq is behaviorally robust if the same profile is an eq in near-by games.
A public profile has bounded recall if there exists L such that after any history of length at least L, continuation play only depends on the last L periods of the public history.

Theorem (Mailath and Morris, 2002)
A strict PPE with bounded recall is behaviorally robust to private monitoring that is almost public.

Theorem (Mailath and Morris, 2006)
If the private monitoring is sufficiently rich, a strict PPE is behaviorally robust to private monitoring that is almost public if and only if it has bounded recall.

18 / 42

Page 154–155:

Bounded Recall

It is tempting to think that bounded recall provides an attractive restriction on behavior. But:

Folk Theorem II (Horner and Olszewski, 2009)
The public monitoring folk theorem holds using bounded recall strategies. The folk theorem also holds using bounded recall strategies for games with almost-public monitoring.

This private monitoring folk theorem is not behaviorally robust.

Folk Theorem III (Mailath and Olszewski, 2011)
The perfect monitoring folk theorem holds using bounded recall strategies with uniformly strict incentives. Moreover, the resulting equilibrium is behaviorally robust to almost-perfect almost-public monitoring.

20 / 42

Page 156:

Prisoners' Dilemma
Conditionally Independent Private Monitoring

        E        S
E     2, 2    −1, 3
S     3, −1   0, 0

[Automaton: grim trigger, w0 = w_E; remain in w_E after signal e_j, move to the absorbing state w_S after signal s_j.]

Player i observes a noisy signal: π_i(y_i = a_j) = 1 − ε.

Theorem
For all ε₀ > 0, there exists δ′′ > δ′ > 0 such that for all δ ∈ (1/3 + δ′, 1/3 + δ′′), there is a Nash equilibrium in which each player randomizes over the initial state, with the probability on w_E exceeding 1 − ε₀.

21 / 42

Page 157–161:

Proof and then Efficiency

Proof of lemma
Optimality of following grim trigger on different histories:

Es: updating given the original randomization ⟹ S optimal.

Ee, Ee, …, Ee: perpetual e reassures i that j is still in w_E.

Ee, Ee, …, Ee, Es: Most likely events: either j is still in w_E and s is a mistake, or j received an erroneous signal in the previous period. The odds slightly favor j receiving the erroneous signal, and because δ is low, S is optimal.

Ee, Ee, …, Ee, Es, Se, …, Se: This period's S will trigger j's switch to w_S, if not there already.

To obtain efficiency, lower the effective discount factor by dividing the game into N interleaved games.

26 / 42

Page 162:

Belief-Free Equilibria

Another approach is to specify behavior in such a way that the beliefs are irrelevant. Suppose n = 2.

Definition
The profile ((W_1, w_1^0, f_1, τ_1), (W_2, w_2^0, f_2, τ_2)) is a belief-free eq if for all (w_1, w_2) ∈ W_1 × W_2, (W_i, w_i, f_i, τ_i) is a best reply to (W_j, w_j, f_j, τ_j), all i ≠ j.

This approach is due to Piccione (2002), with a refinement by Ely and Valimaki (2002). Belief-free eq are characterized by Ely, Horner, and Olszewski (2005).

27 / 42

Page 163:

Illustration of Belief-Free Eq
The product-choice game

        c        s
H     2, 3    0, 2
L     3, 0    1, 1

Row player is a firm choosing High or Low quality.
Column player is a short-lived customer choosing the customized or standard product.
In the game with perfect monitoring, grim trigger (play Hc till 1 plays L, then revert to perpetual Ls) is an eq if δ ≥ 1/2.

28 / 42

Page 164:

The belief-free eq that achieves a payoff of 2 for the row player:

Row player always plays ½ ◦ H + ½ ◦ L. (Trivial automaton.)

Column player's strategy has one-period memory. Play c for sure after H in the previous period, and play

α_L := (1 − 1/(2δ)) ◦ c + (1/(2δ)) ◦ s

after L in the previous period. Player 2's automaton:

[Automaton: states w_c (initial) and w_{α_L}; go to w_c after H and to w_{α_L} after L.]

29 / 42
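A numerical check that this customer strategy makes the firm exactly indifferent, so always mixing ½ ◦ H + ½ ◦ L is a best reply (δ = 0.8 is an arbitrary assumed discount factor with δ ≥ 1/2):

```python
delta = 0.8
s_prob = 1 / (2 * delta)    # prob of s in alpha_L

# Firm flow payoffs: u1(H,c) = 2, u1(H,s) = 0, u1(L,c) = 3, u1(L,s) = 1.
V_c = 2.0                   # firm value when the customer plays c (play H forever)
V_L = (1 - delta) * 2 * (1 - s_prob) + delta * V_c   # play H against alpha_L

# After c: H gives V_c; L gives flow 3, then alpha_L next period.
assert abs((1 - delta) * 3 + delta * V_L - V_c) < 1e-12
# After L: H gives V_L (computed above); L gives flow 3(1-s) + 1*s, then alpha_L.
V_L_via_L = (1 - delta) * (3 * (1 - s_prob) + s_prob) + delta * V_L
assert abs(V_L_via_L - V_L) < 1e-12
print(V_c, V_L)             # 2.0 1.75: the row player's payoff is 2
```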

Page 165:

Belief-Free Eq in the Prisoners' Dilemma
Ely and Valimaki (2002)

        E        S
E     2, 2    −1, 3
S     3, −1   0, 0

Perfect monitoring.
Player i's automaton, (W_i, w_i, f_i, τ_i):

W_i = {w_i^E, w_i^S},

f_i(w_i^a) = E if a = E, and φ ◦ E + (1 − φ) ◦ S if a = S,

τ_i(w_i, a_i a_j) = w_i^{a_j},

where φ := 1 − 1/(3δ).

Both (W_1, w_1^E, f_1, τ_1) and (W_1, w_1^S, f_1, τ_1) are best replies to both (W_2, w_2^E, f_2, τ_2) and (W_2, w_2^S, f_2, τ_2).

30 / 42

Page 166:

Belief-Free in the Prisoners' Dilemma: Proof

Let V_1(aa′) denote player 1's payoff when 1 is in state w_1^a and 2 is in state w_2^{a′}. Then

V_1(EE) = (1 − δ)2 + δV_1(EE),
V_1(ES) = (1 − δ)(3φ − 1) + δ[φV_1(EE) + (1 − φ)V_1(SE)],
V_1(SE, a_1 = E) = (1 − δ)2 + δV_1(EE)
  = V_1(SE, a_1 = S) = (1 − δ)3 + δV_1(ES),
V_1(SS, a_1 = E) = (1 − δ)(3φ − 1) + δ[φV_1(EE) + (1 − φ)V_1(SE)]
  = V_1(SS, a_1 = S) = (1 − δ)3φ + δ[φV_1(ES) + (1 − φ)V_1(SS)].

Then, V_1(EE) − V_1(ES) = V_1(SE) − V_1(SS) = (1 − δ)/δ, which is true when φ = 1 − 1/(3δ).

31 / 42
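A numerical check of these value equations (δ = 0.8 is an assumed discount factor):

```python
delta = 0.8
phi = 1 - 1/(3*delta)                # prob of E in state w^S

V_EE = 2.0
V_SE = (1 - delta)*2 + delta*V_EE    # 1 in w^S playing E against 2 in w^E
V_ES = (1 - delta)*(3*phi - 1) + delta*(phi*V_EE + (1 - phi)*V_SE)
V_SS = (1 - delta)*(3*phi - 1) + delta*(phi*V_EE + (1 - phi)*V_SE)  # playing E

# Player 1 is indifferent whenever 1 is in w^S:
assert abs(V_SE - ((1 - delta)*3 + delta*V_ES)) < 1e-12
assert abs(V_SS - ((1 - delta)*3*phi + delta*(phi*V_ES + (1 - phi)*V_SS))) < 1e-12
print(V_EE - V_ES, V_SE - V_SS, (1 - delta)/delta)   # all 0.25 at delta = 0.8
```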

Page 167:

Belief-Free in the Prisoners' Dilemma
Private Monitoring

Suppose we have conditionally independent private monitoring.

For ε small, there is a value of φ satisfying the analogue of the indifference conditions for perfect monitoring (the system of equations is well-behaved, and so you can apply the implicit function theorem).

These kinds of strategies can be used to construct equilibria with payoffs in the square (0, 2) × (0, 2) for sufficiently patient players.

32 / 42

Page 168:

Histories are not being used to coordinate play! There is no common understanding of continuation play.
This is to be contrasted with strict PPE.
Rather, lump sum taxes are being imposed after "deviant" behavior is "suggested."
This is essentially what we do in the repeated prisoners' dilemma.
Folk theorems for games with private monitoring have been proved using belief-free constructions.
These equilibria seem crazy, yet Kandori and Obayashi (2014) report suggestive evidence that in some community unions in Japan, the behavior accords with such an equilibrium.

33 / 42

Page 169:

Imperfect Monitoring

This works for public and private monitoring.
No hope for behavioral robustness.

"Theorem" (Horner and Olszewski, 2006)
The folk theorem holds for games with private almost-perfect monitoring.

The result uses belief-free ideas in a central way, but the equilibria constructed are not belief free.

34 / 42

Page 170:

Purifiability

Belief-free equilibria typically have the property that players randomize the same way after different histories (and so with different beliefs over the private states of the other player(s)).
Harsanyi (1973) purification is perhaps the best rationale for randomizing behavior in finite normal form games.
Can we purify belief-free equilibria (Bhaskar, Mailath, and Morris, 2008)?

The one-period memory belief-free equilibria of Ely and Valimaki (2002), as exemplified above, are not purifiable using one-period memory strategies.
They are purifiable using unbounded memory strategies.
Open question: can they be purified using bounded memory strategies? (It turns out that for sequential games, only Markov equilibria can be purified using bounded memory strategies, Bhaskar, Mailath, and Morris 2013.)

35 / 42

Page 171:

What about noisy monitoring?

The current best result is Sugaya (2013):

"Theorem"
The folk theorem generically holds for the repeated two-player prisoners' dilemma with private monitoring if the support of each player's signal distribution is sufficiently large. Neither cheap talk communication nor public randomization is necessary, and the monitoring can be very noisy.

36 / 42

Page 172:

Ex Post Equilibria

The belief-free idea is very powerful.
Suppose there is an unknown state determining payoffs and monitoring.

ω_E:     E        S
E      1, 1    −1, 2
S      2, −1   0, 0

ω_S:     E        S
E      0, 0    2, −1
S      −1, 2   1, 1

Let Γ(δ, ω) denote the complete-information repeated game when state ω is common knowledge. The monitoring may be perfect or imperfect public.

37 / 42

Page 173–174:

Perfect Public Ex Post Equilibria

Γ(δ, ω) is the complete-information repeated game at ω.

Definition
The profile of public strategies σ is a perfect public ex post eq if σ|_{h^t} is a Nash eq of Γ(δ, ω) for all ω and all histories h^t ∈ H, where σ|_{h^t} is the continuation public profile induced by h^t.

These equilibria can be strict; histories do coordinate play. But the eq are belief free.

"Theorem" (Fudenberg and Yamamoto 2010)
Suppose the signals are statistically informative (about actions and states). The folk theorem holds state-by-state.

These ideas can also be used in some classes of reputation games (Horner and Lovo, 2009) and in games with private monitoring (Yamamoto, 2014).

39 / 42

Page 175:

Conclusion

The current theory of repeated games shows that long-run relationships can discourage opportunistic behavior; it does not show that long-run relationships will discourage opportunistic behavior.
Incentives can be provided when histories coordinate continuation play.
Punishments must be credible, and this can limit their scope.
Some form of monitoring is needed to punish deviators. This monitoring can occur through communication networks.
Intertemporal incentives can also be provided in situations where there is no common understanding of histories, and so of continuation play.

40 / 42

Page 176:

What is left to understand

Which behaviors in long-run relationships are plausible?
Why are formal institutions important?
Why do we need formal institutions to protect property rights, for example?

41 / 42

Page 177:

Repeated Games and Reputations: Reputations I

George J. Mailath

University of Pennsylvania and Australian National University

CEMMAP Lectures
November 17-18, 2016

The slides and associated bibliography are on my webpage
http://economics.sas.upenn.edu/gmailath

1 / 59

Page 178:

Introduction

Repeated games have many equilibria. At the same time, certain plausible outcomes are not consistent with equilibrium. Illustrate with the product-choice game.
Reputation effects: the impact on the set of equilibria (typically of a repeated game) of perturbing the game by introducing incomplete information of a particular kind.
Reputation effects bound eq payoffs in a natural way. First illustrate again using the product-choice game, and then give a complete proof in the canonical model of Fudenberg and Levine (1989, 1992), using the tool of relative entropy introduced by Gossner (2011), and outline the temporary reputation results of Cripps, Mailath, and Samuelson (2004, 2007).

2 / 59

Page 179:

Introduction
The product-choice game

        c        s
H     2, 3    0, 2
L     3, 0    1, 1

Row player is a firm, choosing between high (H) and low (L) effort.
Column player is a customer, choosing between a customized (c) and standard (s) product.
The game has a unique Nash equilibrium: Ls.

3 / 59

Page 180–181:

Perfect Monitoring

Suppose the firm is long-lived, playing the product-choice game with a sequence of short-lived customers.
Suppose moreover that monitoring is perfect (everyone sees all past actions) and the firm has unbounded lifespan, with a discount factor δ.

Then
for δ ≥ 1/2, there is a subgame perfect eq in which Hc is played in every period (any deviation results in Ls forever);
for δ ≥ 2/3, every payoff in [1, 2] is the payoff of some pure strategy subgame perfect eq.

But for all δ, the profile in which history is ignored and Ls is played in every period is an eq.

5 / 59

Page 182:

Imperfect Monitoring

Suppose now that the actions of the firm are imperfectly observed. There is a signal y ∈ {ȳ, y̲} (good experience, bad experience) with distribution

ρ(ȳ | a_1) = p if a_1 = H, and q if a_1 = L,

where 0 < q < p < 1.

If 2p − q ≤ 1, the only pure strategy PPE is perpetual Ls (and as under perfect monitoring, this is always an eq).
The maximum payoff the firm can achieve in any PPE is

2 − (1 − p)/(p − q) < 2.

(Achieving this bound requires δ close to 1.)
Payoffs are bounded away from the payoff from perpetual Hc.

6 / 59
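A symbolic check (a sketch using sympy) that grim trigger with reversion to perpetual Ls is incentive compatible exactly when δ(2p − q) ≥ 1, which is impossible for any δ < 1 when 2p − q ≤ 1:

```python
import sympy as sp

d, p, q = sp.symbols('delta p q', positive=True)
V = sp.symbols('V')

# Grim trigger value: flow 2 from Hc, continue after ybar, revert to Ls (worth 1).
V_gt = sp.solve(sp.Eq(V, (1 - d)*2 + d*(p*V + (1 - p)*1)), V)[0]
# One-shot deviation to L: flow 3, but the bad signal is now more likely.
dev = (1 - d)*3 + d*(q*V_gt + (1 - q)*1)
# The gain from conforming is proportional to delta*(2p - q) - 1:
print(sp.factor(sp.simplify(V_gt - dev)))
```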

Page 183:

The issue

Repeated games have too many equilibria and not enough:
In the finitely repeated product-choice game, the unique Nash eq is Ls in every period, irrespective of the length of the horizon.
In the finitely repeated prisoners' dilemma, the unique Nash eq is always defect, irrespective of the number of repetitions.
In the chain store paradox, the chain store cannot deter entry no matter how many entrants it is facing.

It seems counter-intuitive that observing a sufficiently long history of H's (or a sufficiently high fraction of ȳ's) in our example would not convince customers that the firm will play H.

7 / 59

Page 184:

Incomplete Information

Suppose the customers are not completely certain of all the characteristics of the firm. That is, the game has incomplete information, with the firm's characteristics (type) being private information to the firm.
Suppose that the customers assign some (small) chance to the firm being a behavioral (commitment) type ξ(H) who always plays H.
Then, if the normal type firm is sufficiently patient, its payoff is close to 2.

8 / 59

Page 185–186:

A simple reputation result
A preliminary lemma

Lemma
Suppose the prob assigned to ξ(H), μ(ξ(H)) =: μ₀, is strictly positive. Fix a Nash equilibrium. Let h^t be a positive probability period-t history in which H is always played. The number of periods in h^t in which a customer plays s is no larger than

k* := −log μ₀ / log 2.

Define q^τ := 2's prob that the firm plays H in period τ conditional on h^τ. In eq, if customer τ does play s, then

q^τ ≤ ½.

So, we would like an upper bound on

k(t) := #{τ : q^τ ≤ ½}.

10 / 59

Page 187–188:

Let μ^τ := Pr{ξ(H) | h^τ} be the posterior assigned to ξ(H) after h^τ, and since h^τ is an initial segment of h^t,

μ^{τ+1} = Pr{ξ(H) | h^τ, H} = Pr{ξ(H), H | h^τ} / Pr{H | h^τ}
        = Pr{H | ξ(H), h^τ} Pr{ξ(H) | h^τ} / Pr{H | h^τ}
        = μ^τ / q^τ   ⟹   μ^τ = q^τ μ^{τ+1}.

Then,

μ₀ = q^0 μ^1 = q^0 q^1 μ^2 = μ^t ∏_{τ=0}^{t−1} q^τ ≤ ∏_{τ : q^τ ≤ ½} q^τ ≤ (½)^{k(t)}.

Taking logs, log μ₀ ≤ k(t) log ½, and so

k(t) ≤ log μ₀ / log ½ = −log μ₀ / log 2.

12 / 59

Page 189:

The Theorem

Theorem (Fudenberg and Levine 1989)
Suppose ξ(H) receives positive prior probability μ₀ > 0. In any Nash equilibrium, the normal type's expected payoff is at least 2δ^{k*}. Thus, for all ε > 0, there exists δ̄ such that for all δ ∈ (δ̄, 1), the normal type's payoff in any Nash equilibrium is at least 2 − ε.

The normal type can always play H.
Applying the Lemma yields the lower bound

Σ_{τ=0}^{k*−1} (1 − δ)δ^τ · 0 + Σ_{τ=k*}^{∞} (1 − δ)δ^τ · 2 = 2δ^{k*}.

This can be made arbitrarily close to 2 by choosing δ close to 1.

13 / 59
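The bound is easy to compute (the prior μ₀ and the discount factors below are assumed values):

```python
import math

mu0 = 0.01                            # prior on the commitment type xi(H)
k = math.log(mu0) / math.log(0.5)     # k* = -log(mu0)/log 2 periods of s at most
for delta in (0.9, 0.99, 0.999):
    print(delta, round(2 * delta**k, 3))   # worst case: all k* s-periods come first
# The bound 2*delta**k approaches 2 as delta -> 1, for any fixed mu0 > 0.
```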

Page 190:

Comments

This result made few assumptions on the nature of the incomplete information. In particular, the type space can be infinite (even uncountable), as long as there is a grain of truth on the commitment type (μ(ξ(H)) > 0).
The result also holds for finite horizons. If firm payoffs are the average of the flow (static) payoffs, then average payoffs are close to 2 for sufficiently long horizons.
Perfect monitoring of the behavioral type's action is critical. The above argument cannot be extended to either imperfect monitoring or mixed behavior types (and yet the intuition is compelling).
A new argument is needed.

14 / 59

Page 191:

The Canonical Reputation Model
The Complete Information Model

A long-lived player 1 faces a sequence of short-lived players, in the role of player 2 of the stage game.

A_i, finite action set for each player.
Y, finite set of public signals of player 1's actions, a_1.
ρ(y | a_1), prob of signal y ∈ Y, given a_1 ∈ A_1.
Player 2's ex post stage game payoff is u_2*(a_1, a_2, y), and 2's ex ante payoff is

u_2(a_1, a_2) := Σ_{y∈Y} u_2*(a_1, a_2, y)ρ(y | a_1).

Each player 2 max's her (expected) stage game payoff u_2.

15 / 59

Page 192:

Player 1's ex post stage game payoff is u_1*(a_1, a_2, y), and 1's ex ante payoff is

u_1(a_1, a_2) := Σ_{y∈Y} u_1*(a_1, a_2, y)ρ(y | a_1).

Player 1 max's the expected value of

(1 − δ) Σ_{t≥0} δ^t u_1(a_1^t, a_2^t).

Player 1 observes all past actions and signals, while each player 2 observes only the history of past signals.
A strategy for player 1:

σ_1 : ∪_{t=0}^∞ (A_1 × A_2 × Y)^t → Δ(A_1).

A strategy for player 2:

σ_2 : ∪_{t=0}^∞ Y^t → Δ(A_2).

16 / 59


Page 194:

The player 2's are uncertain about the characteristics of player 1: player 1's characteristics are described by his type, ξ ∈ Ξ.
All the player 2's have a common prior μ on Ξ.
The type space is partitioned into two sets, Ξ = Ξ₁ ∪ Ξ₂, where Ξ₁ is the set of payoff types and Ξ₂ is the set of behavioral (or commitment, or action) types.
For ξ ∈ Ξ₁, player 1's ex post stage game payoff is u_1*(a_1, a_2, y, ξ), and 1's ex ante payoff is

u_1(a_1, a_2, ξ) := Σ_{y∈Y} u_1*(a_1, a_2, y, ξ)ρ(y | a_1).

Each type ξ ∈ Ξ₁ of player 1 max's the expected value of

(1 − δ) Σ_{t≥0} δ^t u_1(a_1^t, a_2^t, ξ).

18 / 59

Page 195:

Player 1 knows his type and observes all past actions and signals, while each player 2 observes only the history of past signals.
A strategy for player 1:

σ_1 : ∪_{t=0}^∞ (A_1 × A_2 × Y)^t × Ξ → Δ(A_1).

If ξ ∈ Ξ₂ is a simple action type, then there exists α_1 ∈ Δ(A_1) such that σ_1(h_1^t, ξ) = α_1 for all h_1^t.
A strategy for player 2:

σ_2 : ∪_{t=0}^∞ Y^t → Δ(A_2).

19 / 59

Page 196:

Space of outcomes: Ω := (A_1 × A_2 × Y)^∞.
A profile (σ_1, σ_2) with prior μ induces the unconditional distribution P ∈ Δ(Ω).
For a fixed simple type ξ̂ = ξ(α̂_1), the probability measure on Ω conditioning on ξ̂ (and so induced by α̂_1 in every period and σ_2) is denoted P̂ ∈ Δ(Ω).
Denoting by P̃ the measure induced by (σ_1, σ_2) and conditioning on ξ ≠ ξ̂, we have

P = μ(ξ̂)P̂ + (1 − μ(ξ̂))P̃.

Given a strategy profile σ, U_1(σ, ξ) denotes the type-ξ long-lived player's payoff in the repeated game,

U_1(σ, ξ) := E^P[(1 − δ) Σ_{t=0}^∞ δ^t u_1*(a^t, y^t, ξ)].

20 / 59

Page 197–198:

Denote by Γ(μ, δ) the game of incomplete information.

Definition
A strategy profile (σ_1′, σ_2′) is a Nash equilibrium of the game Γ(μ, δ) if, for all ξ ∈ Ξ₁, σ_1′ maximizes U_1((σ_1, σ_2′), ξ) over player 1's repeated game strategies, and if for all t and all h_2^t ∈ H_2 that have positive probability under (σ_1′, σ_2′) and μ (i.e., P(h_2^t) > 0),

E^P[u_2(σ_1′(h_1^t, ξ), σ_2′(h_2^t)) | h_2^t] = max_{a_2∈A_2} E^P[u_2(σ_1′(h_1^t, ξ), a_2) | h_2^t].

Our goal: Reputation Bound (Fudenberg & Levine '89, '92)
Fix a payoff type ξ ∈ Ξ₁. What is a "good" lower bound, uniform across Nash equilibria σ′ and δ, for U_1(σ′, ξ)?

Our tool (Gossner 2011): relative entropy.

22 / 59

Page 199:

Relative Entropy

X a finite set of outcomes.
The relative entropy or Kullback-Leibler distance between probability distributions p and q over X is

d(p‖q) := Σ_{x∈X} p(x) log (p(x)/q(x)).

By convention, 0 log(0/q) = 0 for all q ∈ [0, 1] and p log(p/0) = ∞ for all p ∈ (0, 1]. In our applications of relative entropy, the support of q will always contain the support of p.
Since relative entropy is not symmetric, we often say d(p‖q) is the relative entropy of q with respect to p.
d(p‖q) ≥ 0, and d(p‖q) = 0 ⟺ p = q.

23 / 59
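A direct implementation of the definition (the distributions are arbitrary assumptions); the last two lines preview the "grain of truth" bound d(p‖q) ≤ −log ε used below:

```python
import math

def d(p, q):
    """Relative entropy sum_x p(x) log(p(x)/q(x)), with 0 log 0 = 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

p  = [0.5, 0.4, 0.1]
p2 = [0.2, 0.3, 0.5]
print(d(p, p2), d(p2, p))        # both positive, and not symmetric

eps = 0.1
q = [eps*a + (1 - eps)*b for a, b in zip(p, p2)]
print(d(p, q), -math.log(eps))   # grain of truth: d(p||q) <= -log(eps)
```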

Page 200:

Relative entropy is expected prediction error

d(p‖q) measures an observer's expected prediction error on x ∈ X using q when the true dsn is p:

n i.i.d. draws from X under p have probability ∏_x p(x)^{n_x}, where n_x is the number of realizations of x in the sample.
The observer assigns the sample probability ∏_x q(x)^{n_x}.
The log likelihood ratio is

L(x_1, …, x_n) = Σ_x n_x log (p(x)/q(x)),

and so

(1/n) L(x_1, …, x_n) → d(p‖q).

24 / 59

Page 201–206:

The chain rule

Lemma
Suppose P, Q ∈ Δ(X × Y), X and Y finite sets. Then

d(P‖Q) = d(P_X‖Q_X) + Σ_x P_X(x) d(P_Y(·|x)‖Q_Y(·|x))
       = d(P_X‖Q_X) + E^{P_X} d(P_Y(·|x)‖Q_Y(·|x)).

Proof.

d(P‖Q) = Σ_{x,y} P(x, y) log [ (P_X(x)/Q_X(x)) · (P(x, y)/P_X(x)) / (Q(x, y)/Q_X(x)) ]
       = d(P_X‖Q_X) + Σ_{x,y} P(x, y) log (P_Y(y|x)/Q_Y(y|x))
       = d(P_X‖Q_X) + Σ_x P_X(x) Σ_y P_Y(y|x) log (P_Y(y|x)/Q_Y(y|x)).

30 / 59
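A numerical verification of the chain rule on randomly generated joint distributions (pure illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((3, 4)); P /= P.sum()      # joint dists on X x Y, |X|=3, |Y|=4
Q = rng.random((3, 4)); Q /= Q.sum()

def d(p, q):
    p, q = np.asarray(p).ravel(), np.asarray(q).ravel()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

PX, QX = P.sum(axis=1), Q.sum(axis=1)     # marginals on X
cond = sum(PX[x] * d(P[x] / PX[x], Q[x] / QX[x]) for x in range(3))
print(d(P, Q), d(PX, QX) + cond)          # equal, as the lemma asserts
assert abs(d(P, Q) - (d(PX, QX) + cond)) < 1e-12
```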

Page 207–208:

A grain of truth

Lemma
Let X be a finite set of outcomes. Suppose p, p′ ∈ Δ(X) and q = εp + (1 − ε)p′ for some ε > 0. Then,

d(p‖q) ≤ −log ε.

Proof.
Since q(x)/p(x) ≥ ε, we have

d(p‖q) = −Σ_x p(x) log (q(x)/p(x)) ≤ −Σ_x p(x) log ε = −log ε.

32 / 59

Page 209: Repeated Games - Institute for Fiscal Studies Bae University College London jihyun.bae.15@ucl.ac.uk Guo Bai University College London miabaiguo@gmail.com Giulia Conte University College

Back to reputations!

Fix $\hat\alpha_1 \in \Delta(A_1)$ and suppose $\mu(\xi(\hat\alpha_1)) > 0$. In a Nash eq, at history $h_2^t$, $\sigma_2(h_2^t)$ is a best response to

$\bar\alpha_1(h_2^t) := E_P[\sigma_1(h_1^t, \xi) \mid h_2^t] \in \Delta(A_1)$,

that is, $\sigma_2(h_2^t)$ maximizes

$\sum_{a_1} \sum_y u_2(a_1, a_2, y)\, \rho(y \mid a_1)\, \bar\alpha_1(a_1 \mid h_2^t)$.

At $h_2^t$, 2's predicted dsn on the signal $y^t$ is

$p(h_2^t) := \rho(\cdot \mid \bar\alpha_1(h_2^t)) = \sum_{a_1} \rho(\cdot \mid a_1)\, \bar\alpha_1(a_1 \mid h_2^t)$.

If player 1 plays $\hat\alpha_1$, the true dsn on $y^t$ is

$\hat p := \rho(\cdot \mid \hat\alpha_1) = \sum_{a_1} \rho(\cdot \mid a_1)\, \hat\alpha_1(a_1)$.

Player 2's one-step ahead prediction error is $d(\hat p \,\|\, p(h_2^t))$.
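All of these objects are finite-dimensional vectors, so the one-step prediction error is directly computable. A sketch, assuming a hypothetical 2-action, 2-signal monitoring structure ρ (the numbers are made up):

    # One-step-ahead prediction error d(phat || p(h2_t)).
    import numpy as np

    def kl(p, q):
        return float(np.sum(p * np.log(p / q)))

    # rho[a1] = distribution over signals y given player 1's action a1
    rho = np.array([[0.8, 0.2],    # signals under a1 = H
                    [0.3, 0.7]])   # signals under a1 = L

    alpha_hat = np.array([1.0, 0.0])   # commitment action: play H for sure
    alpha_bar = np.array([0.6, 0.4])   # 2's current prediction of 1's play

    p_hat = alpha_hat @ rho   # true signal dsn if 1 plays alpha_hat
    p_t   = alpha_bar @ rho   # 2's predicted signal dsn at h2_t

    print("prediction error:", kl(p_hat, p_t))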


Bounding prediction errors

Player 2 is best responding to an action profile $\bar\alpha_1(h_2^t)$ that is $d(\hat p \,\|\, p(h_2^t))$-close to $\hat\alpha_1$ (as measured by the relative entropy of the induced signals). To bound player 1's payoff, it suffices to bound the number of periods in which $d(\hat p \,\|\, p(h_2^t))$ is large.

For any $t$, $P_2^t$ is the marginal of $P$ on $Y^t$. Then

$P_2^t = \mu(\hat\xi)\, \hat P_2^t + (1 - \mu(\hat\xi))\, \tilde P_2^t$,

and so

$d(\hat P_2^t \,\|\, P_2^t) \le -\log \mu(\hat\xi)$.


Applying the chain rule:

$-\log \mu(\hat\xi) \ge d(\hat P_2^t \,\|\, P_2^t) = d(\hat P_2^{t-1} \,\|\, P_2^{t-1}) + E_{\hat P}\, d(\hat p \,\|\, p(h_2^{t-1})) = \sum_{\tau=0}^{t-1} E_{\hat P}\, d(\hat p \,\|\, p(h_2^\tau))$.

Since this holds for all $t$,

$\sum_{\tau=0}^{\infty} E_{\hat P}\, d(\hat p \,\|\, p(h_2^\tau)) \le -\log \mu(\hat\xi)$.
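A Monte Carlo illustration of this bound on surprises may be useful. The sketch below uses a deliberately crude stand-in for equilibrium play (the commitment type plays H forever, the other type L forever, with player 2 updating by Bayes' rule); the per-run sum of one-step entropy errors under $\hat P$ then averages below $-\log \mu^0$. All parameters are hypothetical.

    # Monte Carlo check of: sum_t E_Phat d(phat || p_t) <= -log mu0.
    import numpy as np

    rng = np.random.default_rng(2)

    def kl(p, q):
        return float(np.sum(p * np.log(p / q)))

    rho = np.array([[0.8, 0.2],   # signal dsn under H
                    [0.3, 0.7]])  # signal dsn under L
    p_hat, p_norm = rho[0], rho[1]
    mu0, T, runs = 0.2, 200, 2000

    total = 0.0
    for _ in range(runs):
        mu = mu0
        for t in range(T):
            p_t = mu * p_hat + (1 - mu) * p_norm   # 2's one-step prediction
            total += kl(p_hat, p_t)                # surprise in period t
            y = rng.choice(2, p=p_hat)             # signal drawn under Phat
            mu = mu * p_hat[y] / p_t[y]            # Bayes update
    print(total / runs, "<=", -np.log(mu0))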


From prediction bounds to payoff bounds

Definition

An action $\alpha_2 \in \Delta(A_2)$ is an $\varepsilon$-entropy confirming best response to $\alpha_1 \in \Delta(A_1)$ if there exists $\alpha_1' \in \Delta(A_1)$ such that

1. $\alpha_2$ is a best response to $\alpha_1'$; and
2. $d(\rho(\cdot\mid\alpha_1) \,\|\, \rho(\cdot\mid\alpha_1')) \le \varepsilon$.

The set of $\varepsilon$-entropy confirming BR's to $\alpha_1$ is denoted $B^d_\varepsilon(\alpha_1)$.

In a Nash eq, at any on-the-eq-path history $h_2^t$, player 2's action is a $d(\hat p \,\|\, p(h_2^t))$-entropy confirming BR to $\hat\alpha_1$.

Define, for all payoff types $\theta \in \Theta_1$,

$\underline v_{\hat\alpha_1}(\varepsilon) := \min_{\alpha_2 \in B^d_\varepsilon(\hat\alpha_1)} u_1(\hat\alpha_1, \alpha_2, \theta)$,

and denote by $\underline w_{\hat\alpha_1}$ the largest convex function below $\underline v_{\hat\alpha_1}$.
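To see the definition in action, one can search numerically for the smallest ε at which a given action becomes entropy confirmable. The sketch below anticipates the product-choice game of the next slide: s is a best response only to conjectures with $\alpha_1'(H) \le 1/2$, and the minimal entropy distance from $1 \circ H$ to such a conjecture turns out to be log 2.

    # Smallest eps with s in B^d_eps(H), by grid search over conjectures.
    import numpy as np

    def kl(p, q):
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    H = np.array([1.0, 0.0])
    grid = np.linspace(0.0, 0.5, 5001)   # alpha1'(H) values making s a BR
    eps_needed = min(kl(H, np.array([a, 1 - a])) for a in grid if a > 0)
    print(eps_needed, np.log(2))          # s in B^d_eps(H) iff eps >= log 2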

The product-choice game I

        c      s
  H    2,3    0,2
  L    3,0    1,1

Suppose $\hat\alpha_1 = 1 \circ H$.

$c$ is the unique BR to $\alpha_1$ if $\alpha_1(H) > \frac12$.
$s$ is also a BR to $\alpha_1$ if $\alpha_1(H) = \frac12$.

$d(1 \circ H \,\|\, \frac12 \circ H + \frac12 \circ L) = \log \frac{1}{1/2} = \log 2 \approx 0.69$.

$\underline v_H(\varepsilon) = \begin{cases} 2, & \text{if } \varepsilon < \log 2, \\ 0, & \text{if } \varepsilon \ge \log 2. \end{cases}$
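A sketch reproducing the two numbers on this slide (illustrative Python; with perfect monitoring of actions, distributions over signals are just mixed actions):

    import numpy as np

    def kl(p, q):
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    H = np.array([1.0, 0.0])           # 1 o H
    mix = np.array([0.5, 0.5])         # (1/2) o H + (1/2) o L
    print(kl(H, mix))                  # log 2 ~ 0.693

    def v_lower_H(eps):
        # s becomes entropy-confirmable once the confounding mixture with
        # alpha1(H) = 1/2 is within eps; otherwise 2 must play c.
        return 2.0 if eps < np.log(2) else 0.0

    print(v_lower_H(0.5), v_lower_H(1.0))   # 2.0, 0.0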

A picture is worth a thousand words

[Figure: player 1 payoffs (vertical axis) against $\varepsilon$ (horizontal axis); $\underline v_H$ is the step function equal to 2 on $[0, \log 2)$ and 0 thereafter, and $\underline w_H$ is its convexification, the line from $(0, 2)$ to $(\log 2, 0)$.]

The difference between this and the earlier bound is in $o(1-\delta)$.

The product-choice game II

        c      s
  H    2,3    0,2
  L    3,0    1,1

Suppose $\hat\alpha_1 = \frac23 \circ H + \frac13 \circ L$.

$c$ is the unique BR to $\alpha_1$ if $\alpha_1(H) > \frac12$.
$s$ is also a BR to $\alpha_1$ if $\alpha_1(H) = \frac12$.

$d(\hat\alpha_1 \,\|\, \frac12 \circ H + \frac12 \circ L) = \frac23 \log \frac{2/3}{1/2} + \frac13 \log \frac{1/3}{1/2} = \frac53 \log 2 - \log 3 =: \bar\varepsilon \approx 0.06$.
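The same computation, self-contained, for the mixed commitment action:

    import numpy as np

    mix23 = np.array([2/3, 1/3])                  # hat alpha_1
    mix = np.array([0.5, 0.5])                    # indifference mixture
    eps_bar = float(np.sum(mix23 * np.log(mix23 / mix)))
    print(eps_bar)                                 # ~ 0.0566
    assert np.isclose(eps_bar, 5/3 * np.log(2) - np.log(3))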

Two thousand?

[Figure: the two bounds overlaid on one set of axes: the step functions $\underline v_{\hat\alpha_1}$ and $\underline v_H$ (dropping at $\bar\varepsilon$ and $\log 2$ respectively) together with their convexifications $\underline w_{\hat\alpha_1}$ and $\underline w_H$; the payoff levels $2$, $2\tfrac13$, and $\tfrac13$ are marked on the vertical axis.]

The reputation bound

Proposition

Suppose the action type $\hat\xi = \xi(\hat\alpha_1)$ has positive prior probability, $\mu(\hat\xi) > 0$, for some potentially mixed action $\hat\alpha_1 \in \Delta(A_1)$. Then player 1 type $\theta$'s payoff in any Nash equilibrium of the game $\Gamma(\mu, \delta)$ is greater than or equal to $\underline w_{\hat\alpha_1}(\bar\varepsilon_\delta)$, where

$\bar\varepsilon_\delta := -(1-\delta) \log \mu(\hat\xi)$.

The only aspect of the set of types and the prior that plays a role in the proposition is the probability assigned to $\hat\xi$. The set of types may be very large, and other quite crazy types may receive significant probability under the prior $\mu$.
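For the product-choice game, the bound is explicit: $\underline w_H$ is the line from $(0,2)$ to $(\log 2, 0)$, truncated at zero. A sketch (hypothetical prior $\mu(\hat\xi) = 0.01$) of how the bound approaches the Stackelberg payoff 2 as $\delta \to 1$:

    import numpy as np

    def w_lower_H(eps):
        # Largest convex function below v_H: linear from (0, 2) to (log 2, 0).
        return max(0.0, 2.0 * (1.0 - eps / np.log(2)))

    mu_hat = 0.01                                   # prior prob of the H-type
    for delta in (0.9, 0.99, 0.999, 0.9999):
        eps_delta = -(1 - delta) * np.log(mu_hat)   # eps_bar_delta
        print(delta, w_lower_H(eps_delta))
    # The bound converges to v_H(0) = 2 as delta -> 1.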

The proof

Since in any Nash equilibrium $(\sigma_1', \sigma_2')$, each payoff type $\theta$ has the option of playing $\hat\alpha_1$ in every period, we have

$U_1(\sigma', \theta) = (1-\delta) \sum_{t=0}^{\infty} \delta^t E_P[u_1(\sigma_1'(h_1^t), \sigma_2'(h_2^t), \theta) \mid \theta]$

$\ge (1-\delta) \sum_{t=0}^{\infty} \delta^t E_{\hat P}\, u_1(\hat\alpha_1, \sigma_2'(h_2^t), \theta)$   [deviate to $\hat\alpha_1$ in every period]

$\ge (1-\delta) \sum_{t=0}^{\infty} \delta^t E_{\hat P}\, \underline v_{\hat\alpha_1}(d(\hat p \,\|\, p(h_2^t)))$   [2's action is an entropy confirming BR]

$\ge (1-\delta) \sum_{t=0}^{\infty} \delta^t E_{\hat P}\, \underline w_{\hat\alpha_1}(d(\hat p \,\|\, p(h_2^t)))$   [$\underline w_{\hat\alpha_1} \le \underline v_{\hat\alpha_1}$]

$\ge \underline w_{\hat\alpha_1}\Big( (1-\delta) \sum_{t=0}^{\infty} \delta^t E_{\hat P}\, d(\hat p \,\|\, p(h_2^t)) \Big)$   [Jensen, since $\underline w_{\hat\alpha_1}$ is convex]

$\ge \underline w_{\hat\alpha_1}\big( -(1-\delta) \log \mu(\hat\xi) \big) = \underline w_{\hat\alpha_1}(\bar\varepsilon_\delta)$.   [bound on surprises; $\underline w_{\hat\alpha_1}$ decreasing]

Patient player 1

Corollary

Suppose the action type $\hat\xi = \xi(\hat\alpha_1)$ has positive prior probability, $\mu(\hat\xi) > 0$, for some potentially mixed action $\hat\alpha_1 \in \Delta(A_1)$. Then, for all $\theta \in \Theta_1$ and $\eta > 0$, there exists a $\bar\delta < 1$ such that, for all $\delta \in (\bar\delta, 1)$, player 1 type $\theta$'s payoff in any Nash equilibrium of the game $\Gamma(\mu, \delta)$ is greater than or equal to

$\underline v_{\hat\alpha_1}(0) - \eta$.
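Inverting the product-choice bound gives a concrete patience threshold. A sketch (illustrative numbers; the algebra simply inverts $\underline w_H$):

    # How patient must player 1 be so that w_H(eps_delta) >= v_H(0) - eta = 2 - eta?
    import numpy as np

    mu_hat, eta = 0.01, 0.1
    # Need 2 * (1 - eps_delta / log 2) >= 2 - eta, i.e.
    # (1 - delta) <= eta * log(2) / (2 * (-log mu_hat)).
    delta_bar = 1 - eta * np.log(2) / (2 * -np.log(mu_hat))
    print(delta_bar)   # ~ 0.9925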

When does $B^d_0(\hat\alpha_1) = BR(\hat\alpha_1)$?

Suppose $\rho(\cdot\mid a_1) \ne \rho(\cdot\mid a_1')$ for all $a_1 \ne a_1'$. Then the pure action Stackelberg payoff is a reputation lower bound, provided the simple Stackelberg type has positive prob.

Suppose $\rho(\cdot\mid\alpha_1) \ne \rho(\cdot\mid\alpha_1')$ for all $\alpha_1 \ne \alpha_1'$. Then the mixed action Stackelberg payoff is a reputation lower bound, provided the prior includes in its support a dense subset of $\Delta(A_1)$.

How general is the result?

The same argument (with slightly worse notation) works if the monitoring distribution depends on both players' actions (though statistical identifiability is a more demanding requirement, particularly for extensive form stage games).

The same argument (with slightly worse notation) also works if the game has private monitoring. Indeed, notice that player 1 observing the signal played no role in the proof.

The Purchase Game

[Game tree: player 2 chooses between "don't buy" (payoffs $(0,0)$) and "buy"; after "buy", player 1 chooses $H$ (payoffs $(1,1)$) or $L$ (payoffs $(2,-1)$).]

$BR(H) = \{b\}$. But $\rho(\cdot \mid Hd) = \rho(\cdot \mid Ld)$, and so $B^d_0(H) = \{d, b\}$, implying $\underline v_H(0) = 0$, and no useful reputation bound.


Repeated Games and Reputations: Reputations II

George J. Mailath

University of Pennsylvania and Australian National University

CEMMAP Lectures, November 17-18, 2016

The slides and associated bibliography are on my webpage:
http://economics.sas.upenn.edu/gmailath

Impermanent Reputations under Imperfect Monitoring

Imperfect monitoring of long-lived players is not an impediment for reputation effects. But it does have implications for their permanence: reputation effects are necessarily temporary in the presence of imperfect monitoring. (Under perfect monitoring, permanent reputation effects are trivially possible.)

Imperfect Monitoring

Suppose only two types, the normal type $\theta_0$ and the simple action type $\hat\xi := \xi(\hat\alpha_1)$.

Allow the signal dsn $\rho$ to depend on $a_1$ and $a_2$.

Maintain the assumption that player 1 observes past $a_2$.

Assumption: Full support. $\rho(y \mid \hat\alpha_1, a_2) > 0$ for all $y \in Y$ and $a_2 \in A_2$.

Assumption: Identifiability. The matrix $[\rho(y \mid a_1, \alpha_2)]_{y, a_1}$ (rows indexed by signals, columns by player 1's actions) has full column rank.

Identifiability implies $B^d_0(\hat\alpha_1) = BR(\hat\alpha_1)$.

Disappearing Reputations

Given a strategy profile $(\sigma_1, \sigma_2)$ of the incomplete information game, the short-lived player's belief in period $t$ that player 1 is type $\hat\xi$ is

$\mu^t(h_2^t) := P(\hat\xi \mid h_2^t)$,

and so $\mu^0$ is the period 0, or prior, probability assigned to $\hat\xi$.

Proposition (Cripps, Mailath, Samuelson 2004)

Suppose player 2 has a unique best response $a_2^*$ to $\hat\alpha_1$ and $(\hat\alpha_1, a_2^*)$ is not a Nash equilibrium of the stage game. If $(\sigma_1, \sigma_2)$ is a Nash equilibrium of the game $\Gamma(\mu, \delta)$, then

$\mu^t \to 0$, $\tilde P$-a.s.
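A simulation may make the proposition less mysterious. The sketch below hard-codes a stand-in for equilibrium separation (the action type plays H, the normal type plays L) and tracks player 2's posterior under $\tilde P$; all parameters are hypothetical.

    # Under Ptilde (normal type), the posterior on the action type
    # converges to 0 a.s. when play differs statistically.
    import numpy as np

    rng = np.random.default_rng(3)

    rho = np.array([[0.8, 0.2],   # signal dsn when a1 = H (action type)
                    [0.3, 0.7]])  # signal dsn when a1 = L (normal type)
    mu0, T = 0.5, 500

    mu = mu0
    p_hat, p_tilde = rho[0], rho[1]
    for t in range(T):
        p = mu * p_hat + (1 - mu) * p_tilde
        y = rng.choice(2, p=p_tilde)      # signal drawn under Ptilde
        mu = mu * p_hat[y] / p[y]         # Bayes update
    print(mu)                              # ~ 0: the reputation disappears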

Intuition

Bayes' rule determines $\mu^t$ after all histories (of 1's actions). At any Nash eq, $\mu^t$ is a bounded martingale and so

$\exists \mu^\infty : \mu^t \to \mu^\infty$ $P$-a.s. (and hence $\tilde P$- and $\hat P$-a.s.).

1. Suppose the result is false. Then there is a positive $\tilde P$-probability event on which $\mu^\infty$ is strictly positive.
2. On this event, player 2 believes that both types of player 1 are eventually choosing the same distribution over actions $\hat\alpha_1$ (because otherwise player 2 could distinguish them).
3. Consequently, on a positive $\tilde P$-probability set of histories, eventually, player 2 will always play a best response to $\hat\alpha_1$.
4. Since player 1 is more informed than player 2, player 1 knows this.
5. This yields the contradiction, since player 1 has a strict incentive to play differently from $\hat\alpha_1$.
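The martingale property invoked above is mechanical Bayes-rule bookkeeping. A one-period check (illustrative monitoring structure) that the expected posterior equals the prior:

    # E[mu^{t+1} | h_2^t] = mu^t under P.
    import numpy as np

    rho = np.array([[0.8, 0.2], [0.3, 0.7]])
    mu = 0.37
    p_hat, p_tilde = rho[0], rho[1]
    p = mu * p_hat + (1 - mu) * p_tilde   # unconditional signal dsn
    mu_next = mu * p_hat / p              # posterior after each signal y
    print(np.dot(p, mu_next), mu)         # equal: the martingale property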

Player 2 either learns the type is normal or doesn't believe it matters I

Lemma

At any Nash eq,

$\lim_{t\to\infty} \mu^t (1 - \mu^t) \left\| \hat\alpha_1 - \tilde E[\sigma_1(h_1^t, \theta_0) \mid h_2^t] \right\| = 0$, $P$-a.s.

Player 2 either learns the type is normal or doesn't believe it matters II

For $\varepsilon_1 > 0$ small, on the event

$X^t := \left\{ \left\| \hat p(h_2^t) - p(h_2^t) \right\| < \varepsilon_1 \right\}$,

player 2 best responds to $\hat\alpha_1$, i.e., $\sigma_2(h_2^t) = a_2^*$.

Player 2 cannot have too many $\tilde P$-expected surprises (i.e., periods in which player 2 both assigns a nontrivial probability to player 1 being $\hat\xi$ and believes $\hat p(h_2^t)$ is far from $p(h_2^t)$):

Lemma

$\sum_{t=0}^{\infty} E_{\tilde P}\left[ (\mu^t)^2 (1 - \mathbf{1}_{X^t}) \right] \le \dfrac{-2 \log(1 - \mu^0)}{\varepsilon_1^2}$,

where $\mathbf{1}_{X^t}$ is the indicator function for the event $X^t$.
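The right-hand side of the lemma is a concrete number. For illustration (hypothetical $\mu^0$ and $\varepsilon_1$):

    import numpy as np

    mu0, eps1 = 0.2, 0.1
    bound = -2 * np.log(1 - mu0) / eps1**2
    print(bound)   # ~ 44.6: the (mu^t)^2-weighted surprises are summable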

Implications of Permanent Reputations

If reputations do not disappear almost surely under $\tilde P$, then

$\tilde P(\mu^\infty = 0) < 1$,

and so there exist a $\beta > 0$ and $T_0$ such that

$0 < \tilde P(\mu^t \ge \beta,\ \forall t \ge T_0) =: \tilde P(F)$.

On $F$, eventually player 2 believes $\theta_0$ plays $\hat\alpha_1$:

Lemma

Suppose $\mu^t \not\to 0$ $\tilde P$-a.s. There exists $T_1$ such that for

$B := \bigcap_{t \ge T_1} X^t$,

we have $\tilde P(B) \ge \tilde P(F \cap B) > 0$.

Page 250: Repeated Games - Institute for Fiscal Studies Bae University College London jihyun.bae.15@ucl.ac.uk Guo Bai University College London miabaiguo@gmail.com Giulia Conte University College

Conclusion of Argument

On B, not only is player 2 always playing a2, the BR to 1,but player 1 eventually is confident that 2 is doing so.

Moreover, again on B, for all , for sufficiently large t , 1 isconfident that 2 is doing so in periods, t ; t + 1; : : : ; t + ,irrespective of the signals 2 observes in periodst ; t + 1; : : : ; t + .Imperfect monitoring is key here: The minimum prob of any sequence of signals under 1 is bounded away from zero.Contradiction: Player 1 best responding to player 2 cannotplay 1.

15 / 26

Page 251: Repeated Games - Institute for Fiscal Studies Bae University College London jihyun.bae.15@ucl.ac.uk Guo Bai University College London miabaiguo@gmail.com Giulia Conte University College

Conclusion of Argument

On B, not only is player 2 always playing a2, the BR to 1,but player 1 eventually is confident that 2 is doing so.Moreover, again on B, for all , for sufficiently large t , 1 isconfident that 2 is doing so in periods, t ; t + 1; : : : ; t + ,irrespective of the signals 2 observes in periodst ; t + 1; : : : ; t + .

Imperfect monitoring is key here: The minimum prob of any sequence of signals under 1 is bounded away from zero.Contradiction: Player 1 best responding to player 2 cannotplay 1.

16 / 26

Page 252: Repeated Games - Institute for Fiscal Studies Bae University College London jihyun.bae.15@ucl.ac.uk Guo Bai University College London miabaiguo@gmail.com Giulia Conte University College

Conclusion of Argument

On B, not only is player 2 always playing a2, the BR to 1,but player 1 eventually is confident that 2 is doing so.Moreover, again on B, for all , for sufficiently large t , 1 isconfident that 2 is doing so in periods, t ; t + 1; : : : ; t + ,irrespective of the signals 2 observes in periodst ; t + 1; : : : ; t + .Imperfect monitoring is key here: The minimum prob of any sequence of signals under 1 is bounded away from zero.

Contradiction: Player 1 best responding to player 2 cannotplay 1.

17 / 26

Page 253: Repeated Games - Institute for Fiscal Studies Bae University College London jihyun.bae.15@ucl.ac.uk Guo Bai University College London miabaiguo@gmail.com Giulia Conte University College

Conclusion of Argument

On B, not only is player 2 always playing a2, the BR to 1,but player 1 eventually is confident that 2 is doing so.Moreover, again on B, for all , for sufficiently large t , 1 isconfident that 2 is doing so in periods, t ; t + 1; : : : ; t + ,irrespective of the signals 2 observes in periodst ; t + 1; : : : ; t + .Imperfect monitoring is key here: The minimum prob of any sequence of signals under 1 is bounded away from zero.Contradiction: Player 1 best responding to player 2 cannotplay 1.

18 / 26

Comments

The result is very general. It holds if:

there are many types;

both players' actions are privately monitored, as long as an identifiability condition holds on both players' actions (Cripps, Mailath, and Samuelson 2007; Mailath and Samuelson 2014).

Asymptotic Restrictions on Behavior I

The result is about beliefs. What about behavior? If player 2's actions are observed by player 1, then:

For any Nash eq of the incomplete information game and for $\tilde P$-almost all sequences of histories $\{h^t\}$, every cluster point of the sequence of continuation profiles is a Nash eq of the complete information game with normal type player 1.

If player 2 is imperfectly monitored, then the second "Nash" needs to be replaced with "correlated."

Asymptotic Restrictions on Behavior II

Suppose player 2's actions are perfectly monitored. Suppose the stage game has a strict Nash equilibrium $a^*$. Suppose for all $\varepsilon > 0$, there exist a $\bar\mu > 0$ and an eq $\sigma(0)$ of the complete information game $\Gamma(0)$ such that, for all $\mu^0 \in (0, \bar\mu)$, the incomplete information game with prior $\mu^0$ has an eq with player 1 payoff within $\varepsilon$ of $u_1(\sigma(0))$.

Given any prior $\mu^0$ and any $\delta$, for all $\varepsilon > 0$, there exists a Nash eq of the incomplete information game in which the $\tilde P$-probability of the event that eventually $a^*$ is played in every period is at least $1 - \varepsilon$.

Interpretation

[Figure: schematic of the range of priors over which reputation effects operate.]

Reputation Effects with Long-lived Player 2?

Simple types no longer provide the best bounds on payoffs. For the repeated PD, a reputation for tit-for-tat is valuable (while a reputation for always cooperate is not!), Kreps, Milgrom, Roberts, and Wilson (1982).

The bound-on-surprises arguments still hold with a long-lived player 2 (as does the disappearing reputation result), but player 2 need not best respond to the belief that, on the equilibrium path, player 1 plays like an action type.

There are some positive results, but they are few and make strong assumptions.

Persistent Reputations

How to rescue reputations?

Limited observability: Suppose short-lived players can only observe the last L periods. Then reputations can persist and may cycle (Liu 2011, Liu and Skrzypacz 2014).

Changing types: Yields both cyclical reputations (Phelan 2006) and permanent reputations (Ekmekci, Gossner, and Wilson 2012).

Reputation as Separation

Are reputations always about scenarios where uninformed players assign positive probability to "good" types? Sometimes reputations are about behavior where informed players are trying to avoid a "bad" reputation.

But avoidance of bad reputations is hard: Mailath and Samuelson (2001), Morris (2001), and Ely and Valimaki (2003).

Further Reading

Repeated Games and Reputations, George J. Mailath and Larry Samuelson.