Game Theory - econ.jku.at · Repeated Games
Recap: SPNE
The solution concept for dynamic games with complete information is the subgame perfect Nash equilibrium (SPNE)
Selten (1965):
A strategy profile s∗ is a subgame perfect Nash equilibrium of an extensive game if the strategy profile s∗ is a Nash equilibrium in all subgames. Any finite extensive-form game with complete (but possibly imperfect) information has a SPNE (possibly involving mixed strategies)
Proof: by backward induction
A SPNE need not be unique
SPNE eliminates all non-credible NE
Non-credible NE: the equilibrium is sustained by a threat of deviation that is not credible
2 / 41
Repeated Games
Until now, we considered so-called "one-shot games" with the (implicit) assumption that the game is played once among players who expect not to meet each other again
In real life, games are typically played within a larger context, and actions affect not only the present situation but may also have implications for the future
3 / 41
Repeated Games
Players may have considerations about the future, which also affect their behavior in the present; i.e., if the same players meet again repeatedly, threats and promises about future behavior can influence current behavior
Such situations are captured in repeated games, in which a "stage game" is played repeatedly
Normal-form or extensive-form games are repeated finitely or infinitely often, regardless of what has been played in previous rounds, and often with the same set of players
The outcome of a repeated game is a sequence of stage-game outcomes
4 / 41
Finitely repeated game
Definition
Let T = {0, 1, ..., n} be the set of all possible dates
Let G be a stage game with perfect information, which is played at each t ∈ T
The payoff of each player in this larger game is the sum of the payoffs the player receives in each stage game
Denote this larger game by G^T
At the beginning of each repetition, a player considers what each player has played in the previous rounds
A strategy in the repeated game G^T assigns a strategy to each stage game G
5 / 41
Two-stage game: Prisoners’ dilemma
Consider a situation in which two players play the Prisoners’ Dilemma game

  C D
C 5,5 0,6
D 6,0 1,1

Now assume T = {0, 1} and G is this Prisoners’ Dilemma game. Then the repeated game G^T can be represented in extensive form as:
6 / 41
Two-stage game: Prisoners’ dilemma (cont.)
At t = 1, a history is a strategy profile of the game, indicating what has been played at t = 0: (C,C), (C,D), (D,C), (D,D)
G^T has 4 subgames - note that payoffs are the sum of payoffs from both games (no discounting)!
We have (D,D) as the unique Nash equilibrium in each of these subgames → the actions at t = 1 are independent of what is played at t = 0
Given the behavior at t = 1, the game at t = 0 reduces to:

  C D
C 6,6 1,7
D 7,1 2,2

We add 1 to each payoff, as this is the t = 1 payoff for each player
The unique equilibrium of this reduced game is (D,D)
This is also the unique subgame-perfect equilibrium: at each history, each player plays D
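The backward-induction argument above is easy to check numerically. Below is a minimal sketch (not part of the slides; the dictionary encoding of the game and the helper name `pure_nash` are mine):

```python
from itertools import product

# Stage-game payoffs from the slides, keyed by (row action, column action)
STAGE = {("C", "C"): (5, 5), ("C", "D"): (0, 6),
         ("D", "C"): (6, 0), ("D", "D"): (1, 1)}
ACTIONS = ("C", "D")

def pure_nash(game):
    """All pure-strategy Nash equilibria of a two-player game."""
    return [(a1, a2) for a1, a2 in product(ACTIONS, ACTIONS)
            if all(game[(a1, a2)][0] >= game[(b, a2)][0] for b in ACTIONS)
            and all(game[(a1, a2)][1] >= game[(a1, b)][1] for b in ACTIONS)]

# Every t = 1 subgame is the stage game itself: unique NE is (D, D)
second_stage = pure_nash(STAGE)

# Adding the (1, 1) continuation payoff to every cell gives the reduced
# t = 0 game from the slide, whose unique NE is again (D, D)
reduced = {a: (u1 + 1, u2 + 1) for a, (u1, u2) in STAGE.items()}
first_stage = pure_nash(reduced)
```

The same `pure_nash` check works for any finite two-player stage game encoded this way.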
7 / 41
n-stage games
What about arbitrary n?
On the last day n, independent of what has been played in the previous rounds, there is a unique Nash equilibrium for the resulting subgame: each player plays D.
The actions at day n − 1 therefore have no effect on what will be played on the next day.
Going back to date 0, we find a unique SPNE: at each t, for each outcome of the previous stage games, players play D
This is a general result!
8 / 41
Finitely repeated games with unique NE in stage game
Definition
Given a stage game G, let G^T denote the finitely repeated game in which G is played T times, with the outcomes of all preceding plays observed before the next round. The payoffs for G^T are simply the sum of the payoffs from the T stage games.
Selten’s Theorem
If the stage game G has a unique Nash equilibrium then, for any finite T, the repeated game G^T has a unique subgame perfect Nash equilibrium: the Nash equilibrium of G is played in every stage
Proof can be found in any advanced game theory or micro book
9 / 41
Finitely repeated games with multiple NE in stage game
Consider the following modified version of the Prisoners’ dilemma:
C D R
C 5,5 0,6 0,0
D 6,0 1,1 0,0
R 0,0 0,0 4,4
There are two pure strategy NE in this stage game
Assume that this stage game is played twice
Playing any sequence of NE would be a SPNE
Now consider the following conditional strategy (conditional on future NE):
In the first stage, players anticipate that the second-stage outcome will be a Nash equilibrium of the stage game, hence (D,D) or (R,R)
Players anticipate that (R,R) will be the second-stage outcome if the first-stage outcome is (C,C), whereas (D,D) will be the second-stage outcome otherwise
10 / 41
Finitely repeated games with multiple NE in stage game
Players’ first-stage interactions amount to the following one-shot game:
C D R
C 9,9 1,7 1,1
D 7,1 2,2 1,1
R 1,1 1,1 5,5
There are 3 pure-strategy Nash equilibria - (C,C), (D,D) and (R,R)
1. The NE (D,D) corresponds to (D,D) in the first stage and (D,D) in the second stage
2. The NE (R,R) corresponds to (R,R) in the first stage and (D,D) in the second stage
3. The NE (C,C) corresponds to (C,C) in the first stage and (R,R) in the second stage
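The reduced first-stage matrix above can be reconstructed mechanically: add the anticipated continuation payoff, (4,4) after (C,C) and (1,1) otherwise, to every cell of the stage game. A sketch of that computation (the encoding and helper names are mine, not from the slides):

```python
from itertools import product

ACTIONS = ("C", "D", "R")
# Modified Prisoners' Dilemma from the slide
STAGE = {("C", "C"): (5, 5), ("C", "D"): (0, 6), ("C", "R"): (0, 0),
         ("D", "C"): (6, 0), ("D", "D"): (1, 1), ("D", "R"): (0, 0),
         ("R", "C"): (0, 0), ("R", "D"): (0, 0), ("R", "R"): (4, 4)}

def continuation(outcome):
    """Anticipated second-stage payoffs: (R,R) after (C,C), else (D,D)."""
    return STAGE[("R", "R")] if outcome == ("C", "C") else STAGE[("D", "D")]

# Reduced first-stage game: stage payoff plus anticipated continuation
reduced = {a: (u1 + continuation(a)[0], u2 + continuation(a)[1])
           for a, (u1, u2) in STAGE.items()}

def pure_nash(game):
    """All pure-strategy Nash equilibria of a two-player game."""
    return [(a1, a2) for a1, a2 in product(ACTIONS, ACTIONS)
            if all(game[(a1, a2)][0] >= game[(b, a2)][0] for b in ACTIONS)
            and all(game[(a1, a2)][1] >= game[(a1, b)][1] for b in ACTIONS)]
```

Running `pure_nash(reduced)` recovers exactly the three equilibria listed above.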
11 / 41
Finitely repeated games with multiple NE in stage game
(D,D) and (R,R) are concatenations of Nash equilibrium outcomes of the stage game
(C,C) is a qualitatively different result - (C,C) in the first stage game is not a NE
Cooperation is possible in the first stage of a SPNE of a repeated game because of a credible threat of punishment
However, the SPNE depends on the assumption about players’ anticipation of the second stage (see the conditional strategy)
Our conditional strategy requires playing (D,D) in the second stage, which appears silly if (R,R) is available
The credible punishment for a player deviating from (C,C) in the first stage is playing a Pareto-dominated equilibrium in the second stage
12 / 41
Single-deviation principle
Verifying that a given strategy profile is a SPNE can be difficult - the game above has 10 subgames, hence 3^10 strategies for each player!
Solution: Single-deviation principle
Definition
Given the strategies of the other players, strategy si of player i in a repeated game satisfies the single-deviation principle if player i cannot gain by deviating from si in a single stage game, holding all other players’ strategies and the rest of her own strategy fixed.
Proposition
In a finitely repeated game, a strategy profile s is a SPNE if and only if each player’s strategy satisfies the single-deviation principle.
This proposition also extends to infinitely repeated games, given future payoffs are discounted!
13 / 41
Single-deviation principle
Let’s check for our example:
The single-deviation principle requires checking single deviations for each player at each stage:
Second stage: If (C,C) is observed, the best response is R (5 + 4 > 5 + 0)
First stage: A deviation in the first stage would yield a payoff of 6 + 1 < 5 + 4
No deviation is profitable - (C,C) in the first stage and (R,R) in the second stage is a SPNE
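The two checks above can be automated by enumerating every single deviation from the profile. A sketch for player 1 (the encoding and helper names are mine; player 2's check is symmetric):

```python
ACTIONS = ("C", "D", "R")
# Modified Prisoners' Dilemma from the earlier slide
STAGE = {("C", "C"): (5, 5), ("C", "D"): (0, 6), ("C", "R"): (0, 0),
         ("D", "C"): (6, 0), ("D", "D"): (1, 1), ("D", "R"): (0, 0),
         ("R", "C"): (0, 0), ("R", "D"): (0, 0), ("R", "R"): (4, 4)}

def second_stage(outcome):
    """Conditional strategy: (R,R) after (C,C), (D,D) otherwise."""
    return ("R", "R") if outcome == ("C", "C") else ("D", "D")

def payoff(first, second, player):
    """Two-stage payoff (no discounting) for the given player index."""
    return STAGE[first][player] + STAGE[second][player]

# Equilibrium path: (C,C) then (R,R), worth 9 to player 1 (index 0)
path = payoff(("C", "C"), ("R", "R"), 0)

# Single deviation in the second stage only (first stage stays on path):
dev_second = max(payoff(("C", "C"), (b, "R"), 0) for b in ACTIONS)

# Single deviation in the first stage only (play then follows the rule):
dev_first = max(payoff((b, "C"), second_stage((b, "C")), 0) for b in ACTIONS)
```

Neither `dev_second` nor `dev_first` exceeds `path`, confirming the single-deviation check.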
14 / 41
Finitely vs. infinitely repeated games
Credible threats and promises about future behavior can influence current behavior
If the relationship is only finitely repeated, this is only true if the stage game has multiple equilibria
If G is a static game of complete information with multiple NE, then the T-times repeated game G^T may have SPNE in which, for any t < T, the outcome in stage t is not a NE of G
For infinitely repeated games this result is stronger: even if the stage game G has a unique NE, there may be SPNE of the infinitely repeated game G^∞ in which the outcome is not a NE of the stage game G.
Hence, infinitely repeated games may be suitable for modeling cooperation sustained by threats and punishment strategies
15 / 41
Infinitely repeated games
Simply summing the payoffs from an infinite sequence of stage games does not provide a useful measure of players’ payoffs in an infinitely repeated game. Why? The sum is typically infinite, so different payoff streams could not be ranked
Solution: use the discounted sum of the sequence of payoffs
Each player i has a payoff function ui and a discount factor δi ∈ [0, 1) such that an infinite sequence (s^1, s^2, ...) is evaluated by:

ui(s^1) + δi·ui(s^2) + δi^2·ui(s^3) + ... = Σ_{t=1}^∞ δi^(t−1)·ui(s^t)
The discount factor δi measures how much a player cares aboutthe future
when δi is close to 0, player i does not care about the future → impatient
when δi is close to 1, player i does care about the future → patient
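The evaluation above is just a geometric series. A quick numerical sanity check that a truncated discounted sum converges to the closed form u/(1 − δ) (the particular values δ = 0.9 and u = 5 are mine):

```python
delta, u = 0.9, 5.0

# Truncated discounted sum of a constant payoff stream (5, 5, 5, ...)
pv_truncated = sum(delta ** (t - 1) * u for t in range(1, 2001))

# Closed form of the geometric series: u / (1 - delta)
pv_closed = u / (1 - delta)
```

With δ = 0.9 both values are (numerically) 50: a patient player values the constant stream at ten times the per-period payoff.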
16 / 41
Infinitely repeated games
The infinitely repeated game differs only in the set of terminal histories, which is the set of infinite sequences (s^1, s^2, ...)
The payoff is the present value Σ_{t=1}^∞ δi^(t−1)·ui(s^t)
Note: one could also use the present value as a measure of payoffs in finitely repeated games
One could also reinterpret δ via a repeated game that ends after a random number of repetitions, where δ is the probability that the game continues for at least one more stage, and 1 − δ the probability that it ends immediately
Are infinitely repeated games likely to occur? Intuitively, in a lot of long-lasting interactions, the termination date of the interaction is typically unknown to players or plays little role
17 / 41
SPNE in infinitely repeated games
Consider again the following prisoners’ dilemma:
C D
C 5,5 0,6
D 6,0 1,1
Analogous to finitely repeated games: playing the unique NE (D,D) in every stage game implies a NE in every subgame of the infinitely repeated game ⇒ SPNE
In the presence of credible punishment, we may also get SPNE different from Nash equilibrium outcomes of the stage game → there are strategies leading to (C,C) in every stage game as a SPNE
Examples of such strategies:
(Grim) trigger strategy
Tit-for-tat strategy
Limited punishment
18 / 41
Strategies in an infinitely repeated game
(Grim) trigger strategy
si(∅) = C

and

si(s^1, ..., s^t) = C if (sj^1, ..., sj^t) = (C, ..., C)
                    D otherwise

for every history (s^1, ..., s^t), where j is the other player
Player i chooses C at the start of the game and after any history in which every previous action of player j was C
Whenever player j once chooses D, player i will also switch to action D
Once D is reached, this state is never left
19 / 41
Strategies in an infinitely repeated game (cont.)
Tit-for-tat
The length of the punishment depends on the behavior of the punished player
If the punished player continues to deviate by playing D, tit-for-tat continues to do so as well ⇒ no reversion to C
Whenever the punished player reverts to C, then tit-for-tat reverts to C as well
In other words: do whatever the other player did in the previous period
20 / 41
Nash equilibrium: (Grim) trigger strategy
Assume that player 1 uses the grim trigger strategy
If player 2 uses this strategy as well → (C,C) will be the outcome in every period, with payoffs (5, 5, ...)
The discounted sum is 5 + 5δ + 5δ^2 + 5δ^3 + ... = 5/(1 − δ)
If player 2 uses a different strategy, then in at least one period her action is D
In all subsequent periods, player 1 chooses D as well, since it is a best response
Up to the first period in which player 2 chooses D, her payoff is 5 each period
21 / 41
Nash equilibrium: (Grim) trigger strategy
Player 2’s subsequent sequence of payoffs is (6, 1, 1, ...) (she gains one unit in the deviation period, but earns 1 instead of 5 in every period thereafter due to the reaction of player 1)
Hence, the discounted sum from deviating is: 6 + δ + δ^2 + δ^3 + ... = 6 + δ/(1 − δ)
Cooperation is successful if the payoff of cooperation is at least as good as the payoff of cheating:

5/(1 − δ) ≥ 6 + δ/(1 − δ)

Cooperation if δ ≥ 1/5
In this example, only very impatient players with δ < 1/5 can increase their payoff by deviating
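The comparison of the two payoff streams can be checked numerically for discount factors on either side of the threshold (the helper name `pv` and the sample values are mine):

```python
def pv(head, tail, delta):
    """Present value of a stream: the finite prefix `head`,
    then the constant payoff `tail` in every period afterwards."""
    prefix = sum(delta ** t * u for t, u in enumerate(head))
    return prefix + delta ** len(head) * tail / (1 - delta)

for delta in (0.05, 0.15, 0.3, 0.9):
    cooperate = pv([], 5, delta)   # (5, 5, 5, ...) = 5/(1 - delta)
    cheat = pv([6], 1, delta)      # (6, 1, 1, ...) = 6 + delta/(1 - delta)
    # Grim trigger sustains cooperation exactly when delta >= 1/5
    assert (cooperate >= cheat) == (delta >= 1 / 5)
```

Away from the boundary δ = 1/5, cooperation beats cheating for patient players and loses for impatient ones, exactly as the algebra says.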
22 / 41
Nash equilibrium: Tit-for-tat
Assume that player 1 uses the tit-for-tat strategy
When player 2 also adheres to this strategy, the equilibrium outcome will be (C,C) in every period
Now assume D is a best response to tit-for-tat for player 2
Denote by t the first period where player 2 chooses D → player 1 will choose D from period t + 1 onwards, until player 2 reverts to C
Player 2 has two options from period t + 1 onwards: revert to C and face the same situation as at the start of the game, or continue with D, in which case player 1 will continue to do so as well
So if player 2’s best response to tit-for-tat is choosing D in some period, then she either alternates between C and D or chooses D in every period
23 / 41
Nash equilibrium: Tit-for-tat (cont.)
Payoff for alternating between C and D is (6, 0, 6, 0, ...), with present value 6/(1 − δ^2)
Payoff for staying with D is (6, 1, 1, ...), with present value 6 + δ/(1 − δ)
Payoff of playing tit-for-tat is (5, 5, 5, ...), with present value 5/(1 − δ)
Hence, tit-for-tat is a best response to tit-for-tat if and only if:

5/(1 − δ) ≥ 6/(1 − δ^2)  and  5/(1 − δ) ≥ 6 + δ/(1 − δ)

Both of these conditions are equivalent to δ ≥ 1/5
Whenever δ ≥ 1/5, a strategy pair in which both players use tit-for-tat is a Nash equilibrium
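Both conditions can be verified numerically (the function name and test values are mine):

```python
def tit_for_tat_is_best_response(delta):
    """Check 5/(1-d) >= 6/(1-d^2) and 5/(1-d) >= 6 + d/(1-d)."""
    tft = 5 / (1 - delta)                # (5, 5, 5, ...)
    alternate = 6 / (1 - delta ** 2)     # (6, 0, 6, 0, ...)
    always_d = 6 + delta / (1 - delta)   # (6, 1, 1, ...)
    return tft >= alternate and tft >= always_d
```

As expected, the check fails for δ well below 1/5 and passes above it.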
24 / 41
Folk Theorem
A main objective of studying repeated games is to explore the relation between short-term and long-term incentives
When players are patient, their long-term incentives take over, and a large set of behavior may result in equilibrium.
This equilibrium multiplicity is a general implication of (infinitely) repeated games
This main result is stated in the so-called Folk Theorem
Before stating it, we need to introduce two further definitions
25 / 41
Feasible payoffs
We call payoffs (x1, ..., xn) feasible in the stage game G if the payoffs are a convex combination (i.e. a weighted average) of the pure-strategy payoffs of G
Graphically, the set of feasible payoffs for the prisoners’ example:

[Figure: shaded region spanned by the points (1,1), (0,6), (6,0) and (5,5)]

Pure-strategy payoffs (1, 1), (6, 0), (0, 6) and (5, 5) are feasible
All other pairs in the shaded region are weighted averages of pure-strategy payoffs
26 / 41
Average payoffs
Players’ payoffs are still defined over the present value (PV) of the infinite payoff stream, but expressed in terms of the average payoff from the same infinite sequence of payoffs.
The average payoff is the payoff that would have to be received in every stage game so as to yield the PV
Definition
Given the discount factor δ, the average payoff of the infinite sequence of payoffs u^1, u^2, ... is

(1 − δ) Σ_{t=1}^∞ δ^(t−1)·u^t

Note that if there is a fixed payoff u in every stage, the PV would be u/(1 − δ) ⇒ the average payoff is (1 − δ)·PV = (1 − δ)·u/(1 − δ) = u
The average payoff is directly comparable to payoffs from a stage game
Since the average payoff is just a rescaling of the PV, maximising the average payoff is equivalent to maximising the PV
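A quick numerical check that the rescaling behaves as claimed, for a constant and a non-constant stream (the particular values are mine):

```python
delta = 0.9

# Constant stream u = 5: PV = u/(1 - delta), so the average payoff is u
pv_constant = 5 / (1 - delta)
avg_constant = (1 - delta) * pv_constant

# Non-constant stream (6, 0, 6, 0, ...): PV = 6/(1 - delta^2),
# so the average payoff is 6/(1 + delta), between the min and max payoffs
stream = [6, 0] * 2500
pv_alternating = sum(delta ** t * u for t, u in enumerate(stream))
avg_alternating = (1 - delta) * pv_alternating
```

The alternating stream's average payoff (about 3.16 at δ = 0.9) is directly comparable to stage-game payoffs, which is the point of the definition.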
27 / 41
Folk Theorem
The Folk Theorem states that any feasible payoff vector that gives every player strictly more than some Nash equilibrium payoff can be supported in a SPNE when the players are sufficiently patient
Folk Theorem
Let G be a finite static game of complete information. Let (e1, ..., en) denote the payoffs from a Nash equilibrium of G, and let (x1, ..., xn) denote any other feasible payoffs from G. If xi > ei for every player i and if δ is sufficiently close to one, then there exists a subgame perfect Nash equilibrium of the infinitely repeated game G^∞ that achieves (x1, ..., xn) as the average payoff.
28 / 41
Folk Theorem: Proof (I)
The proof follows the arguments for the infinitely repeated Prisoners’ Dilemma
Let (a_e1, ..., a_en) be the NE of G that yields the equilibrium payoffs (e1, ..., en)
Let (a_x1, ..., a_xn) be the set of actions yielding the feasible payoffs (x1, ..., xn)
Consider the standard trigger strategy for each player i = 1, ..., n:
Play a_xi in the first stage. In the t-th stage, if the outcome of all t − 1 preceding stages has been (a_x1, ..., a_xn), then play a_xi; otherwise play a_ei
Assume that all players have adopted this trigger strategy
29 / 41
Folk Theorem: Proof (II)
Once one stage’s outcome differs from (a_x1, ..., a_xn), the other players will play (a_e1, ..., a_e,i−1, a_e,i+1, ..., a_en) forever, so playing a_ei is a best response for player i from then on
What is the best response for player i in the first stage and in any stage where all preceding outcomes have been (a_x1, ..., a_xn)?
Let a_di be player i’s best deviation from (a_x1, ..., a_xn) and d_i the corresponding payoff from this deviation
Hence we have the payoff relationship d_i ≥ x_i > e_i
The present value of player i’s payoff sequence from deviating is

d_i + δ·e_i + δ^2·e_i + ... = d_i + δ·e_i/(1 − δ)
30 / 41
Folk Theorem: Proof (III)
Alternatively, playing a_xi will yield a payoff of x_i.
If playing a_xi is optimal, the present value is

x_i + δ·x_i + δ^2·x_i + ... = x_i/(1 − δ)

If playing a_di is optimal, the present value is (see before)

d_i + δ·e_i/(1 − δ)

So, playing a_xi is optimal if and only if

x_i/(1 − δ) ≥ d_i + δ·e_i/(1 − δ)

or

δ ≥ (d_i − x_i)/(d_i − e_i)
Folk Theorem: Proof (IV)
Since this threshold value for a best response may differ among players, playing the trigger strategy is a NE for all players if and only if

δ ≥ max_i (d_i − x_i)/(d_i − e_i)

The threshold discount factor for the trigger strategy being a NE is determined by:
(short-term) gain from deviation ⇒ higher short-term gain from non-cooperation → more difficult to achieve cooperation
(long-term) loss from deviation ⇒ higher long-term loss from non-cooperation (i.e. stronger punishment) → easier to achieve cooperation
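For the Prisoners' Dilemma from the earlier slides, this bound reproduces the grim-trigger threshold (the function name is mine):

```python
def folk_threshold(d, x, e):
    """Critical discount factor (d - x)/(d - e) for one player, with
    deviation payoff d, target payoff x and stage-game NE payoff e."""
    return (d - x) / (d - e)

# Symmetric PD: target (C,C) gives x = 5, the best deviation yields
# d = 6, and the stage-game NE (D,D) gives e = 1. The game is
# symmetric, so the max over players is just one player's threshold.
delta_star = folk_threshold(6, 5, 1)
```

This recovers δ ≥ 1/5, the same bound derived for the grim trigger strategy.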
32 / 41
Folk Theorem: Proof (V)
Is this Nash equilibrium also subgame perfect, i.e. is it a Nash equilibrium in every subgame of G(∞, δ)? There are two classes of subgames:
1. subgames in which all outcomes of earlier stages have been (a_x1, ..., a_xn)
2. subgames in which the outcome of at least one earlier stage differs from (a_x1, ..., a_xn)
If players adopt the trigger strategy, then in the first class of subgames the players’ strategies are again the trigger strategy; we just showed that this is a NE for the game as a whole
In the second class of subgames, the players’ strategies are to repeat the stage-game equilibrium (a_e1, ..., a_en) forever, which is also a NE for the game as a whole.
The trigger-strategy Nash equilibrium of the infinitely repeated game is subgame perfect (if δ is sufficiently large)
33 / 41
Folk Theorem
The Folk Theorem implies that any point in the area above and to the right of the red lines can be achieved as the average payoff in a SPNE, if the discount factor is sufficiently large

[Figure: feasible payoff set spanned by (1,1), (0,6), (6,0) and (5,5), with red lines marking the region of supportable average payoffs]

Main message: Although repeated games allow for cooperative behaviour, they also allow for an extremely wide range of behavior
34 / 41
Application: Cartels
Demand is given by p = A − Q, marginal cost is constant and equal to c, where A > c
There are n firms in the market, the stage game is Cournot
Firms’ discount factor is δ ∈ (0, 1)
Task:
1. Find the critical value of the discount factor to sustain collusion if firms use grim trigger strategies. Assume that collusive behavior involves equal sharing of monopoly output and profits
2. How does the minimum discount factor depend on the number of firms?
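The exercise is left to the reader, but a candidate answer can be checked numerically. A sketch under the standard grim-trigger condition δ ≥ (π_d − π_c)/(π_d − π_N), where π_c is the per-firm collusive profit, π_d the one-period deviation profit and π_N the Cournot-Nash profit (the derivation of these profits is assumed, not given on the slide; the function name is mine):

```python
from fractions import Fraction

def critical_delta(n, A=1, c=0):
    """Grim-trigger threshold for n symmetric Cournot firms with
    demand p = A - Q and constant marginal cost c (exact arithmetic)."""
    m = Fraction(A - c)
    pi_c = m ** 2 / (4 * n)                 # per-firm share of monopoly profit
    q_c = m / (2 * n)                       # collusive output per firm
    q_d = (m - (n - 1) * q_c) / 2           # best one-period deviation output
    pi_d = q_d * (m - (n - 1) * q_c - q_d)  # deviation-period profit
    pi_n = m ** 2 / (n + 1) ** 2            # Cournot-Nash profit per firm
    return (pi_d - pi_c) / (pi_d - pi_n)
```

For n = 2 this yields δ∗ = 9/17, and the threshold rises toward 1 as n grows: with more firms, each firm's collusive share shrinks, so collusion becomes harder to sustain.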
35 / 41
Application: Efficiency wages (Shapiro & Stiglitz, 1984)
Firms induce workers to work harder by paying high wages and threatening to fire workers caught shirking
Firms reduce their demand for labor, so some workers are employed at high wages, but involuntary unemployment increases
Larger pool of unemployed workers → the threat of firing becomes more severe
In competitive equilibrium, the wage w and unemployment rate u just induce workers not to shirk, and labor demand at w results in unemployment rate u
36 / 41
Efficiency wages: Stage game
Firm offers the worker a wage w.
If the worker rejects w, she becomes self-employed at wage w0.
In case of acceptance of w, the worker chooses either to supply effort (with disutility e) or to shirk (without any disutility)
The effort decision is unobserved by the firm, but the worker’s output (low: y = 0 or high: y > 0) is observed
For simplicity: in case of high effort, output is high, but if the worker shirks, output is low
If the firm employs the worker at wage w, payoffs with effort are y − w for the firm and w − e for the worker; if the worker shirks, e = 0, and if output is low, y = 0
Assume that y − e > w0, hence it is efficient for the worker to be employed and supply effort
37 / 41
Efficiency wages: Stage game
Backward induction: the worker observes the wage offer w:
if w ≤ w0: reject
if w > w0: accept and set e = 0 (this maximises the payoff w − e!)
firms anticipate e = 0, so they offer some w ≤ w0
the worker will choose self-employment
Is there a way to offer a wage premium w > w0 in an infinitely repeated game? ⇒ Yes, if there is a credible threat to fire the worker in case of low output
Consider the following strategy:
The firm offers w = w∗ > w0 in the first period, and in each subsequent period if output has been high, but offers w = 0 otherwise
Workers accept the firm’s offer and provide effort if w∗ ≥ w0, but shirk otherwise
Trigger strategy: play cooperatively provided that all previous plays have been cooperative, but switch forever to the SPNE of the stage game in case of deviation.
38 / 41
Efficiency wages: Worker
The firm offers w∗ and the worker accepts. If the worker provides effort, output will be high, so the firm will offer w∗ also in the next period
If it is optimal for the worker to provide effort, the present value of the worker’s payoff is:

Ve = (w∗ − e)/(1 − δ)

If the worker shirks, the present value of the worker’s payoff is:

Vs = w∗ + δ·w0/(1 − δ)

An incentive to supply effort exists if

(w∗ − e)/(1 − δ) ≥ w∗ + δ·w0/(1 − δ)

or

w∗ ≥ w0 + e/δ
39 / 41
Efficiency wages: firm
The firm decides between w = w∗, inducing effort by threatening to fire in case of low output and leaving a profit of y − w∗, and w = 0, inducing the worker to choose self-employment and leaving a profit of zero in each period
It is optimal for the firm to offer the wage w = w∗ if

y − w∗ ≥ 0

Substituting the lowest effort-inducing wage w∗ = w0 + e/δ, this requires

y ≥ w0 + e/δ

Cooperation is a Nash equilibrium if

δ ≥ e/(y − w0)
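The worker's and the firm's conditions combine into the threshold just derived; a small numerical check (the example numbers y = 10, w0 = 4, e = 1.5 are mine, not from the slides):

```python
def min_effort_wage(w0, e, delta):
    """Lowest wage satisfying the worker's no-shirking condition
    w* >= w0 + e/delta."""
    return w0 + e / delta

def cooperation_sustainable(y, w0, e, delta):
    """Firm earns y - w* >= 0 at the cheapest effort-inducing wage."""
    return y - min_effort_wage(w0, e, delta) >= 0

# Example: output y = 10, outside option w0 = 4, effort cost e = 1.5;
# the equivalent bound is delta >= e/(y - w0) = 0.25
```

With these numbers, cooperation fails for δ = 0.2 and holds from δ = 0.25 upward, matching δ ≥ e/(y − w0).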
40 / 41
Efficiency wages: Equilibrium
Recall that we have a sequential-move stage game, where workers observe wage offers.
Cooperation (if δ is sufficiently large) is the SPNE outcome if firms set w = w∗ (hence the high-wage, high-output histories)
What is the SPNE for all other histories? Workers will never supply effort, so firms induce them to choose self-employment by setting w = 0 from the next stage on → permanent self-employment
If the worker is ever caught shirking, w = 0 forever after; if the firm deviates from offering w = w∗, then the worker sets e = 0 forever after, so the firm cannot afford to employ the worker
Cooperation is only a SPNE if renegotiation is not feasible
41 / 41