Post on 15-Jan-2016

Rationality and information in games

Jürgen Jost

Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany

Collaborators

• Nils Bertschinger, Eckehard Olbrich (MPI MIS)

• David Wolpert (NASA, Ames)

• Mike Harre (U. Sydney) (computer plots)

• Game theory assumes that players know each other's utility functions and that they can anticipate the actions of their opponents to the extent dictated by their rational intention of utility maximization.

• In a mixed Nash equilibrium, however, rationality does not determine a specific action, but only probabilities of actions. Therefore, even if such a game is repeated, opponents’ actions cannot be completely predicted.

Information

• Increase: Additional knowledge about actually chosen actions of opponents at mixed equilibria

• Decrease: Uncertainty about opponents’ utilities

Rationality

• Increase of knowledge about actions of opponents at mixed equilibria → deviations of their probabilities from the NE ones → decrease of their rationality

• Decrease of knowledge by uncertainty about opponents' utilities → less predictable actions of opponents → decrease of their rationality

Rationality

• Less predictable actions of opponents: the probability distribution of actions might depend on the utilities, so that very bad mistakes (in terms of pay-offs) are less probable than milder ones (→ mathematical psychology: "probabilistic choice").

Rationality

• Deviations of opponents' probabilities from the NE ones (decrease of their rationality): typically, this should be disadvantageous for them, but it can be clever in certain games.

• Less predictable actions of opponents (decrease of their rationality): typically, this should be advantageous for them, but it can be harmful in certain games.

Alternative interpretation

• Opponents are randomly selected from a population whose members exhibit some degree of variation. The player knows only the probability distribution of the population (→ econometrics), but not the characteristics of individual opponents (each opponent is rational, but their utility functions are not known precisely).

Questions

• Can the effects of varying information and rationality be quantified?

• How to model the relevant parameters?

• Can this be used for purposes of control, that is, can desired effects like Pareto optimality be achieved by tuning those parameters?

• If so, should the players themselves decide about those parameter values, or rather some external “well-meaning” controller?

Quantal response equilibria (QREs) (McKelvey–Palfrey)

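Player i with utility function

\[ U_i(x_i^\alpha, x_{-i}^\gamma) \tag{1} \]

when i ("she") plays $x_i^\alpha$ and −i ("he") plays $x_{-i}^\gamma$.

\[ p_i^\alpha := \text{probability that player } i \text{ plays } \alpha. \tag{2} \]

At a quantal response equilibrium (QRE),

\[ p_i^\alpha = \frac{1}{Z_i} \exp\Big(\beta_i \sum_\gamma U_i(x_i^\alpha, x_{-i}^\gamma)\, p_{-i}^\gamma\Big) \tag{3} \]

where $0 \le \beta_i \le \infty$ indicates i's degree of rationality, and

\[ Z_i := \sum_\delta \exp\Big(\beta_i \sum_\gamma U_i(x_i^\delta, x_{-i}^\gamma)\, p_{-i}^\gamma\Big). \tag{4} \]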

• The rationality coefficients are the only parameters, and i may have access only to her own $\beta_i$.
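The fixed-point character of equations (3) and (4) can be illustrated numerically. The following is a minimal sketch, not the method used for the plots in the talk: it assumes two players, uses a damped fixed-point iteration started from uniform play, and the function names, the damping factor and the small example game are all hypothetical choices made for illustration. Such an iteration is only expected to converge for moderate rationality coefficients.

```python
import math

def logit_response(beta, U, q):
    """Logit response of a player with pay-off table U[own][other]
    to an opponent who plays the mixed strategy q (equations (3), (4))."""
    expected = [sum(u * qg for u, qg in zip(row, q)) for row in U]
    m = max(expected)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (e - m)) for e in expected]
    Z = sum(weights)   # normalisation constant of equation (4), rescaled
    return [w / Z for w in weights]

def solve_qre(beta_i, beta_j, U_i, U_j, iters=2000, damping=0.5):
    """Damped fixed-point iteration, started from uniform play.
    Both pay-off tables are indexed [own move][opponent's move]."""
    p = [1.0 / len(U_i)] * len(U_i)
    q = [1.0 / len(U_j)] * len(U_j)
    for _ in range(iters):
        p_new = logit_response(beta_i, U_i, q)
        q_new = logit_response(beta_j, U_j, p)
        p = [(1 - damping) * a + damping * b for a, b in zip(p, p_new)]
        q = [(1 - damping) * a + damping * b for a, b in zip(q, q_new)]
    return p, q

# Hypothetical 2x2 game, chosen only to exercise the code.
U_i = [[10, 0], [9, 1]]
U_j = [[-2, 1], [2, -1]]
p, q = solve_qre(0.5, 0.5, U_i, U_j)
print(p, q)
```

At a solution, `p` reproduces itself under `logit_response(beta_i, U_i, q)` and likewise for `q`, which is exactly the statement of equation (3) for both players.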

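The derivative with respect to $\beta_i$ expresses both the direct effect of a variation of $\beta_i$ on the utility of i and the indirect effect of the response of −i. From

\[ p_i^\alpha = \frac{1}{Z_i} \exp\Big(\beta_i \sum_\gamma U_i(x_i^\alpha, x_{-i}^\gamma)\, p_{-i}^\gamma\Big) \]

one obtains

\[ \frac{dp_i^\alpha}{d\beta_i} = p_i^\alpha \sum_\gamma \Big(\beta_i \frac{dp_{-i}^\gamma}{d\beta_i} + p_{-i}^\gamma\Big) \Big(U_i(x_i^\alpha, x_{-i}^\gamma) - \sum_\delta U_i(x_i^\delta, x_{-i}^\gamma)\, p_i^\delta\Big), \]

\[ \frac{dp_{-i}^\gamma}{d\beta_i} = p_{-i}^\gamma \sum_\alpha \beta_{-i} \frac{dp_i^\alpha}{d\beta_i} \Big(U_{-i}(x_{-i}^\gamma, x_i^\alpha) - \sum_\eta U_{-i}(x_{-i}^\eta, x_i^\alpha)\, p_{-i}^\eta\Big). \]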
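As a reading aid that is not part of the slides, the formula for the derivative of i's probabilities can be checked by a short computation. The sketch assumes that all probabilities depend differentiably on $\beta_i$. Taking the logarithm of the QRE condition gives

\[ \log p_i^\alpha = \beta_i \sum_\gamma U_i(x_i^\alpha, x_{-i}^\gamma)\, p_{-i}^\gamma - \log Z_i . \]

Differentiating with respect to $\beta_i$ yields

\[ \frac{1}{p_i^\alpha} \frac{dp_i^\alpha}{d\beta_i} = \sum_\gamma U_i(x_i^\alpha, x_{-i}^\gamma) \Big(p_{-i}^\gamma + \beta_i \frac{dp_{-i}^\gamma}{d\beta_i}\Big) - \frac{1}{Z_i} \frac{dZ_i}{d\beta_i}, \]

\[ \frac{1}{Z_i} \frac{dZ_i}{d\beta_i} = \sum_\delta p_i^\delta \sum_\gamma U_i(x_i^\delta, x_{-i}^\gamma) \Big(p_{-i}^\gamma + \beta_i \frac{dp_{-i}^\gamma}{d\beta_i}\Big). \]

Collecting the common factor $p_{-i}^\gamma + \beta_i\, dp_{-i}^\gamma / d\beta_i$ under the sum over $\gamma$ gives the stated expression. The term $p_{-i}^\gamma$ is the direct effect, and the term $\beta_i\, dp_{-i}^\gamma / d\beta_i$ is the indirect effect through the response of −i. The formula for −i has no direct term because $\beta_{-i}$ does not depend on $\beta_i$.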

The QRE probabilities of the players vary smoothly and uniquely, except possibly at bifurcation points. Bifurcation condition when both players have only two moves, + and −:

\[ 1 = \beta_i \beta_{-i}\, p_i^+ p_i^- p_{-i}^+ p_{-i}^- \big(U_i(+,+) - U_i(-,+) - U_i(+,-) + U_i(-,-)\big) \big(U_{-i}(+,+) - U_{-i}(-,+) - U_{-i}(+,-) + U_{-i}(-,-)\big) \]

where $p_i^+$, $p_i^- = 1 - p_i^+$ are the QRE probabilities of i.
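The bifurcation condition is easy to evaluate once the QRE probabilities are known. The sketch below uses a hypothetical pure coordination game that is not taken from the slides: both players receive 1 if their moves match and 0 otherwise. By symmetry, $p_i^+ = p_{-i}^+ = 1/2$ satisfies the QRE condition for every value of the rationality coefficients, so the right-hand side can be evaluated along that branch. The function names are illustrative.

```python
def cross_difference(U):
    """U(+,+) - U(-,+) - U(+,-) + U(-,-) for a 2x2 pay-off table
    U[own][other], with index 0 for move + and index 1 for move -."""
    return U[0][0] - U[1][0] - U[0][1] + U[1][1]

def bifurcation_rhs(beta_i, beta_j, p_plus, q_plus, U_i, U_j):
    """Right-hand side of the two-move bifurcation condition.
    p_plus and q_plus are the probabilities of move + for i and -i."""
    return (beta_i * beta_j
            * p_plus * (1 - p_plus) * q_plus * (1 - q_plus)
            * cross_difference(U_i) * cross_difference(U_j))

# Hypothetical coordination game: pay-off 1 for matching moves, else 0.
COORD = [[1, 0], [0, 1]]
for beta in (1.0, 2.0, 3.0):
    print(beta, bifurcation_rhs(beta, beta, 0.5, 0.5, COORD, COORD))
```

In this illustrative game the right-hand side equals $\beta^2 / 4$ on the symmetric branch with equal coefficients, so the condition is met at $\beta = 2$.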

For $\beta_i \to 0$, the player becomes completely irrational and selects all possible actions with equal probability. For $\beta_i \to \infty$, she becomes fully rational. The QREs then converge to Nash equilibria.
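The two limits can be read off from the QRE formula. The following sketch holds the opponent's probabilities fixed, which is a simplification because at a QRE they also change with the rationality coefficients. The shorthand $E_i^\alpha$ for the expected utility of move $\alpha$ and the symbol $n_i$ for the number of moves of i are introduced here for illustration.

\[ E_i^\alpha := \sum_\gamma U_i(x_i^\alpha, x_{-i}^\gamma)\, p_{-i}^\gamma, \qquad p_i^\alpha = \frac{\exp(\beta_i E_i^\alpha)}{\sum_\delta \exp(\beta_i E_i^\delta)} . \]

\[ \beta_i = 0: \quad p_i^\alpha = \frac{1}{n_i} \ \text{for every } \alpha; \qquad \frac{p_i^\alpha}{p_i^\delta} = \exp\big(\beta_i (E_i^\alpha - E_i^\delta)\big) \to 0 \ \text{ as } \beta_i \to \infty \ \text{ whenever } E_i^\alpha < E_i^\delta . \]

For large $\beta_i$ the probability mass therefore concentrates on best responses, which is the defining property of a Nash equilibrium.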

Generically, only saddle-node bifurcations occur, and there exists a unique path in parameter space from the fully irrational behavior to one specific Nash equilibrium. When the parameter varies in only one direction, there may be hysteresis effects and discontinuous jumps.

When both rationality coefficients, $\beta_i$ and $\beta_{-i}$, can be varied, we also see pitchfork bifurcations, and the players can end up at different Nash equilibria, depending on which branch they choose.

An example

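\[ \begin{pmatrix} 2\,|\,1 & 0\,|\,0 \\ 0\,|\,0 & 1\,|\,2 \end{pmatrix} \]

(Each entry lists the pay-off of the row player i, then the pay-off of the column player −i.)

The first move of each player (up for row player i and left for column player −i) is denoted by +, the second one (down for i and right for −i) by −.

Symmetric: invariant when interchanging players and move labels; this extends to the QRE situation when the rationality coefficients are also exchanged.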
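For this example game the logit responses can be iterated directly. The sketch below is an illustration under stated assumptions, not a reproduction of the plots from the talk: it uses plain simultaneous iteration, equal rationality coefficients, and two arbitrarily chosen starting points. With a small coefficient both starting points should lead to the same QRE, while with a larger one they settle near different pure outcomes.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def iterate_qre(beta_i, beta_j, p, q, steps=500):
    """Iterate the logit responses of the game
        2|1  0|0
        0|0  1|2
    where p and q are the probabilities that i and -i play move +."""
    for _ in range(steps):
        # i:  E[U_i | +]  - E[U_i | -]  = 2q - (1 - q)  = 3q - 1
        # -i: E[U_-i | +] - E[U_-i | -] = p - 2(1 - p)  = 3p - 2
        p, q = sigmoid(beta_i * (3 * q - 1)), sigmoid(beta_j * (3 * p - 2))
    return p, q

# Low rationality: the two starting points are expected to agree.
low_a = iterate_qre(0.5, 0.5, 0.9, 0.9)
low_b = iterate_qre(0.5, 0.5, 0.1, 0.1)

# Higher rationality: the starting point selects the branch.
high_a = iterate_qre(5.0, 5.0, 0.9, 0.9)   # settles near (+, +)
high_b = iterate_qre(5.0, 5.0, 0.1, 0.1)   # settles near (-, -)
print(low_a, low_b, high_a, high_b)
```

Plain iteration only finds attracting solutions, so this does not trace unstable branches or locate the bifurcation point itself.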

[Figure: plot for the game above, with $p_i^+$ on the horizontal axis and $p_{-i}^+$ on the vertical axis, each ranging from 0 to 1.]

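Varying $\beta_i$ is equivalent to varying the utility $U_i$ → interpretation as tax rate

\[ p_i^\alpha = \frac{1}{Z_i} \exp\Big(\beta_i \sum_\gamma U_i(x_i^\alpha, x_{-i}^\gamma)\, p_{-i}^\gamma\Big) = \frac{1}{Z_i} \exp\big(\beta_i\, E(U_i \mid x_i^\alpha)\big). \]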
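The equivalence can be made concrete with a small numerical check. In the sketch below the expected utilities, the coefficient and the tax rate `t` are hypothetical values, and reading the tax as a flat rate that leaves the fraction `1 - t` of every pay-off is an assumption made here for illustration only.

```python
import math

def choice_probs(beta, expected_utils):
    """Logit choice probabilities for given expected utilities."""
    weights = [math.exp(beta * u) for u in expected_utils]
    Z = sum(weights)
    return [w / Z for w in weights]

expected = [2.0, 0.5, 1.0]   # hypothetical values of E(U_i | x_i^alpha)
beta = 0.7                   # hypothetical rationality coefficient
t = 0.3                      # hypothetical flat tax rate (assumption)

# Taxing the utilities at rate t ...
taxed = choice_probs(beta, [(1 - t) * u for u in expected])
# ... gives the same choice probabilities as lowering beta to beta * (1 - t).
rescaled = choice_probs(beta * (1 - t), expected)
print(taxed, rescaled)
```

Only the product of the coefficient and the utility enters the formula, so the two lists agree up to rounding.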

How can the βs be varied?

1) Each player sets her value independently, without considering the reactions of her opponents (“Anarchy”)

2) The players play a Nash game for selecting their values within some given range (“Market”)

3) An external controller sets the values, e.g. as tax rates (“Socialism”)

In general, the outcomes of the three mechanisms will be different, and which one will achieve the best results in the Pareto sense may depend on the particular game.

\[ \begin{pmatrix} 1\,|\,1 & 2\,|\,1 \\ 1\,|\,2 & 0\,|\,0 \end{pmatrix} \]

i prefers her first move, +, except when −i plays +, in which case she is indifferent. −i prefers +, except if i plays +, in which case he is indifferent. Since both of them prefer their first move, there is a tendency to end up at (1, 1), even though this is not Pareto optimal.

QREs satisfy

\[ \tfrac{1}{2} < p_i^+ < 1, \qquad \tfrac{1}{2} < p_{-i}^+ < 1. \]

Possible limits for $\beta_i, \beta_{-i} \to \infty$ are constrained to that region. The Nash equilibria (+, −) and (−, +) are not limits of QREs. In the symmetric situation $\beta_i = \beta_{-i} \to \infty$, the limit is (+, +), which is not strict, as either player could increase the other's pay-off while keeping her/his own.
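The region constraint can be checked numerically. The sketch below evaluates the logit response in this game on a grid of opponent probabilities and a few hypothetical values of the rationality coefficient. It relies on the observation that, with pay-offs indexed by own move first, both players have the same pay-off table in this game.

```python
import math

# Pay-off table of the game above, indexed [own move][opponent's move],
# with 0 = first move (+) and 1 = second move (-). The same table serves
# for both players because the game is symmetric.
U = [[1, 2], [1, 0]]

def prob_plus(beta, opp_plus):
    """Logit probability of playing + when the opponent plays + with
    probability opp_plus."""
    e_plus = U[0][0] * opp_plus + U[0][1] * (1 - opp_plus)
    e_minus = U[1][0] * opp_plus + U[1][1] * (1 - opp_plus)
    return 1.0 / (1.0 + math.exp(-beta * (e_plus - e_minus)))

grid = [k / 100 for k in range(1, 100)]          # opponent probabilities
responses = [prob_plus(b, q) for b in (0.1, 1.0, 5.0) for q in grid]
print(min(responses), max(responses))
```

The expected utility difference is $2(1 - p^+)$ of the opponent, which is positive whenever the opponent does not play + with certainty, so every response lies strictly between 1/2 and 1.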

Information about opponent

\[ \begin{pmatrix} 10\,|\,{-2} & 0\,|\,2 \\ 9\,|\,1 & 1\,|\,{-1} \end{pmatrix} \]

Without mutual information, the Nash equilibrium is $p_i^+ = \tfrac{1}{3}$, $p_{-i}^+ = \tfrac{1}{2}$. The expected pay-off for i is 5, the one for −i is 0.

When i knows which move −i plays, and −i knows that i knows that, −i will always play his second move. His pay-off is then −1, while the one of i is 1. The pay-off of the player provided with additional information about the other's move is more drastically reduced than that of the other one, who only knows that his opponent has that information, but without reciprocal information about her move.
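The numbers on this slide can be reproduced with exact arithmetic. The sketch below computes the mixed equilibrium from the usual indifference conditions and then the outcome when i observes the move of −i. The variable and function names are illustrative.

```python
from fractions import Fraction as F

# Pay-off tables indexed [move of i][move of -i], 0 = first, 1 = second move.
A = [[F(10), F(0)], [F(9), F(1)]]    # pay-offs of i
B = [[F(-2), F(2)], [F(1), F(-1)]]   # pay-offs of -i

# Mixed equilibrium from the indifference conditions:
# p makes -i indifferent between his two moves, q makes i indifferent.
p = (B[1][1] - B[1][0]) / (B[0][0] - B[1][0] - B[0][1] + B[1][1])
q = (A[1][1] - A[0][1]) / (A[0][0] - A[0][1] - A[1][0] + A[1][1])

def expected(T):
    """Expected value of table T under the mixed strategies (p, q)."""
    probs_i, probs_j = [p, 1 - p], [q, 1 - q]
    return sum(probs_i[a] * probs_j[g] * T[a][g]
               for a in range(2) for g in range(2))

mixed_payoffs = (expected(A), expected(B))

# Informed case: i best-responds to the observed move g of -i,
# and -i picks the move whose anticipated reply is best for him.
def reply(g):
    return max(range(2), key=lambda a: A[a][g])

g_star = max(range(2), key=lambda g: B[reply(g)][g])
informed_payoffs = (A[reply(g_star)][g_star], B[reply(g_star)][g_star])
print(p, q, mixed_payoffs, g_star, informed_payoffs)
```

The informed player i drops from 5 to 1, while −i drops from 0 to −1, which is the asymmetry described above.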

Information about opponent

• An analysis analogous to QRE is possible when the information about the opponent is only probabilistic, that is, when the player receives a symbol that carries information about the opponent's move probabilities. This player can then choose the probabilities of the possible reactions to the symbols received.

Information about opponent

The information of i is incomplete, i.e., it may get distorted by channel noise. The channel puts out symbols $d \in D$ in response to the action $\gamma$ of −i, and i can then observe these symbols.

$p_{d\gamma} = p(d \mid \gamma)$ = probability for symbol d when −i chooses $\gamma$.

i can thus select a mapping

\[ a : D \to \{\alpha\}, \]

i.e., select her move depending on d. (We put the probabilities into the choice of a rather than into the operation of a.) Thus, the map a is chosen with a certain probability, but when chosen, the reaction of i to d is determined.
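A small enumeration shows what choosing among maps a looks like in practice. The sketch below is an illustration under several assumptions that are not in the slides: the pay-offs of i are taken from the information game shown earlier, −i is assumed to mix his two moves equally, and the channel is a hypothetical binary one that reports the wrong move with probability `eps`. It only evaluates each deterministic map; the probabilities with which i would choose among them are left out.

```python
from itertools import product

A = [[10, 0], [9, 1]]   # pay-offs of i, indexed [move of i][move of -i]
q = [0.5, 0.5]          # assumed mixed strategy of -i
eps = 0.1               # hypothetical channel error probability

# channel[g][d] = p(d | g): symbol d equals the move g with probability 1 - eps
channel = [[1 - eps, eps], [eps, 1 - eps]]

def expected_payoff(a):
    """Expected pay-off of i when she reacts to symbol d with move a[d]."""
    return sum(q[g] * channel[g][d] * A[a[d]][g]
               for g in range(2) for d in range(2))

maps = list(product(range(2), repeat=2))   # all maps a: D -> {first, second}
values = {a: expected_payoff(a) for a in maps}
best = max(values, key=values.get)
print(values, best)
```

Under these assumptions the map that follows the symbol does better than either constant map, so even a noisy symbol has value for i as long as −i does not adapt his own probabilities.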