
A Study of Computational and Human Strategies in Revelation Games

1 Noam Peled, 2 Kobi Gal, 1 Sarit Kraus

1 Bar-Ilan University, Israel.

2 Ben-Gurion University, Israel.

1

Outline

● What is this research about?

● An agent that negotiates with people in incomplete-information settings.

● People can choose to reveal private information.

● Why is it difficult?

● Uncertainty over people’s preferences.

● People are affected by social/psychological issues.

2

Our approach

● Opponent modeling + machine learning.

● Results:

● Agent outperformed people.

● Learned to exploit how people reveal information.

● People improved their performance when playing with the agent (compared to playing with other people).

3

Why is it interesting?

4

Revelation games

● Incomplete information over preferences:

● Players’ types are unknown.

● Players can reveal their type before negotiating for a finite number of rounds.

● Revelation is costless.

● No discount factor.

5

Our setting

● This is the minimal setting where each of the two players can make one proposal:

6

Revelation phase

1st proposal

Counter proposal

Roadmap

● Probabilistic model of people.

● The model explicitly represents social features.

● We built an agent that uses this model for negotiation with people.

● Extensive empirical evaluation:

● Over 400 subjects from different countries!

7

Colored Trails

● Open source empirical test-bed for investigating decision making.

● Family of computer board games that involve negotiation over resources.

● Easy to design new games.

● Built-in functionality for conducting experiments with people.

● Over 30 publications.

8

Revelation game in CT

9

Shortest path to the goal

10

Objective

● The objective is to maximize your score:

● Try to get as close as possible to the goal.

● Using as few chips as possible.

● End up with as many chips as you can.

11

If the “me” player chooses to reveal…

12

Example proposal of ‘me’ player

● Here, the ‘me’ player is signaling its true goal location.

13

The other participant decides whether to accept the proposal.

If it accepts the proposal, the game will end.

If it rejects, it will be able to make a counter-proposal.

14

The end of the game

The ‘me’ player was moved to its goal using one gray and one red chip.

The other participant was moved to its goal using two cyan chips.

15

Agent design

● SIGAL: SIGmoid Acceptance Learning.

● Models people’s behavior using probability distributions:

● Making/accepting offers.

● Whether people reveal their goals.

● Maximizes its expected benefit given the model, using backward induction.

16

SIGAL strategy: Round 2

● As a responder: accepts any proposal that improves its score in the game.

● As a proposer:

● Calculates its expected benefit from each proposal:

● Chooses the offer that maximizes expected benefit.

E(t | h₁) = p(accept | t, h₁) · π(t) + p(reject | t, h₁) · π(∅)

where t is the proposal, h₁ is the game history, π(t) is SIGAL’s score if t is accepted, and π(∅) is the no-agreement score.
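The round-2 proposer step can be sketched as follows. The candidate offers, their scores, and their acceptance probabilities below are hypothetical stand-ins for SIGAL’s learned model, used only to illustrate the expected-benefit maximization:

```python
# Minimal sketch of the round-2 proposer step: pick the proposal that
# maximizes  E = p(accept) * score_if_accepted + p(reject) * no_agreement_score.
# All numbers are illustrative, not the learned model.

def expected_benefit(score_if_accepted, p_accept, no_agreement_score):
    """Expected benefit of one proposal under the acceptance model."""
    return p_accept * score_if_accepted + (1 - p_accept) * no_agreement_score

def best_proposal(proposals, no_agreement_score):
    """proposals: list of (label, score_if_accepted, p_accept) tuples."""
    return max(
        proposals,
        key=lambda p: expected_benefit(p[1], p[2], no_agreement_score),
    )

# Hypothetical candidate offers: generous offers are accepted more often.
candidates = [
    ("greedy", 150, 0.2),     # E = 0.2*150 + 0.8*30 = 54
    ("balanced", 110, 0.7),   # E = 0.7*110 + 0.3*30 = 86
    ("generous", 80, 0.95),   # E = 0.95*80 + 0.05*30 = 77.5
]
label, score, p = best_proposal(candidates, no_agreement_score=30)
```

Note that a higher acceptance probability does not always win: the “balanced” offer beats the “generous” one because its expected value is higher.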

17

Revelation phase

1st proposal

Counter proposal

SIGAL strategy: Round 1

● As a responder: accepts any proposal that gives it more than its expected benefit from round 2.

● As a proposer:

● Estimates its benefit from the other player’s counter-proposal.

● Calculates the expected benefit from each proposal.

● Chooses the offer that maximizes expected benefit.
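The round-1 responder rule is a backward-induction threshold, which can be sketched as below. The function name and the numbers are hypothetical, chosen only to illustrate the rule:

```python
# Sketch of the round-1 responder rule (backward induction): accept the
# first proposal only if it beats the expected benefit of continuing to
# the counter-proposal round. Numbers are hypothetical.

def accept_first_proposal(offer_benefit, expected_round2_benefit):
    """Accept iff the offer beats the expected value of playing round 2."""
    return offer_benefit > expected_round2_benefit

print(accept_first_proposal(90, 86))  # offer beats continuing
print(accept_first_proposal(80, 86))  # continuing is better: reject, counter
```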

18

Revelation phase

1st proposal

Counter proposal

Maximum expected benefit

19

Revelation phase

● Use decision theory: SIGAL calculates its expected benefit for both scenarios, revealing or not revealing.

● Picks the best option.
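The decision-theoretic choice between revealing and hiding can be sketched as a simple comparison. The two expected values below are hypothetical placeholders for the result of SIGAL’s backward-induction computation:

```python
# Sketch of the revelation decision: compare the expected benefit of the
# whole game under each option and pick the better one. The expected
# values are hypothetical placeholders, not computed from a real model.

def choose_revelation(expected_if_reveal, expected_if_hide):
    """Pick the option with the higher expected benefit."""
    return "reveal" if expected_if_reveal > expected_if_hide else "hide"

print(choose_revelation(92.0, 88.5))
```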

20

Revelation phase

1st proposal

Counter proposal

Modeling acceptance of offers

● Use logistic function (Camerer 2001)

p(accept | x) = 1 / (1 + e^(−u(x)))

where u(x) is the social utility function.

21

Social utility function

● People are not fully rational: a proposal’s attractiveness is not determined by its benefit alone.

● Weighted sum of social features.
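A minimal sketch combining the logistic acceptance model from the previous slide with the weighted-feature social utility described here. The feature names, weights, and offer values are illustrative stand-ins, not the parameters SIGAL actually learned:

```python
import math

# Acceptance model sketch: a logistic function over a social utility that
# is a weighted sum of social features. All names and numbers below are
# illustrative assumptions, not learned values.

def social_utility(features, weights):
    """Weighted sum of social features."""
    return sum(weights[name] * value for name, value in features.items())

def p_accept(features, weights):
    """Logistic acceptance probability: 1 / (1 + e^(-u(x)))."""
    return 1.0 / (1.0 + math.exp(-social_utility(features, weights)))

weights = {"benefit": 0.03, "benefit_difference": -0.02, "revealed": 0.5}
offer = {"benefit": 105, "benefit_difference": 40, "revealed": 1}
prob = p_accept(offer, weights)  # higher utility -> probability closer to 1
```

The negative weight on `benefit_difference` encodes the fairness effect described later: lopsided offers in the proposer’s favor lower the acceptance probability.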

22

Benefit feature

● Benefit = proposed-offer score − no-agreement score.

● For example, the score of the ‘me’ player from the demo game was 135.

● Without reaching an agreement, its score is 30.

● Its benefit from the proposal is 135 − 30 = 105.
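The benefit computation, using the numbers from this slide:

```python
# Benefit feature: proposed-offer score minus the no-agreement score
# (numbers taken from the demo game on this slide).
proposal_score = 135
no_agreement_score = 30
benefit = proposal_score - no_agreement_score
print(benefit)  # -> 105
```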

23

Other social features

● Difference in benefit = proposer’s benefit from the offer − responder’s benefit from the offer.

● Revelation decision.

● Previous round:

● The first proposal, if rejected, may affect the probability of accepting the counter-proposal.

24

Learning

● We used a genetic algorithm to find the optimal weights for people’s social utility function.

● Use density estimation to learn how people make offers.

● Cross-validation (10-fold).

● Over-fitting removal: stop training at the minimum of the generalization error.

● Error calculated on a held-out test set.
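The weight-learning step can be sketched as a small genetic algorithm fitting the logistic acceptance model to observed accept/reject decisions. This is a minimal sketch under assumed settings: the fitness function (log-likelihood), population size, mutation scale, generation count, and the toy dataset are all arbitrary choices here, not the paper’s:

```python
import math
import random

# Minimal GA sketch for fitting social-utility weights. Fitness is the
# log-likelihood of the logistic acceptance model on observed decisions.
# Truncation selection plus Gaussian mutation, no crossover.

random.seed(0)

def p_accept(weights, features):
    """Logistic acceptance probability for one offer."""
    u = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-u))

def fitness(weights, data):
    """Log-likelihood of the observed accept/reject decisions."""
    eps = 1e-9
    total = 0.0
    for features, accepted in data:
        p = p_accept(weights, features)
        total += math.log(p + eps) if accepted else math.log(1.0 - p + eps)
    return total

def evolve(data, n_features, pop_size=30, generations=40):
    """Evolve a population of weight vectors toward higher fitness."""
    pop = [[random.gauss(0.0, 1.0) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, data), reverse=True)
        survivors = pop[: pop_size // 2]            # keep the fitter half
        children = [[w + random.gauss(0.0, 0.1)     # mutate a random survivor
                     for w in random.choice(survivors)]
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=lambda w: fitness(w, data))

# Toy dataset: offers whose benefit (first feature) exceeds 50 are accepted;
# the second feature is a constant bias term.
data = [([b / 100.0, 1.0], b > 50) for b in range(0, 101, 5)]
best_weights = evolve(data, n_features=2)
```

Because the fitter half survives unchanged each generation, the best fitness never degrades; the early-stopping and cross-validation steps on this slide would sit around this loop, monitoring error on held-out data.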

25

Fit to data

● The percentages are averages over similar utility ranges (bins).

26

What did SIGAL learn?

● Which proposal to give in each round.

● Whether to accept proposals in each round.

● Whether to reveal its goal or not.

27

Empirical methodology

● Israeli CS students and users over the web.

● Subjects received an identical tutorial on revelation, and had to pass a test.

● Each participant played two different boards.

● Compared SIGAL’s performance with people’s performance when playing other people.

28

Performance

29

Why did SIGAL succeed?

● Learned to ask for more from people when they reveal their goal.

● Learned to make ‘fair’ proposals:

● People dislike proposals with a large difference in benefit in favor of the proposer.

● Learned to exploit generous people:

● If people propose a generous offer in the first round, they are more willing to accept the counter offer.

30

People also benefit from SIGAL!

● People playing with SIGAL scored much higher than people playing with other people!

31

Web users: Amazon Mechanical Turk

● Web-based bulletin board for ‘Human Intelligence Tasks’.

● Millions of ‘workers’ are exposed to your task.

● We got 140 ‘qualified’ participants in 8 hours!

32

Baseline equilibrium agent

● There are lots of equilibria in the game.

● We’ve developed an agent based on a pure equilibrium strategy.

33

Equilibrium agent didn't work

34

Related work

● Did not take a decision-theoretic approach:

● Repeated negotiation (Oshrat et al. 2009).● Bayesian techniques (Hindriks et al. 2008).● Approximation heuristics (Jonker et al. 2007).

● Did not evaluate a computer agent:

● Repeated one-shot take-it-or-leave-it games (Gal et al. 2007).

35

Conclusions

● Revelation games are a new setting to study how people reveal information in a negotiation.

● Using a simple model, an agent can learn to outperform people in revelation games.

● Behavioral studies can actually help agent design.

● Combining decision theory with learning is a good approach for agent-design.

36

Future work

● Extend the argumentation framework.

● More signaling and revelation possibilities.

● Develop a model that predicts to what extent private information should be revealed during the game.

● EEG: can features in brain waves improve the prediction model?

37