Sarit Kraus Department of Computer Science Bar-Ilan University
University of Maryland [email protected]
http://www.cs.biu.ac.il/~sarit/
Slide 2
A discussion in which interested parties exchange information
and come to an agreement. (Davis and Smith, 1977)
Slide 3
NEGOTIATION is an interpersonal decision-making
process necessary whenever we cannot achieve our objectives
single-handedly.
Slide 4
Teams of agents that need to coordinate joint activities;
problems: distributed information, distributed decision making,
local conflicts. Open agent environments: agents acting in the same
environment; problems: need motivation to cooperate, conflict
resolution, trust, distributed and hidden information.
Slide 5
5 Consist of: Automated agents developed by or serving
different people or organizations. People with a variety of
interests and institutional affiliations. The computer agents are
self-interested; they may cooperate to further their interests. The
set of agents is not fixed.
Slide 6
Agents support people: collaborative interfaces; CSCW
(Computer-Supported Cooperative Work) systems; cooperative learning
systems; military-support systems. Agents act as proxies for people:
coordinating schedules; patient care-delivery systems; online
auctions. Groups of agents act autonomously alongside people:
simulation systems for education and training; computer games and
other forms of entertainment; robots in rescue operations; software
personal assistants.
Slide 8
8 Monitoring electricity networks (Jennings) Distributed design
and engineering (Petrie et al.) Distributed meeting scheduling (Sen
& Durfee) Teams of robotic systems acting in hostile
environments (Balch & Arkin, Tambe) Collaborative
Internet-agents (Etzioni & Weld, Weiss) Collaborative
interfaces (Grosz & Ortiz, Andre) Information agent on the
Internet (Klusch) Cooperative transportation scheduling (Fischer)
Supporting hospital patient scheduling (Decker & Jin)
Intelligent Agents for Command and Control (Sycara)
Slide 9
Fully rational agents Bounded rational agents 9
Slide 10
No need to start from scratch! Requires modification and
adjustment; AI gives insights and complementary methods. Is it
worth it to use formal methods for multi-agent systems?
Slide 11
11 Quantitative decision making Maximizing expected utility
Nash equilibrium, Bayesian Nash equilibrium Automated Negotiator
Model the scenario as a game The agent computes (if complexity
allows) the equilibrium strategy, and acts accordingly. (Kraus,
Strategic Negotiation in Multiagent Environments, MIT Press,
2001).
Slide 12
Short introduction to game theory 12
Slide 13
Decision theory = probability theory (deals with chance) +
utility theory (deals with outcomes). Fundamental idea: the MEU
(Maximum Expected Utility) principle: weigh the utility of each
outcome by the probability that it occurs.
Slide 14
Given probability P(out_j | A_i) and utility U(out_j) for each
outcome out_j ∈ OUT, the expected utility of an action A_i ∈ Ac is:
EU(A_i) = Σ_{out_j ∈ OUT} U(out_j) · P(out_j | A_i)
Choose the action that maximizes EU:
MEU = argmax_{A_i ∈ Ac} Σ_{out_j ∈ OUT} U(out_j) · P(out_j | A_i)
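The MEU rule can be sketched directly in code; the actions, probabilities, and utilities below are illustrative, not from the slides:

```python
# A minimal sketch of the MEU principle: pick the action A_i that
# maximizes sum_j U(out_j) * P(out_j | A_i).

def expected_utility(action, outcomes, prob, util):
    return sum(util[o] * prob[(o, action)] for o in outcomes)

outcomes = ["win", "lose"]
prob = {("win", "bid"): 0.6, ("lose", "bid"): 0.4,
        ("win", "pass"): 0.1, ("lose", "pass"): 0.9}
util = {"win": 100, "lose": -20}

# MEU = argmax over actions of the expected utility
best = max(["bid", "pass"],
           key=lambda a: expected_utility(a, outcomes, prob, util))
print(best)  # "bid": EU = 0.6*100 + 0.4*(-20) = 52 vs "pass": -8
```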
Slide 15
RISK AVERSE, RISK NEUTRAL, RISK SEEKER
Slide 16
Players Who participates in the game? Actions / Strategies What
can each player do? In what order do the players act? Outcomes /
Payoffs What is the outcome of the game? What are the players'
preferences over the possible outcomes? 16
Slide 17
Information What do the players know about the parameters of
the environment or about one another? Can they observe the actions
of the other players? Beliefs What do the players believe about the
unknown parameters of the environment or about one another? What
can they infer from observing the actions of the other players?
17
Slide 18
Strategy Complete plan, describing an action for every
contingency Nash Equilibrium Each player's strategy is a best
response to the strategies of the other players Equivalently: No
player can improve his payoffs by changing his strategy alone
Self-enforcing agreement. No need for formal contracting Other
equilibrium concepts also exist 18
Slide 19
Depending on the timing of move Games with simultaneous moves
Games with sequential moves Depending on the information available
to the players Games with perfect information Games with imperfect
(or incomplete) information We concentrate on non-cooperative games
Groups of players cannot deviate jointly Players cannot make
binding agreements 19
Slide 20
All players choose their actions simultaneously or just
independently of one another There is no private information All
aspects of the game are known to the players Representation by game
matrices Often called normal form games or strategic form games
20
Slide 21
21 Example of a zero-sum game. Strategic issue of
competition.
Slide 22
Each player can cooperate or defect. Payoffs (Row, Column):

            cooperate   defect
cooperate   -1, -1      -10, 0
defect      0, -10      -8, -8

Main issue: tension between social
optimality and individual incentives.
Slide 23
A supplier and a buyer need to decide whether to adopt a new
purchasing system. Payoffs (Supplier, Buyer):

       new       old
new    20, 20    0, 0
old    0, 0      5, 5
Slide 24
Payoffs (Husband, Wife):

            football   shopping
football    2, 1       0, 0
shopping    0, 0       1, 2

The game involves both the issues of coordination and competition.
Slide 25
A game has n players. Each player i has a strategy set S_i: these
are his possible actions. Each player has a payoff function
p_i : S → R. A strategy t_i ∈ S_i is a best response if there is no
other strategy in S_i that produces a higher payoff, given the
opponents' strategies.
Slide 26
A strategy profile is a list (s_1, s_2, …, s_n) of the
strategies each player is using. If each strategy is a best response
given the other strategies in the profile, the profile is a Nash
equilibrium. Why is this important? If we assume players are
rational, they will play Nash strategies. Even less-than-rational
play will often converge to Nash in repeated settings.
Slide 27
Payoffs (Row, Column):

       a       b
a      1, 0    1, 2
b      2, 1    0, 1

(b, a) is a Nash equilibrium:
given that Column is playing a, Row's best response is b; given that
Row is playing b, Column's best response is a.
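Pure-strategy Nash equilibria of a small bimatrix game can be found by brute force; the payoff matrix below is illustrative:

```python
import itertools

def pure_nash(row_payoff, col_payoff):
    """All pure-strategy Nash equilibria of a bimatrix game.
    row_payoff[r][c], col_payoff[r][c] are the payoffs of profile (r, c)."""
    R, C = len(row_payoff), len(row_payoff[0])
    eq = []
    for r, c in itertools.product(range(R), range(C)):
        # (r, c) is an equilibrium iff each strategy is a best response
        row_best = all(row_payoff[r][c] >= row_payoff[r2][c] for r2 in range(R))
        col_best = all(col_payoff[r][c] >= col_payoff[r][c2] for c2 in range(C))
        if row_best and col_best:
            eq.append((r, c))
    return eq

# Illustrative 2x2 game; indices 0='a', 1='b' for each player.
row = [[1, 1], [2, 0]]
col = [[0, 2], [1, 1]]
print(pure_nash(row, col))  # profile (1, 0), i.e. (b, a), is among them
```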
Slide 28
Unfortunately, not every game has a pure strategy equilibrium.
Rock-paper-scissors However, every game has a mixed strategy Nash
equilibrium Each action is assigned a probability of play Player is
indifferent between actions, given these probabilities 28
Slide 29
Payoffs (Husband, Wife):

            football   shopping
football    2, 1       0, 0
shopping    0, 0       1, 2
Slide 30
Instead, each player selects a probability associated with each
action. Goal: the expected utility of each action is equal; players
are indifferent among their choices at these probabilities.
a = probability the husband chooses football;
b = probability the wife chooses shopping.
Since expected payoffs must be equal, for the husband:
b·1 = (1−b)·2, so b = 2/3. For the wife: a·1 = (1−a)·2, so a = 2/3.
In each case the expected payoff is 2/3. They meet at football 2/9
of the time, at shopping 2/9 of the time, and miscoordinate 5/9 of
the time. If they could synchronize ahead of time they could do
better.
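The indifference computation above can be checked numerically; a minimal sketch:

```python
# b = P(wife plays shopping), a = P(husband plays football).
# The husband is indifferent when b*1 == (1-b)*2, i.e. b = 2/3,
# and symmetrically a = 2/3.

b = 2/3  # wife's probability of shopping
a = 2/3  # husband's probability of football

# Husband's expected payoff from each pure action against b:
football = (1 - b) * 2   # meet at football (payoff 2) when wife plays football
shopping = b * 1         # meet at shopping (payoff 1)
assert abs(football - shopping) < 1e-9   # indifference => mixing is optimal

p_meet_football = a * (1 - b)       # 2/9
p_meet_shopping = (1 - a) * b       # 2/9
print(p_meet_football, p_meet_shopping, 1 - p_meet_football - p_meet_shopping)
```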
Player 1 plays rock with probability p_r, scissors with
probability p_s, paper with probability 1 − p_r − p_s.
Utility_2(rock) = 0·p_r + 1·p_s − 1·(1 − p_r − p_s) = 2p_s + p_r − 1
Utility_2(scissors) = 0·p_s + 1·(1 − p_r − p_s) − 1·p_r = 1 − 2p_r − p_s
Utility_2(paper) = 0·(1 − p_r − p_s) + 1·p_r − 1·p_s = p_r − p_s
Player 2 wants to choose a
probability for each action so that the expected payoff for each
action is the same.
Slide 33
Player 2's expected utility is
q_r(2p_s + p_r − 1) + q_s(1 − 2p_r − p_s) + (1 − q_r − q_s)(p_r − p_s).
It turns out (after some algebra) that the optimal mixed
strategy is to play each action 1/3 of the time. Intuition: what if
you played rock half the time? Your opponent would then play paper
half the time, and you'd lose more often than you won. So you'd
decrease the fraction of times you played rock, until your opponent
had no edge in guessing what you'll do.
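A quick numeric check that p_r = p_s = 1/3 equalizes player 2's three expected payoffs (formulas from the slide):

```python
p_r = p_s = 1/3
u_rock     = 2*p_s + p_r - 1      # 0*p_r + 1*p_s - 1*(1 - p_r - p_s)
u_scissors = 1 - 2*p_r - p_s      # 0*p_s + 1*(1 - p_r - p_s) - 1*p_r
u_paper    = p_r - p_s            # 0*(1 - p_r - p_s) + 1*p_r - 1*p_s
print(u_rock, u_scissors, u_paper)  # all three are (numerically) zero
```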
Slide 34
[Game tree with alternating H/T moves and terminal payoffs
(1,2), (4,0), (2,1)] Any finite game of perfect
information has a pure-strategy Nash equilibrium. It can be found
by backward induction. Chess is a finite game of perfect
information; therefore it is a trivial game from a game-theoretic
point of view.
Slide 35
A game can have complex temporal structure. Information:
the set of players; who moves when and under what circumstances;
what actions are available when a player is called upon to move;
what is known when called upon to move; what payoffs each player
receives. The foundation is a game tree.
Slide 36
Khrushchev moves first: Arm or Retract. If he arms, Kennedy
responds: Fold or Nuke. Payoffs (Khrushchev, Kennedy):
Retract → (−1, 1); (Arm, Fold) → (10, −10); (Arm, Nuke) → (−100, −100).
Pure strategy Nash equilibria: (Arm, Fold) and (Retract, Nuke)
Slide 37
37 Proper subgame = subtree (of the game tree) whose root is
alone in its information set Subgame perfect equilibrium Strategy
profile that is in Nash equilibrium in every proper subgame
(including the root), whether or not that subgame is reached along
the equilibrium path of play
Slide 38
Khrushchev moves first: Arm or Retract. If he arms, Kennedy
responds: Fold or Nuke. Payoffs (Khrushchev, Kennedy):
Retract → (−1, 1); (Arm, Fold) → (10, −10); (Arm, Nuke) → (−100, −100).
Pure strategy Nash equilibria: (Arm, Fold) and (Retract, Nuke).
Pure strategy subgame perfect equilibrium: (Arm, Fold).
Conclusion: Kennedy's Nuke threat was not credible.
Slide 39
39 Diplomacy
Slide 40
The rules of the game:
1. You will be randomly paired up with someone in the other section;
this pairing will remain completely anonymous.
2. One of you will be chosen (by coin flip) to be either the
Proposer or the Responder in this experiment.
3. The Proposer gets to make an offer to split $100 in some
proportion with the Responder. So the Proposer can offer $x to the
Responder, proposing to keep $100−x for themselves.
4. The Responder must decide the lowest amount offered by the
Proposer that he/she will accept; i.e., "I will accept any offer
which is greater than or equal to $y."
5. If the Responder accepts the offer made by the Proposer, they
split the sum according to the proposal. If the Responder rejects,
both parties lose their shares.
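The payoff rule of these slides can be sketched directly; the example thresholds are illustrative:

```python
def ultimatum(x, y, pot=100):
    """Outcome of the one-shot ultimatum game above: the Proposer
    offers $x, the Responder accepts any offer >= $y.
    Returns the (proposer, responder) shares."""
    if x >= y:
        return pot - x, x
    return 0, 0          # rejection: both parties get nothing

print(ultimatum(30, 20))   # (70, 30): accepted
print(ultimatum(10, 20))   # (0, 0): rejected
```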
Slide 41
41
Slide 42
ZOPA: the final price x lies between the seller's reservation price
s (the seller wants s or more) and the buyer's reservation price b
(the buyer wants b or less). Seller's surplus: x − s; buyer's
surplus: b − x.
Slide 43
If b < s: negative bargaining zone,
no possible agreements. If b > s: positive bargaining zone,
agreement possible. (x − s) is the seller's surplus;
(b − x) is the buyer's surplus. The surplus to divide is independent
of x: a constant-sum game!
Slide 44
POSITIVE bargaining zone: the buyer's bargaining range (target
point to reservation point) overlaps the seller's bargaining range
(reservation point to target point).
Slide 45
NEGATIVE bargaining zone: the buyer's bargaining range (target
point to reservation point) does not reach the seller's bargaining
range (reservation point to target point).
Slide 46
Agents a and b negotiate over a pie of size 1. Offer: (x, y),
x + y = 1. Deadline: n. Discount factor: δ. Utility:
U_a((x, y), t) = x·δ^(t−1) if t ≤ n, 0 otherwise
U_b((x, y), t) = y·δ^(t−1) if t ≤ n, 0 otherwise
The agents negotiate using Rubinstein's alternating-offers protocol.
Slide 47
Time | Offer | Response
1 | a offers (x1, y1) | b accepts/rejects
2 | b offers (x2, y2) | a accepts/rejects
… continuing until the deadline n.
Slide 48
How much should an agent offer if there is only one time
period? Let n = 1 and let a be the first mover. Equilibrium
strategies: agent a's offer: propose to keep the whole pie, (1, 0);
agent b will accept this.
Slide 49
δ = 1/4; first mover: a. Offer: (x, y); x: a's share, y: b's share.
Optimal offers are obtained using backward induction.
Time | Offering agent | Offer | Utility (a; b)
1 | a (to b) | (3/4, 1/4) | 3/4; 1/4
2 | b (to a) | (0, 1) | 0; 1/4
The offer (3/4, 1/4) forms a P.E. Nash equilibrium. Agreement.
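The backward induction over the deadline can be sketched for any n and δ; a minimal sketch:

```python
def first_offer(n, delta):
    """Equilibrium split (proposer, responder) of the first offer in the
    alternating-offers game with deadline n and discount factor delta."""
    proposer_share = 1.0                      # at t = n the mover takes all
    for _ in range(n - 1):                    # fold back to t = 1
        # the responder must get what it would earn as next round's proposer
        responder_share = delta * proposer_share
        proposer_share = 1.0 - responder_share
    return proposer_share, 1.0 - proposer_share

print(first_offer(1, 0.25))   # (1.0, 0.0): one period, proposer takes all
print(first_offer(2, 0.25))   # (0.75, 0.25): the (3/4, 1/4) offer with delta = 1/4
```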
Slide 50
What happens to the first mover's share as δ increases? What
happens to the second mover's share as δ increases? As the deadline
increases, what happens to the first mover's share? Likewise for the
second mover?
Slide 51
Effect of δ and the deadline on the agents' shares
Slide 52
Set of issues: S = {1, 2, …, m}. Each issue is a pie of size 1.
The issues are divisible. Deadline: n (for all the issues). Discount
factor: δ_c for issue c. Utility: U(x, t) = Σ_c U_c(x_c, t)
Slide 53
Package deal procedure: The issues are bundled and discussed
together as a package Simultaneous procedure: The issues are
negotiated in parallel but independently of each other Sequential
procedure: The issues are negotiated sequentially one after another
53
Slide 54
Package deal procedure Issues negotiated using alternating
offers protocol An offer specifies a division for each of the m
issues The agents are allowed to accept/reject a complete offer The
agents may have different preferences over the issues The agents
can make tradeoffs across the issues to maximize their utility; this
leads to a Pareto-optimal outcome.
Slide 55
Utility for two issues: U_a = 2X + Y; U_b = X + 2Y
Slide 56
Making tradeoffs: given the constraint U_b = 2, what is a's maximal utility?
Slide 57
Example for two issues. Deadline: n = 2. Discount factors:
δ_1 = δ_2 = 1/2. Utilities: U_a = (1/2)^(t−1)·(x_1 + 2x_2);
U_b = (1/2)^(t−1)·(2y_1 + y_2)
Time | Offering agent | Package offer
1 | a (to b) | [(1/4, 3/4); (1, 0)] or [(3/4, 1/4); (0, 1)]
2 | b (to a) | [(0, 1); (0, 1)], U_b = 1.5
Agreement in period 1. The outcome is not symmetric.
Slide 58
P.E. Nash equilibrium strategies. For t = n: the offering agent
takes 100 percent of all the issues; the receiving agent accepts.
For t < n (for agent a): OFFER [x, y] s.t. U_b(y, t) = EQ_UB(t+1).
If there is more than one such [x, y], perform trade-offs across
issues to find the best offer. RECEIVE [x, y]: if U_a(x, t) ≥
EQ_UA(t+1) ACCEPT, else REJECT. EQ_UA(t+1) is a's equilibrium
utility for t+1; EQ_UB(t+1) is b's equilibrium utility for t+1.
Slide 59
Making trade-offs, divisible issues. Agent a's trade-off problem
at time t: TR: find a package [x, y] that
maximizes Σ_{c=1}^m k_c^a · x_c
subject to Σ_{c=1}^m k_c^b · y_c ≥ EQ_UB(t+1) and 0 ≤ x_c ≤ 1,
0 ≤ y_c ≤ 1. This is the fractional knapsack problem.
Slide 60
Making trade-offs, divisible issues. Agent a's perspective (time t):
agent a considers the m issues in increasing order of k_c^a / k_c^b
and assigns to b the maximum possible share of each of them
until b's cumulative utility equals EQ_UB(t+1).
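The greedy procedure can be sketched as follows; the issue weights and the target utility below are illustrative, not from the slides:

```python
def tradeoff(k_a, k_b, target):
    """Greedy trade-off sketch: agent a keeps each issue (x_c = 1) and
    concedes shares to b in increasing order of k_a[c]/k_b[c] until b's
    cumulative utility reaches `target` (playing the role of EQ_UB(t+1))."""
    m = len(k_a)
    y = [0.0] * m                              # b's shares
    for c in sorted(range(m), key=lambda c: k_a[c] / k_b[c]):
        if target <= 0:
            break
        y[c] = min(1.0, target / k_b[c])       # give b only as much as needed
        target -= k_b[c] * y[c]
    x = [1.0 - s for s in y]                   # a keeps the rest
    return x, y

# a values issue 2 more, b values issue 1 more (cf. U_a = x1 + 2x2,
# U_b = 2y1 + y2 in the earlier example); a concedes issue 1 first.
x, y = tradeoff(k_a=[1, 2], k_b=[2, 1], target=2.0)
print(x, y)  # a keeps issue 2, b gets issue 1
```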
Slide 61
Equilibrium strategies. For t = n: the offering agent takes 100
percent of all the issues; the receiving agent accepts. For t < n
(for agent a): OFFER [x, y] s.t. U_b(y, t) = EQ_UB(t+1). If there is
more than one such [x, y], perform trade-offs across issues to find
the best offer. RECEIVE [x, y]: if U_a(x, t) ≥ EQ_UA(t+1) ACCEPT,
else REJECT.
61
Slide 62
Equilibrium solution. An agreement on all the m issues occurs in
the first time period. The time to compute the equilibrium offer for
the first time period is O(mn). The equilibrium solution is
Pareto-optimal (an outcome is Pareto optimal if it is impossible to
improve the utility of one agent without reducing the utility of the
other). The equilibrium solution is not unique, and it is not
symmetric.
Slide 63
Agent a's trade-off problem at time t is to find a package [x, y]
that maximizes its own utility subject to giving b its equilibrium
utility. For indivisible issues, this is the integer knapsack
problem.
Slide 64
Single issue: the time to compute the equilibrium is O(n); the
equilibrium is not unique and not symmetric. Multiple divisible
issues (exact solution): the time to compute the equilibrium for
t = 1 is O(mn); the equilibrium is Pareto optimal, but not unique
and not symmetric. Multiple indivisible issues (approximate
solution): there is an FPTAS to compute an approximate equilibrium;
the equilibrium is Pareto optimal, but not unique and not symmetric.
Slide 65
65
Slide 66
66 The Data and Information System component of the Earth
Observing System (EOSDIS) of NASA is a distributed knowledge system
which supports archival and distribution of data at multiple and
independent servers.
Slide 67
Each data collection, or file, is called a dataset. The
datasets are huge, so each dataset has only one copy. The current
policy for data allocation in NASA is static: old datasets are not
reallocated; each new dataset is located at the server with the
nearest topics (defined according to the topics of the datasets
stored by this server).
Slide 68
The original problem: how to distribute files among
computers in order to optimize system performance. Our
problem: how can self-motivated servers decide about the
distribution of files, when each server has its own objectives?
Slide 69
69 There are several information servers. Each server is
located at a different geographical area. Each server receives
queries from the clients in its area, and sends documents as
responses to queries. These documents can be stored locally, or in
another server.
Slide 70
[Diagram: a client in area i sends a query to server i; server i
retrieves the document(s) from server j, at some distance, and
returns them to the client.]
Slide 71
SERVERS: the set of the servers. DATASETS: the set of
datasets (files) to be allocated. Allocation: a mapping of each
dataset to one of the servers. The set of all possible allocations
is denoted by Allocs. U: the utility function of each server.
Slide 72
If at least one server opts out of the negotiation, then the
conflict allocation, conflict_alloc, is implemented. We consider the
conflict allocation to be the static allocation (each dataset is
stored at the server with the closest topics).
Slide 73
U_server(alloc, t) specifies the utility of server from
alloc ∈ Allocs at time t. It consists of: the utility from the
assignment of each dataset, and the cost of negotiation delay.
U_server(alloc, 0) = Σ_{x ∈ DATASETS} V_server(x, alloc(x))
Slide 74
query price: payment for retrieved documents. usage(ds, s): the
expected number of documents of dataset ds requested by clients in
the area of server s. Storage costs, retrieval costs, answer costs.
Slide 75
Costs of communication and of the computation time of the
negotiation. Loss of unused information: new documents cannot be
used until the negotiation ends. Dataset usage and storage costs
are assumed to decrease over time with the same discount ratio.
Thus, there is a constant discount ratio δ of the utility from an
allocation: U_server(alloc, t) = δ^t · U_server(alloc, 0) − t·C
Slide 76
Each server prefers any agreement over continuing the
negotiation indefinitely. The utility of each server from the
conflict allocation is always greater than or equal to 0. OFFERS:
the set of allocations that are preferred by all the agents over
opting out.
Slide 77
Simultaneous responses: a server, when responding, is not
informed of the other responses. Theorem: for each offer x ∈ OFFERS,
there is a subgame-perfect equilibrium of the bargaining game
with the outcome x offered and unanimously accepted in period
0.
Slide 78
The designers of the servers can agree in advance on a joint
technique for choosing x: giving each server its conflict utility
and maximizing a social welfare criterion, e.g., the sum of the
servers' utilities, or the generalized Nash product of the servers'
utilities: Π_s (U_s(x) − U_s(conflict))
Slide 79
How do the parameters influence the results of the
negotiation? vcost(alloc): the variable costs due to an allocation
(excluding storage_cost and the gains due to queries). vcost_ratio:
the ratio between the vcosts when using negotiation and the vcosts
of the static allocation.
Slide 80
As the number of servers grows, vcost_ratio increases (more
complex computations). As the number of datasets grows,
vcost_ratio decreases (negotiation is more beneficial). Changing
the mean usage did not influence vcost_ratio significantly, but
vcost_ratio decreases as the standard deviation of the usage
increases.
Slide 81
When the standard deviation of the distances between servers
increases, vcost_ratio decreases. When the distance between servers
increases, vcost_ratio decreases. In the domains tested, vcost_ratio
also depends on answer_cost, storage_cost, retrieve_cost, and
query_price.
Slide 82
82 Each server knows: The usage frequency of all datasets, by
clients from its area The usage frequency of datasets stored in it,
by all clients
Slide 83
Bargaining ZOPA under incomplete information: the seller's
reservation price lies between s_L and s_H (the seller wants s or
more); the buyer's reservation price lies between b_L and b_H (the
buyer wants b or less); the final price x determines the seller's
and buyer's surplus.
Slide 84
N is the set of players. Ω is the set of the states of nature.
A_i is the set of actions for player i; A = A_1 × A_2 × … × A_n.
T_i is the type set of player i: for each state of nature, the game
will have different types of players (one type per player).
u_i : Ω × A → R is the payoff function for player i. p_i is the
probability distribution over Ω for each player i; that is to say,
each player may have a different view of the probability
distribution over the states of nature. In the game, they never know
the exact state of nature.
Slide 85
A (Bayesian) Nash equilibrium is a strategy profile and beliefs
specified for each player about the types of the other players that
maximizes the expected utility for each player given their beliefs
about the other players' types and given the strategies played by
the other players. 85
Slide 86
86 A revelation mechanism: First, all the servers report
simultaneously all their private information: for each dataset, the
past usage of the dataset by this server. for each server, the past
usage of each local dataset by this server. Then, the negotiation
proceeds as in the complete information case.
Slide 87
Lemma: there is a Nash equilibrium where each server tells
the truth about its past usage of remote datasets and about the
other servers' usage of its local datasets. Lies concerning details
about local usage of local datasets are intractable.
Slide 88
88 We have considered the data allocation problem in a
distributed environment. We have presented the utility function of
the servers, which expresses their preferences. We have proposed
using a negotiation protocol for solving the problem. For
incomplete information situations, a revelation process was added
to the protocol.
Slide 89
89
Slide 90
Computer persuades human Computer has the control Human has the
control 90
Slide 91
91
Slide 92
The development of a standardized agent to be used in the
collection of data for studies on culture and negotiation.
Buyer/seller agents that negotiate well across cultures: the PURB
agent.
Slide 93
93
Slide 94
94 Gertner Institute for Epidemiology and Health Policy
Research 94
Slide 95
95 I will be too tired in the afternoon!!! I scheduled an
appointment for you at the physiotherapist this afternoon Try to
reschedule and fail The physiotherapist has no other available
appointments this week. How about resting before the
appointment?
Slide 96
96 Collect Update Analyze Prioritize
Slide 97
Irrationalities attributed to: sensitivity to context; lack of
knowledge of own preferences; the effects of complexity; the
interplay between emotion and cognition; the problem of
self-control; bounded rationality.
Slide 98
Agents that play repeatedly with the same person 98
Slide 99
Buyers and sellers. Using data from previous experiments. A belief
function to model the opponent. Implemented several tactics and
heuristics, including a concession mechanism. A. Byde, M. Yearworth,
K.-Y. Chen, and C. Bartolini. AutONA: A system for automated
multiple 1-1 negotiation. In CEC, pages 59–67, 2003.
Slide 100
Virtual learning and reinforcement learning. Using data from
previous interactions. Implemented several tactics and heuristics,
qualitative in nature. Non-deterministic behavior by means of
randomization. R. Katz and S. Kraus. Efficient agents for cliff-edge
environments with a large set of decision options. In AAMAS, pages
697–704, 2006.
Slide 101
Agents that play with the same person only once 101
Slide 102
Small number of examples: it is difficult to collect data on
people. Noisy data: people are inconsistent (the same person may act
differently), and people are diverse.
Slide 103
Multi-issue, multi-attribute, with incomplete information.
Domain independent. Implemented several tactics and heuristics,
including a concession mechanism. C. M. Jonker, V. Robu, and J.
Treur. An agent architecture for multi-attribute negotiation using
incomplete preference information. JAAMAS, 15(2):221–252, 2007.
Slide 104
Building blocks: personality model, utility function, rules for
guiding choice. Key idea: models the personality traits of its
negotiation partners over time; uses decision theory to decide how
to negotiate, with a utility function that depends on the models and
other environmental features. Pre-defined rules facilitate
computation. Plays as well as people; adapts to culture.
Slide 105
Multi-issue, multi-attribute, with incomplete information.
Domain independent. Implemented several tactics and heuristics,
qualitative in nature. Non-deterministic behavior, also by means of
randomization. R. Lin, S. Kraus, J. Wilkenfeld, and J. Barry.
Negotiating with bounded rational agents in environments with
incomplete information using an automated agent. Artificial
Intelligence, 172(6-7):823–851, 2008. Played at least as well as
people. Is it possible to improve the QOAgent? Yes, if you have
data.
Slide 106
Y. Oshrat, R. Lin, and S. Kraus. Facing the challenge of
human-agent negotiations via effective general opponent modeling.
In AAMAS, 2009. Multi-issue, multi-attribute, with incomplete
information. Domain independent. Implemented several tactics and
heuristics, qualitative in nature. Non-deterministic behavior, also
by means of randomization. Using data from previous interactions.
Slide 107
Example scenario Employer and job candidate Objective: reach an
agreement over hiring terms after successful interview 107
Slide 108
Challenge: sparse data from past negotiation sessions between
people. Technique: Kernel Density Estimation.
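A minimal sketch of kernel density estimation with a Gaussian kernel; the bandwidth and the sample offers below are illustrative, not the KBAgent's actual estimator:

```python
import math

def kde(data, x, bandwidth=1.0):
    """Gaussian kernel density estimate at x, smoothing sparse samples
    into a continuous density."""
    n = len(data)
    return sum(math.exp(-((x - d) / bandwidth) ** 2 / 2)
               for d in data) / (n * bandwidth * math.sqrt(2 * math.pi))

past_offers = [320, 340, 355, 400, 410]     # hypothetical observed offers
print(kde(past_offers, 350, bandwidth=20))  # density near a candidate offer
```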
Slide 109
Estimate the likelihood that the other party will accept an offer
or make an offer, and its expected average utility. The estimation
is done separately for each possible agent type; the type of a
negotiator is determined using a simple Bayes classifier. Use the
estimation for decision making: general opponent modeling.
Slide 110
KBAgent as the job candidate. Best result: 20,000, Project
manager, with leased car, 20% pension funds, fast promotion, 8
hours. Offers exchanged between the KBAgent and the human:
20,000, Team Manager, with leased car, pension 20%, slow promotion,
9 hours; 12,000, Programmer, without leased car, pension 10%, fast
promotion, 10 hours; 20,000, Project manager, without leased car,
pension 20%, slow promotion, 9 hours.
Slide 111
KBAgent as the job candidate. Best agreement: 20,000, Project
manager, with leased car, 20% pension funds, fast promotion, 8
hours. Offers exchanged between the KBAgent and the human:
20,000, Programmer, with leased car, pension 10%, slow promotion,
9 hours (round 7); 12,000, Programmer, without leased car, pension
10%, fast promotion, 10 hours; 20,000, Team Manager, with leased
car, pension 20%, slow promotion, 9 hours.
Slide 112
Experiments: 172 graduate and undergraduate students in Computer
Science. People were told they might be playing a computer agent or
a person. Scenarios: Employer-Employee; Tobacco Convention: England
vs. Zimbabwe. Learned from 20 games of human-human play.
Slide 113
Results: comparing the KBAgent to others. Average utility value (std):
Employer role:
KBAgent vs. people: 468.9 (37.0)
QOAgent vs. people: 417.4 (135.9)
People vs. people: 408.9 (106.7)
People vs. QOAgent: 431.8 (80.8)
People vs. KBAgent: 380.4 (48.5)
Job candidate role:
KBAgent: 482.7 (57.5)
QOAgent: 397.8 (86.0)
People vs. people: 310.3 (143.6)
People vs. QOAgent: 320.5 (112.7)
People vs. KBAgent: 370.5 (58.9)
Slide 114
Main results, in comparison to the QOAgent: the KBAgent
achieved higher utility values than the QOAgent; more agreements
were accepted by people; the sum of utility values (social welfare)
was higher when the KBAgent was involved; the KBAgent achieved
significantly higher utility values than people. The results
demonstrate the proficient negotiation of the KBAgent: general
opponent modeling improves agent negotiation and bargaining.
Slide 115
115 I will be too tired in the afternoon!!! I arrange for you
to go to the physiotherapist in the afternoon How can I convince
him? What argument should I give?
Slide 116
116 How should I convince him to provide me with
information?
Slide 117
Which information to reveal? 117 Should I tell him that I will
lose a project if I dont hire today? Should I tell him I was fired
from my last job? Should I tell her that my leg hurts? Should I
tell him that we are running out of antibiotics? Build a game that
combines information revelation and bargaining 117
Slide 118
118 I will be too tired in the afternoon!!! I arrange for you
to go to the physiotherapist in the afternoon How can I convince
him? What argument should I give?
Slide 119
119 How should I convince him to provide me with
information?
Slide 120
An infrastructure for agent design, implementation and
evaluation for open environments Designed with Barbara Grosz (AAMAS
2004) Implemented by Harvard team and BIU team 120
Slide 121
Interesting for people to play: analogous to task settings;
vivid representation of the strategy space (not just a list of
outcomes). Possible for computers to play. Can vary in complexity:
repeated vs. one-shot setting; availability of information;
communication protocol.
Slide 122
Learns the extent to which people are affected by social
preferences such as social welfare and competitiveness. Designed
for one-shot take-it-or-leave-it scenarios. Does not reason about
the future ramifications of its actions. Y. Gal and A. Pfeffer:
Predicting people's bidding behavior in negotiation. AAMAS 2006:
370-376
Slide 123
Agents for Revelation Games
Noam Peled, Kobi Gal, Sarit Kraus
Slide 124
Introduction: revelation games combine two types of
interaction: signaling games (Spence 1974), in which players choose
whether to convey private information to each other, and bargaining
games (Osborne and Rubinstein 1999), in which players engage in
multiple negotiation rounds. Example: a job interview.
Slide 125
Colored Trails (CT): asymmetric and symmetric boards
Slide 126
Results from the social sciences suggest people do not
follow equilibrium strategies: equilibrium-based agents played
against people failed. People rarely design agents to follow
equilibrium strategies (Sarne et al., AAMAS 2008). Equilibrium
strategies are usually not cooperative: all lose.
Slide 127
Perfect Equilibrium (PE) agent. Solved using backward
induction. No signaling. Counter-proposal round (selfish): the
second proposer finds the most beneficial proposal such that the
responder's benefit remains positive; the second responder accepts
any proposal which gives it a positive benefit.
Slide 128
PE agent, phase one. First proposal round (generous): the first
proposer proposes the opponent's counter-proposal; the first
responder accepts any proposal which gives it the same or higher
benefit than its counter-proposal. Revelation phase (revelation vs.
non-revelation): on both boards, the PE with goal revelation yields
lower or equal expected utility compared to the non-revelation PE.
Slide 129
[Chart: benefits diversity; average proposed benefit to players
in the first and second rounds]
Slide 130
130- Performance of PEQ agent
Slide 131
Revelation effect. Only 35% of the games played by humans
included revelation. Revelation had a significant effect on human
performance but not on agent performance; revelation didn't help the
agent. People were deterred by the strategic machine-generated
proposals.
Slide 132
132 Agent based on general opponent modeling: Genetic algorithm
Logistic Regression
Slide 133
SIGAL agent. Learns from previous games. Predicts the
acceptance probability of each proposal using logistic regression.
Models the human as using a weighted utility function of: the
human's benefit; the difference in benefits; the revelation
decision; the benefits in the previous round.
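A sketch of such an acceptance model; the feature names, weights, and bias are illustrative (loosely echoing the coefficients reported later in the deck), not SIGAL's actual learned model:

```python
import math

# Acceptance probability as a logistic function of weighted features.
WEIGHTS = {"responder_benefit": 0.96, "benefit_difference": -0.79,
           "responder_revealed": 0.26}
BIAS = 0.0  # assumed for illustration

def accept_probability(features):
    """P(human accepts proposal) = sigmoid(bias + w . features)."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

p = accept_probability({"responder_benefit": 1.0,
                        "benefit_difference": 0.5,
                        "responder_revealed": 1.0})
print(round(p, 3))
```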
Slide 134
134- Logistic Regression using a Genetic Algorithm
Slide 135
135- Expected benefit maximization
Slide 136
136- Maximization round 2
Slide 137
Strategy comparison. Strategies for the asymmetric board, where
none of the players has revealed, the human lacks 2 chips for
reaching the goal, and the agent lacks 1. * In the first round the
agent was proposed a benefit of 90.
Slide 138
138- Heuristics Tit for tat. Never give more than you ask for in
the counter-proposal. Risk aversion. Isoelastic utility:
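The isoelastic utility formula itself did not survive extraction; presumably the slide shows the standard CRRA (constant relative risk aversion) form, reproduced here as an assumption:

```latex
u(x) = \frac{x^{1-\eta}}{1-\eta}, \qquad \eta \ge 0,\ \eta \neq 1
```

Higher values of the (assumed) parameter \(\eta\) correspond to stronger risk aversion.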
Slide 139
139- Learned Coefficients Responder benefit: (0.96) Benefits
difference: (-0.79) Responder revelation: (0.26) Proposer
revelation: (0.03) Responder benefit in first round: (0.45)
Proposer benefit in first round: (0.33)
Slide 140
140- Methodology 10-fold cross-validation. Over-fitting removal:
stop learning at the minimum of the generalization error. Error
calculation on a held-out test set of new human-human games.
Performance prediction criteria.
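The over-fitting control described above, stopping learning at the minimum of the generalization error, can be sketched in a few lines. The error curve below is invented for illustration.

```python
# Early stopping: pick the training iteration whose validation
# (generalization) error is lowest.

def best_stopping_point(validation_errors):
    """Return the index of the minimum validation error."""
    return min(range(len(validation_errors)),
               key=validation_errors.__getitem__)

errs = [0.40, 0.31, 0.27, 0.25, 0.26, 0.30]  # illustrative error curve
print(best_stopping_point(errs))  # 3: error rises again after iteration 3
```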
Slide 141
141- Performance General opponent* modeling improves agent
negotiations
Slide 142
142
Slide 143
143 Agent based on general* opponent modeling Decision Tree /
Naïve Bayes AAT
Slide 144
Aspiration Adaptation Theory (AAT) Economic theory of people's
behavior (Selten). No utility function exists for decisions (!);
relative decisions are used instead. Retreat and urgency used for
goal variables. 144 Avi Rosenfeld and Sarit Kraus. Modeling Agents
through Bounded Rationality Theories. Proc. of IJCAI 2009; JAAMAS,
2010.
Slide 145
145 [Example: price at the first store: 1000]
Slide 146
146 [Prices so far: 1000, 900]
Slide 147
147 [Prices so far: 1000, 900, 950] If price < 800 buy; otherwise
visit 5 stores and buy in the cheapest.
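The slide's aspiration-style rule can be sketched directly; the price sequence below is illustrative.

```python
# "If price < 800 buy; otherwise visit 5 stores and buy in the cheapest."

def buy_price(prices, threshold=800, max_stores=5):
    """Walk through store prices in order; buy immediately below the
    aspiration threshold, else take the cheapest of the first max_stores."""
    visited = []
    for price in prices[:max_stores]:
        if price < threshold:
            return price
        visited.append(price)
    return min(visited)

print(buy_price([1000, 900, 950, 980, 940]))  # 900: cheapest of 5 visited
print(buy_price([1000, 750]))                 # 750: below threshold, buy now
```

Note that no utility function is ever computed; the rule compares prices only relative to the aspiration threshold, in the spirit of AAT.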
Slide 148
148
Slide 149
General opponent* modeling in cooperative environments 149
Slide 150
Communication is not always possible: High communication costs
Need to act undetected Damaged communication devices Language
incompatibilities Goal: Limited interruption of human activities
Zuckerman, S. Kraus and J. S. Rosenschein. Using Focal Point
Learning to Improve Human-Machine Tacit Coordination, JAAMAS,
2010. 150
Slide 151
Divide 100 into two piles; if your piles are identical to your
coordination partner's, you get the 100. Otherwise, you get
nothing. 101 equilibria 151
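The equilibrium count follows from enumeration: any identical split is a Nash equilibrium of this pure coordination game, since deviating alone forfeits the payoff.

```python
# The 101 pure-strategy equilibria of the pile-splitting game:
# every split (0, 100), (1, 99), ..., (100, 0) where both players match.
splits = [(k, 100 - k) for k in range(101)]
print(len(splits))  # 101
```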
Slide 152
9 equilibria 16 equilibria 152
Slide 153
Thomas Schelling (1963) Focal Points = prominent solutions to
tacit coordination games 153
Slide 154
Domain-independent rules that could be used by automated agents
to identify focal points. Properties: Centrality, Firstness,
Extremeness, Singularity. Logic-based model; decision-theory-based
model; algorithms for agent coordination. Kraus and Rosenschein,
MAAMAW 1992; Fenster et al., ICMAS 1995; Annals of Mathematics and
Artificial Intelligence, 2000. 154
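A toy scoring of the four properties might look as follows. The scoring functions and equal weighting are illustrative assumptions, not the models of the cited papers.

```python
# Score each option in an ordered list by the four focal-point properties.

def focal_point_scores(options):
    """Centrality: closeness to the middle position. Firstness: being
    first. Extremeness: being at either end. Singularity: having a
    unique value. Equal weights (assumed)."""
    n = len(options)
    center = (n - 1) / 2
    scores = []
    for i, v in enumerate(options):
        centrality = 1 - abs(i - center) / max(center, 1)
        firstness = 1.0 if i == 0 else 0.0
        extremeness = 1.0 if i in (0, n - 1) else 0.0
        singularity = 1.0 if options.count(v) == 1 else 0.0
        scores.append(centrality + firstness + extremeness + singularity)
    return scores

# The unique value 9 scores higher than its identical neighbors.
print(focal_point_scores([5, 5, 9, 5]))
```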
Slide 155
155 Agent based on general* opponent modeling Decision Tree /
neural network Focal Point
Slide 156
156 Agent based on general opponent modeling: Decision Tree /
neural network, raw data vector, FP vector
Slide 157
3 experimental domains: 157
Slide 158
Very similar domain (VSD) vs. similar domain (SD) of the "pick
the pile" game. General opponent* modeling improves agent
coordination. 158
Slide 159
159 Experimenting with people is a costly process
Slide 160
Peer Designed Agents (PDAs): computer agents developed by humans.
Experiment: 300 human subjects, 50 PDAs, 3 EDAs. Results: EDAs
outperformed PDAs in the same situations in which they
outperformed people; on average, EDAs exhibited the same measure
of generosity. 160 R. Lin, S. Kraus, Y. Oshrat and Y. Gal.
Facilitating the Evaluation of Automated Negotiators using Peer
Designed Agents, AAAI 2010.
Slide 161
Negotiation and argumentation with people is required for many
applications. General* opponent modeling is beneficial: machine
learning + behavioral models. Challenge: how to integrate machine
learning and behavioral models. 161
Slide 162
1. S.S. Fatima, M. Wooldridge, and N.R. Jennings, Multi-issue negotiation with deadlines, Journal of AI Research, 21:381-471, 2006.
2. R. Keeney and H. Raiffa, Decisions with Multiple Objectives: Preferences and Value Trade-offs, John Wiley, 1976.
3. S. Kraus, Strategic Negotiation in Multiagent Environments, The MIT Press, 2001.
4. S. Kraus and D. Lehmann, Designing and Building a Negotiating Automated Agent, Computational Intelligence, 11(1):132-171, 1995.
5. S. Kraus, K. Sycara and A. Evenchik, Reaching agreements through argumentation: a logical model and implementation, Artificial Intelligence, 104(1-2):1-69, 1998.
6. R. Lin and S. Kraus, Can Automated Agents Proficiently Negotiate With Humans?, Communications of the ACM, 53(1):78-88, January 2010.
7. R. Lin, S. Kraus, Y. Oshrat and Y. Gal, Facilitating the Evaluation of Automated Negotiators using Peer Designed Agents, AAAI 2010.
162
Slide 163
8. R. Lin, S. Kraus, J. Wilkenfeld, and J. Barry, Negotiating with bounded rational agents in environments with incomplete information using an automated agent, Artificial Intelligence, 172(6-7):823-851, 2008.
9. A. Lomuscio, M. Wooldridge, and N.R. Jennings, A classification scheme for negotiation in electronic commerce, Int. Journal of Group Decision and Negotiation, 12(1):31-56, 2003.
10. M.J. Osborne and A. Rubinstein, A Course in Game Theory, The MIT Press, 1994.
11. M.J. Osborne and A. Rubinstein, Bargaining and Markets, Academic Press, 1990.
12. Y. Oshrat, R. Lin, and S. Kraus, Facing the challenge of human-agent negotiations via effective general opponent modeling, AAMAS 2009.
13. H. Raiffa, The Art and Science of Negotiation, Harvard University Press, 1982.
14. J.S. Rosenschein and G. Zlotkin, Rules of Encounter, The MIT Press, 1994.
15. I. Stahl, Bargaining Theory, Economics Research Institute, Stockholm School of Economics, 1972.
16. I. Zuckerman, S. Kraus and J.S. Rosenschein, Using Focal Point Learning to Improve Human-Machine Tacit Coordination, JAAMAS, 2010.
163
Slide 164
2nd annual competition of state-of-the-art negotiating agents, to
be held at AAMAS 2011. Do you want to participate? At least $2,000
for the winner! Contact us! [email protected] Tournament