Automated negotiations: Agents interacting with other automated agents and with humans. Sarit Kraus, Department of Computer Science, Bar-Ilan University and University of Maryland. [email protected], http://www.cs.biu.ac.il/~sarit/

Transcript of the slides:

  • Slide 1
  • Sarit Kraus, Department of Computer Science, Bar-Ilan University and University of Maryland. [email protected], http://www.cs.biu.ac.il/~sarit/
  • Slide 2
  • A discussion in which interested parties exchange information and come to an agreement. (Davis and Smith, 1977)
  • Slide 3
  • NEGOTIATION is an interpersonal decision-making process necessary whenever we cannot achieve our objectives single-handedly.
  • Slide 4
  • Teams of agents that need to coordinate joint activities; problems: distributed information, distributed decision making, local conflicts. Open agent environments: agents acting in the same environment; problems: need motivation to cooperate, conflict resolution, trust, distributed and hidden information.
  • Slide 5
  • Consist of: automated agents developed by or serving different people or organizations; people with a variety of interests and institutional affiliations. The computer agents are self-interested; they may cooperate to further their interests. The set of agents is not fixed.
  • Slide 6
  • Agents support people: collaborative interfaces; CSCW (Computer Supported Cooperative Work) systems; cooperative learning systems; military-support systems. Agents act as proxies for people: coordinating schedules; patient care-delivery systems; online auctions. Groups of agents act autonomously alongside people: simulation systems for education and training; computer games and other forms of entertainment; robots in rescue operations; software personal assistants.
  • Slide 8
  • Monitoring electricity networks (Jennings); distributed design and engineering (Petrie et al.); distributed meeting scheduling (Sen & Durfee); teams of robotic systems acting in hostile environments (Balch & Arkin, Tambe); collaborative Internet agents (Etzioni & Weld, Weiss); collaborative interfaces (Grosz & Ortiz, Andre); information agents on the Internet (Klusch); cooperative transportation scheduling (Fischer); supporting hospital patient scheduling (Decker & Jin); intelligent agents for command and control (Sycara).
  • Slide 9
  • Fully rational agents vs. bounded rational agents.
  • Slide 10
  • No need to start from scratch! Requires modification and adjustment; AI gives insights and complementary methods. Is it worth using formal methods for multi-agent systems?
  • Slide 11
  • Quantitative decision making: maximizing expected utility; Nash equilibrium, Bayesian Nash equilibrium. Automated negotiator: model the scenario as a game; the agent computes (if complexity allows) the equilibrium strategy and acts accordingly. (Kraus, Strategic Negotiation in Multiagent Environments, MIT Press, 2001.)
  • Slide 12
  • Short introduction to game theory
  • Slide 13
  • Decision theory = probability theory (deals with chance) + utility theory (deals with outcomes). Fundamental idea: the MEU (maximum expected utility) principle. Weigh the utility of each outcome by the probability that it occurs.
  • Slide 14
  • Given probabilities P(out_j | A_i) and utilities U(out_j) for each outcome out_j ∈ OUT, the expected utility of an action A_i ∈ Ac is EU(A_i) = Σ_{out_j ∈ OUT} U(out_j) · P(out_j | A_i). Choose the action that maximizes expected utility: MEU = argmax_{A_i ∈ Ac} Σ_{out_j ∈ OUT} U(out_j) · P(out_j | A_i).
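A minimal Python sketch of the MEU rule as stated above; the action names, outcomes, utilities, and probabilities are illustrative placeholders, not values from the slides:

```python
# MEU: pick the action whose probability-weighted utility over outcomes
# is largest. All numbers below are made-up examples.
utilities = {"deal": 10.0, "no_deal": -2.0}

# P(outcome | action) for each action
outcome_probs = {
    "offer_high": {"deal": 0.3, "no_deal": 0.7},
    "offer_low":  {"deal": 0.8, "no_deal": 0.2},
}

def expected_utility(action: str) -> float:
    """EU(A_i) = sum over outcomes of U(out_j) * P(out_j | A_i)."""
    return sum(utilities[o] * p for o, p in outcome_probs[action].items())

best = max(outcome_probs, key=expected_utility)  # the MEU action
print(best, expected_utility(best))              # -> offer_low 7.6
```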
  • Slide 15
  • [Figure: utility curves for risk-averse, risk-neutral, and risk-seeking agents.]
  • Slide 16
  • Players: who participates in the game? Actions / strategies: what can each player do? In what order do the players act? Outcomes / payoffs: what is the outcome of the game? What are the players' preferences over the possible outcomes?
  • Slide 17
  • Information: what do the players know about the parameters of the environment or about one another? Can they observe the actions of the other players? Beliefs: what do the players believe about the unknown parameters of the environment or about one another? What can they infer from observing the actions of the other players?
  • Slide 18
  • Strategy: a complete plan, describing an action for every contingency. Nash equilibrium: each player's strategy is a best response to the strategies of the other players; equivalently, no player can improve his payoff by changing his strategy alone. A self-enforcing agreement: no need for formal contracting. Other equilibrium concepts also exist.
  • Slide 19
  • Depending on the timing of moves: games with simultaneous moves; games with sequential moves. Depending on the information available to the players: games with perfect information; games with imperfect (or incomplete) information. We concentrate on non-cooperative games: groups of players cannot deviate jointly, and players cannot make binding agreements.
  • Slide 20
  • All players choose their actions simultaneously, or at least independently of one another. There is no private information: all aspects of the game are known to the players. Representation by game matrices; often called normal form games or strategic form games.
  • Slide 21
  • Example of a zero-sum game. Strategic issue of competition.
  • Slide 22
  • Each player can cooperate or defect.
                          Column: cooperate   Column: defect
        Row: cooperate        -1, -1             -10, 0
        Row: defect            0, -10             -8, -8
    Main issue: tension between social optimality and individual incentives.
  • Slide 23
  • A supplier and a buyer need to decide whether to adopt a new purchasing system.
                          Buyer: new   Buyer: old
        Supplier: new       20, 20        0, 0
        Supplier: old        0, 0         5, 5
  • Slide 24
  •                         Wife: football   Wife: shopping
        Husband: football        2, 1             0, 0
        Husband: shopping        0, 0             1, 2
    The game involves both the issues of coordination and competition.
  • Slide 25
  • A game has n players. Each player i has a strategy set S_i (his possible actions). Each player has a payoff function p_i : S → R. A strategy t_i in S_i is a best response if there is no other strategy in S_i that produces a higher payoff, given the opponents' strategies.
  • Slide 26
  • A strategy profile is a list (s_1, s_2, …, s_n) of the strategies each player is using. If each strategy is a best response given the other strategies in the profile, the profile is a Nash equilibrium. Why is this important? If we assume players are rational, they will play Nash strategies; even less-than-rational play will often converge to Nash in repeated settings.
  • Slide 27
  •                   Column: a   Column: b
        Row: a          1, 2        0, 1
        Row: b          2, 1        1, 0
    (b, a) is a Nash equilibrium: given that Column is playing a, Row's best response is b; given that Row is playing b, Column's best response is a.
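A small sketch that enumerates pure-strategy Nash equilibria by checking the best-response condition cell by cell; the payoff dictionary encodes the 2x2 game above (whose cell placement is itself reconstructed from the slide):

```python
# Enumerate pure Nash equilibria of a two-player game given as a dict.
from itertools import product

actions = ["a", "b"]
# payoffs[(row_action, col_action)] = (row_payoff, col_payoff)
payoffs = {
    ("a", "a"): (1, 2), ("a", "b"): (0, 1),
    ("b", "a"): (2, 1), ("b", "b"): (1, 0),
}

def is_nash(r, c):
    """No unilateral deviation improves the deviator's payoff."""
    row_ok = all(payoffs[(r2, c)][0] <= payoffs[(r, c)][0] for r2 in actions)
    col_ok = all(payoffs[(r, c2)][1] <= payoffs[(r, c)][1] for c2 in actions)
    return row_ok and col_ok

print([cell for cell in product(actions, actions) if is_nash(*cell)])
# -> [('b', 'a')]
```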
  • Slide 28
  • Unfortunately, not every game has a pure strategy equilibrium (e.g., rock-paper-scissors). However, every finite game has a mixed strategy Nash equilibrium: each action is assigned a probability of play, and each player is indifferent between his actions, given these probabilities.
  • Slide 29
  •                         Wife: football   Wife: shopping
        Husband: football        2, 1             0, 0
        Husband: shopping        0, 0             1, 2
  • Slide 30
  • Instead, each player selects a probability associated with each action. Goal: the utility of each action is equal, so players are indifferent among their choices at these probabilities. Let a = probability the husband chooses football, b = probability the wife chooses shopping. Since the payoffs must be equal, for the husband: b*1 = (1-b)*2, so b = 2/3. For the wife: a*1 = (1-a)*2, so a = 2/3. In each case the expected payoff is 2/3. They go to football together 2/9 of the time, go shopping together 2/9, and miscoordinate 5/9 of the time. If they could synchronize ahead of time they could do better.
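A quick check of this computation, solving the two indifference conditions from the slide exactly with rational arithmetic:

```python
# Indifference conditions:
#   husband: b*1 = (1-b)*2  (b = P[wife shops])     => b = 2/3
#   wife:    a*1 = (1-a)*2  (a = P[husband -> football]) => a = 2/3
from fractions import Fraction

a = Fraction(2, 3)
b = Fraction(2, 3)

p_football_together = a * (1 - b)   # both at football
p_shopping_together = (1 - a) * b   # both shopping
p_miscoordinate = 1 - p_football_together - p_shopping_together
print(p_football_together, p_shopping_together, p_miscoordinate)  # 2/9 2/9 5/9

expected_payoff_husband = (1 - b) * 2  # equals b*1 at indifference
print(expected_payoff_husband)         # -> 2/3
```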
  • Slide 31
  •                     Column: rock   Column: paper   Column: scissors
        Row: rock          0, 0          -1, 1             1, -1
        Row: paper         1, -1          0, 0            -1, 1
        Row: scissors     -1, 1           1, -1            0, 0
  • Slide 32
  • Player 1 plays rock with probability p_r, scissors with probability p_s, and paper with probability 1 - p_r - p_s.
    Utility_2(rock) = 0*p_r + 1*p_s - 1*(1 - p_r - p_s) = 2p_s + p_r - 1
    Utility_2(scissors) = 0*p_s + 1*(1 - p_r - p_s) - 1*p_r = 1 - 2p_r - p_s
    Utility_2(paper) = 0*(1 - p_r - p_s) + 1*p_r - 1*p_s = p_r - p_s
    Player 2 wants to choose a probability for each action so that the expected payoff for each action is the same.
  • Slide 33
  • In equilibrium, Player 2's three expected payoffs must be equal: 2p_s + p_r - 1 = 1 - 2p_r - p_s = p_r - p_s. It turns out (after some algebra) that the optimal mixed strategy is to play each action 1/3 of the time. Intuition: what if you played rock half the time? Your opponent would then play paper half the time, and you'd lose more often than you won. So you'd decrease the fraction of times you played rock, until your opponent had no edge in guessing what you'll do.
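A sketch that solves the two indifference conditions above as a linear system in (p_r, p_s); only the equations from the slide are used:

```python
import numpy as np

# (2*ps + pr - 1) - (1 - 2*pr - ps) = 0  ->  3*pr + 3*ps = 2
# (1 - 2*pr - ps) - (pr - ps)      = 0  ->  3*pr         = 1
A = np.array([[3.0, 3.0],
              [3.0, 0.0]])
b = np.array([2.0, 1.0])

pr, ps = np.linalg.solve(A, b)
print(pr, ps, 1 - pr - ps)  # -> 1/3 each (0.333...)
```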
  • Slide 34
  • [Game tree figure: players alternately choose H or T; leaf payoffs (1,2), (4,0), (2,1).] Any finite game of perfect information has a pure strategy Nash equilibrium. It can be found by backward induction. Chess is a finite game of perfect information; therefore it is a trivial game from a game-theoretic point of view.
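A minimal backward-induction sketch over a toy two-player game tree. The tree shape is an assumption (the slide's figure is only partially recoverable), but the leaf payoffs match the residue above, and the algorithm is the standard one: at each node, the player to move picks the child that maximizes his own payoff component.

```python
def backward_induction(node, player):
    """node: payoff tuple (leaf) or list of child nodes (internal).
    player: 0 or 1, the index of the player to move at this node."""
    if isinstance(node, tuple):          # leaf: payoffs (p1, p2)
        return node
    other = 1 - player
    # Value of each child if play continues optimally from there.
    values = [backward_induction(child, other) for child in node]
    return max(values, key=lambda v: v[player])

# Player 1 moves at the root; player 2 moves at the internal node.
tree = [(1, 2), [(4, 0), (2, 1)]]
print(backward_induction(tree, player=0))  # -> (2, 1)
```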
  • Slide 35
  • A game can have complex temporal structure. Information: the set of players; who moves when and under what circumstances; what actions are available when called upon to move; what is known when called upon to move; what payoffs each player receives. The foundation is a game tree.
  • Slide 36
  • [Game tree: Khrushchev chooses Arm or Retract; after Arm, Kennedy chooses Fold or Nuke. Payoffs (Khrushchev, Kennedy): Retract = (-1, 1); (Arm, Nuke) = (-100, -100); (Arm, Fold) = (10, -10).] Pure strategy Nash equilibria: (Arm, Fold) and (Retract, Nuke).
  • Slide 37
  • Proper subgame = a subtree (of the game tree) whose root is alone in its information set. Subgame perfect equilibrium: a strategy profile that is in Nash equilibrium in every proper subgame (including the root), whether or not that subgame is reached along the equilibrium path of play.
  • Slide 38
  • [Same game tree as above.] Pure strategy Nash equilibria: (Arm, Fold) and (Retract, Nuke). Pure strategy subgame perfect equilibrium: (Arm, Fold). Conclusion: Kennedy's Nuke threat was not credible.
  • Slide 39
  • Diplomacy
  • Slide 40
  • The rules of the game:
    1. You will be randomly paired up with someone in the other section; this pairing will remain completely anonymous.
    2. One of you will be chosen (by coin flip) to be either the Proposer or the Responder in this experiment.
    3. The Proposer gets to make an offer to split $100 in some proportion with the Responder. So the Proposer can offer $x to the Responder, proposing to keep $(100-x) for themselves.
    4. The Responder must decide the lowest amount offered by the Proposer that he/she will accept; i.e., "I will accept any offer which is greater than or equal to $y."
    5. If the Responder accepts the offer made by the Proposer, they split the sum according to the proposal. If the Responder rejects, both parties lose their shares.
  • Slide 41
  • Slide 42
  • [Diagram: the ZOPA (zone of possible agreement). Seller's RP s (the seller wants s or more) and buyer's RP b (the buyer wants b or less); the final price x lies between them, splitting the range into the seller's surplus and the buyer's surplus.]
  • Slide 43
  • If b < s: negative bargaining zone, no possible agreements. If b > s: positive bargaining zone, agreement possible. (x - s) is the seller's surplus; (b - x) is the buyer's surplus. The total surplus to divide, (b - s), is independent of x: a constant-sum game!
  • Slide 44
  • [Diagram: POSITIVE bargaining zone. The buyer's target point, buyer's reservation point, seller's reservation point, and seller's target point are arranged so that the seller's and buyer's bargaining ranges overlap.]
  • Slide 45
  • [Diagram: NEGATIVE bargaining zone. The same points arranged so that the seller's and buyer's bargaining ranges do not overlap.]
  • Slide 46
  • Agents a and b negotiate over a pie of size 1. Offer: (x, y) with x + y = 1. Deadline: n. Discount factor: δ. Utility: U_a((x, y), t) = x·δ^(t-1) and U_b((x, y), t) = y·δ^(t-1) if t ≤ n, and 0 otherwise. The agents negotiate using Rubinstein's alternating offers protocol.
  • Slide 47
  • Time 1: a offers (x_1, y_1); b responds (accept/reject). Time 2: b offers (x_2, y_2); a responds (accept/reject). … up to time n.
  • Slide 48
  • How much should an agent offer if there is only one time period? Let n = 1 and let a be the first mover. Equilibrium strategies: agent a offers to keep the whole pie, (1, 0); agent b will accept this (b gets 0 whether it accepts or rejects).
  • Slide 49
  • δ = 1/4, n = 2, first mover: a. Offer (x, y); x: a's share, y: b's share. Optimal offers are obtained using backward induction:
    Time 1: a offers b (3/4, 1/4); utilities 3/4; 1/4  <- agreement
    Time 2: b offers a (0, 1); utilities 0; 1/4
    The offer (3/4, 1/4) forms a P.E. (perfect equilibrium) Nash equilibrium.
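A sketch of the backward-induction computation for this single-issue game, following the equilibrium logic of the slides: at t = n the offering agent keeps the whole pie; at t < n the responder is offered exactly its discounted continuation utility. The function name is mine.

```python
from fractions import Fraction

def other(p):
    return "b" if p == "a" else "a"

def equilibrium_offer(n, delta, first="a"):
    """Period-1 equilibrium offer (x, y): x is a's share, y is b's share."""
    # At the deadline the offering agent keeps the whole pie.
    offerer = first if n % 2 == 1 else other(first)
    share = {offerer: Fraction(1), other(offerer): Fraction(0)}
    # Walk backwards: at time t the responder must be offered a share y with
    # y * delta^(t-1) = share_{t+1}(responder) * delta^t, i.e. y = share * delta.
    for t in range(n - 1, 0, -1):
        offerer = first if t % 2 == 1 else other(first)
        responder = other(offerer)
        share[responder] *= delta
        share[offerer] = 1 - share[responder]
    return share["a"], share["b"]

print(equilibrium_offer(n=2, delta=Fraction(1, 4)))  # -> (3/4, 1/4)
print(equilibrium_offer(n=1, delta=Fraction(1, 4)))  # -> (1, 0), as on slide 48
```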
  • Slide 50
  • What happens to the first mover's share as δ increases? What happens to the second mover's share as δ increases? As the deadline n increases, what happens to the first mover's share? Likewise for the second mover?
  • Slide 51
  • [Plot: effect of δ and the deadline on the agents' shares.]
  • Slide 52
  • Set of issues: S = {1, 2, …, m}. Each issue is a pie of size 1. The issues are divisible. Deadline: n (for all the issues). Discount factor: δ_c for issue c. Utility: U(x, t) = Σ_c U_c(x_c, t).
  • Slide 53
  • Package deal procedure: the issues are bundled and discussed together as a package. Simultaneous procedure: the issues are negotiated in parallel but independently of each other. Sequential procedure: the issues are negotiated sequentially, one after another.
  • Slide 54
  • Package deal procedure: issues are negotiated using the alternating offers protocol. An offer specifies a division for each of the m issues; the agents may only accept/reject a complete offer. The agents may have different preferences over the issues, so they can make trade-offs across the issues to maximize their utility; this leads to a Pareto optimal outcome.
  • Slide 55
  • Utility for two issues: U_a = 2X + Y; U_b = X + 2Y.
  • Slide 56
  • Making tradeoffs: among the divisions for which U_b = 2, what is a's utility? [Figure.]
  • Slide 57
  • Example for two issues. Deadline: n = 2. Discount factors: δ_1 = δ_2 = 1/2. Utilities: U_a = (1/2)^(t-1)·(x_1 + 2x_2); U_b = (1/2)^(t-1)·(2y_1 + y_2).
    Time 1: a offers b [(1/4, 3/4); (1, 0)] or [(3/4, 1/4); (0, 1)]  <- agreement
    Time 2: b offers a [(0, 1); (0, 1)], U_b = 1.5
    The outcome is not symmetric.
  • Slide 58
  • P.E. Nash equilibrium strategies. For t = n: the offering agent takes 100 percent of all the issues; the receiving agent accepts. For t < n (for agent a): OFFER [x, y] s.t. U_b(y, t) = EQ_UB(t+1); if there is more than one such [x, y], perform trade-offs across issues to find the best offer. RECEIVE [x, y]: if U_a(x, t) ≥ EQ_UA(t+1), ACCEPT; else REJECT. EQ_UA(t+1) is a's equilibrium utility for t+1; EQ_UB(t+1) is b's equilibrium utility for t+1.
  • Slide 59
  • Making trade-offs, divisible issues. Agent a's trade-off problem at time t:
    TR: find a package [x, y] to
      maximize   Σ_{c=1..m} k_a^c · x_c
      subject to Σ_{c=1..m} k_b^c · y_c ≥ EQ_UB(t+1), 0 ≤ x_c ≤ 1, 0 ≤ y_c ≤ 1.
    This is the fractional knapsack problem.
  • Slide 60
  • Making trade-offs, divisible issues, from agent a's perspective (time t): agent a considers the m issues in increasing order of k_a^c / k_b^c and assigns to b the maximum possible share of each of them until b's cumulative utility equals EQ_UB(t+1); a sketch of this greedy procedure follows below.
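A sketch of that greedy fractional-knapsack trade-off as code; the helper name is mine, and the example reuses the two-issue setup of slide 57:

```python
from fractions import Fraction

def tradeoff_offer(k_a, k_b, target_b):
    """Return shares (x, y) per issue: x for a, y for b, x_c + y_c = 1."""
    m = len(k_a)
    y = [Fraction(0)] * m
    remaining = Fraction(target_b)
    # Cheapest issues for a to give away first: low k_a[c] / k_b[c] ratio.
    for c in sorted(range(m), key=lambda c: Fraction(k_a[c], k_b[c])):
        give = min(Fraction(1), remaining / k_b[c])  # fractional share for b
        y[c] = give
        remaining -= give * k_b[c]
        if remaining == 0:
            break
    x = [1 - yc for yc in y]
    return x, y

# Slide 57's example: U_a = x1 + 2*x2, U_b = 2*y1 + y2, EQ_UB(2) = 3/2.
print(tradeoff_offer([1, 2], [2, 1], Fraction(3, 2)))
# -> x = [1/4, 1], y = [3/4, 0], i.e. the offer [(1/4, 3/4); (1, 0)]
```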
  • Slide 62
  • Equilibrium solution: an agreement on all the m issues occurs in the first time period. The time to compute the equilibrium offer for the first time period is O(mn). The equilibrium solution is Pareto optimal (an outcome is Pareto optimal if no agent's utility can be improved without lowering the other's). The equilibrium solution is not unique and is not symmetric.
  • Slide 63
  • Agent a's trade-off problem at time t is to find a package [x, y] that solves TR with each share restricted to 0 or 1. For indivisible issues, this is the integer knapsack problem.
  • Slide 64
  • Single issue: time to compute the equilibrium is O(n); the equilibrium is not unique and is not symmetric. Multiple divisible issues (exact solution): time to compute the equilibrium for t = 1 is O(mn); the equilibrium is Pareto optimal, not unique, not symmetric. Multiple indivisible issues (approximate solution): there is an FPTAS to compute an approximate equilibrium; the equilibrium is Pareto optimal, not unique, not symmetric.
  • Slide 65
  • Slide 66
  • The Data and Information System component of the Earth Observing System (EOSDIS) of NASA is a distributed knowledge system that supports archiving and distribution of data at multiple independent servers.
  • Slide 67
  • Each data collection, or file, is called a dataset. The datasets are huge, so each dataset has only one copy. The current policy for data allocation in NASA is static: old datasets are not reallocated, and each new dataset is placed at the server with the nearest topics (defined according to the topics of the datasets stored by that server).
  • Slide 68
  • The original problem: how to distribute files among computers in order to optimize system performance. Our problem: how can self-motivated servers decide on the distribution of files, when each server has its own objectives?
  • Slide 69
  • There are several information servers. Each server is located in a different geographical area. Each server receives queries from the clients in its area and sends documents as responses to queries. These documents can be stored locally or in another server.
  • Slide 70
  • [Diagram: a client in area i sends a query to server i; server i retrieves the document(s), possibly from server j located some distance away in area j, and returns them to the client.]
  • Slide 71
  • SERVERS: the set of servers. DATASETS: the set of datasets (files) to be allocated. Allocation: a mapping of each dataset to one of the servers; the set of all possible allocations is denoted by Allocs. U: the utility function of each server.
  • Slide 72
  • If at least one server opts out of the negotiation, then the conflict allocation conflict_alloc is implemented. We take the conflict allocation to be the static allocation (each dataset is stored at the server with the closest topics).
  • Slide 73
  • U_server(alloc, t) specifies the utility of server from alloc ∈ Allocs at time t. It consists of the utility from the assignment of each dataset and the cost of negotiation delay: U_server(alloc, 0) = Σ_{x ∈ DATASETS} V_server(x, alloc(x)).
  • Slide 74
  • query price: payment for retrieved documents. usage(ds, s): the expected number of documents of dataset ds requested by clients in the area of server s. Storage costs, retrieval costs, answer costs.
  • Slide 75
  • Costs of negotiation delay: the communication and computation time of the negotiation, and the loss of unused information (new documents cannot be used until the negotiation ends). Dataset usage and storage cost are assumed to decrease over time with the same discount ratio ρ < 1. Thus there is a constant discount ratio of the utility from an allocation: U_server(alloc, t) = ρ^t · U_server(alloc, 0) - t·C.
  • Slide 76
  • Each server prefers any agreement over continuing the negotiation indefinitely. The utility of each server from the conflict allocation is always greater than or equal to 0. OFFERS: the set of allocations that are preferred by all the agents over opting out.
  • Slide 77
  • Simultaneous responses: a server, when responding, is not informed of the other responses. Theorem: for each offer x ∈ OFFERS, there is a subgame-perfect equilibrium of the bargaining game with the outcome x offered and unanimously accepted in period 0.
  • Slide 78
  • The designers of the servers can agree in advance on a joint technique for choosing x: giving each server at least its conflict utility; maximizing a social welfare criterion, e.g., the sum of the servers' utilities, or the generalized Nash product of the servers' utilities: Π_s (U_s(x) - U_s(conflict)).
  • Slide 79
  • How do the parameters influence the results of the negotiation? vcost(alloc): the variable costs due to an allocation (excludes storage_cost and the gains due to queries). vcost_ratio: the ratio between the vcost when using negotiation and the vcost of the static allocation.
  • Slide 80
  • As the number of servers grows, vcost_ratio increases (more complex computations). As the number of datasets grows, vcost_ratio decreases (negotiation is more beneficial). Changing the mean usage did not influence vcost_ratio significantly, but vcost_ratio decreases as the standard deviation of the usage increases.
  • Slide 81
  • When the standard deviation of the distances between servers increases, vcost_ratio decreases. When the distance between servers increases, vcost_ratio decreases. In the domains tested, answer_cost, storage_cost, retrieve_cost, and query_price each had a monotone effect on vcost_ratio (the directions were shown as arrows on the original slide).
  • Slide 82
  • Each server knows: the usage frequency of all datasets by clients from its area, and the usage frequency of the datasets stored in it by all clients.
  • Slide 83
  • [Diagram: bargaining with incomplete information. The seller's RP lies between s_L and s_H (the seller wants s or more); the buyer's RP lies between b_L and b_H (the buyer wants b or less); the final price x splits the range into the seller's surplus and the buyer's surplus.]
  • Slide 84
  • N is the set of players. Ω is the set of the states of nature. A_i is the set of actions for player i; A = A_1 × A_2 × … × A_n. T_i is the type set of player i; for each state of nature, the game will have different types of players (one type per player). u_i : Ω × A → R is the payoff function for player i. p_i is the probability distribution over Ω for each player i; that is to say, each player may have a different view of the probability distribution over the states of nature. In the game, they never know the exact state of nature.
  • Slide 85
  • A (Bayesian) Nash equilibrium is a strategy profile and beliefs specified for each player about the types of the other players that maximizes the expected utility for each player given their beliefs about the other players' types and given the strategies played by the other players. 85
  • Slide 86
  • A revelation mechanism: first, all the servers simultaneously report all their private information: for each dataset, the past usage of the dataset by this server; for each server, the past usage of each local dataset by that server. Then the negotiation proceeds as in the complete information case.
  • Slide 87
  • Lemma: there is a Nash equilibrium where each server tells the truth about its past usage of remote datasets and about the other servers' usage of its local datasets. Lies concerning details about local usage of local datasets are intractable.
  • Slide 88
  • We have considered the data allocation problem in a distributed environment. We have presented the utility function of the servers, which expresses their preferences. We have proposed using a negotiation protocol for solving the problem. For incomplete information situations, a revelation process was added to the protocol.
  • Slide 89
  • Slide 90
  • Computer persuades human; computer has the control; human has the control.
  • Slide 91
  • Slide 92
  • The development of a standardized agent to be used in the collection of data for studies on culture and negotiation. Buyer/seller agents that negotiate well across cultures: the PURB agent.
  • Slide 93
  • Slide 94
  • Gertner Institute for Epidemiology and Health Policy Research
  • Slide 95
  • Patient: "I will be too tired in the afternoon!!!" Agent: "I scheduled an appointment for you at the physiotherapist this afternoon." [The agent tries to reschedule and fails.] Agent: "The physiotherapist has no other available appointments this week. How about resting before the appointment?"
  • Slide 96
  • [Diagram: Collect -> Update -> Analyze -> Prioritize.]
  • Slide 97
  • Irrationalities attributed to: sensitivity to context; lack of knowledge of one's own preferences; the effects of complexity; the interplay between emotion and cognition; the problem of self-control; bounded rationality.
  • Slide 98
  • Agents that play repeatedly with the same person
  • Slide 99
  • Buyers and sellers. Uses data from previous experiments and a belief function to model the opponent. Implemented several tactics and heuristics, including a concession mechanism. A. Byde, M. Yearworth, K.-Y. Chen, and C. Bartolini. AutONA: A system for automated multiple 1-1 negotiation. In CEC, pages 59-67, 2003.
  • Slide 100
  • Virtual learning and reinforcement learning. Uses data from previous interactions. Implemented several tactics and heuristics, qualitative in nature. Non-deterministic behavior, by means of randomization. R. Katz and S. Kraus. Efficient agents for cliff-edge environments with a large set of decision options. In AAMAS, pages 697-704, 2006.
  • Slide 101
  • Agents that play with the same person only once
  • Slide 102
  • Small number of examples: it is difficult to collect data on people. Noisy data: people are inconsistent (the same person may act differently), and people are diverse.
  • Slide 103
  • Multi-issue, multi-attribute, with incomplete information. Domain independent. Implemented several tactics and heuristics, including a concession mechanism. C. M. Jonker, V. Robu, and J. Treur. An agent architecture for multi-attribute negotiation using incomplete preference information. JAAMAS, 15(2):221-252, 2007.
  • Slide 104
  • Building blocks: a personality model, a utility function, and rules for guiding choice. Key idea: models the personality traits of its negotiation partners over time; uses decision theory to decide how to negotiate, with a utility function that depends on the models and other environmental features; pre-defined rules facilitate computation. Plays as well as people; adapts to culture.
  • Slide 105
  • Multi-issue, multi-attribute, with incomplete information. Domain independent. Implemented several tactics and heuristics, qualitative in nature. Non-deterministic behavior, also by means of randomization. R. Lin, S. Kraus, J. Wilkenfeld, and J. Barry. Negotiating with bounded rational agents in environments with incomplete information using an automated agent. Artificial Intelligence, 172(6-7):823-851, 2008. Played at least as well as people. Is it possible to improve the QOAgent? Yes, if you have data.
  • Slide 106
  • Y. Oshrat, R. Lin, and S. Kraus. Facing the challenge of human-agent negotiations via effective general opponent modeling. In AAMAS, 2009. Multi-issue, multi-attribute, with incomplete information. Domain independent. Implemented several tactics and heuristics, qualitative in nature. Non-deterministic behavior, also by means of randomization. Uses data from previous interactions.
  • Slide 107
  • Example scenario: an employer and a job candidate. Objective: reach an agreement over hiring terms after a successful interview.
  • Slide 108
  • Challenge: sparse data of past negotiation sessions between people. Technique: kernel density estimation.
  • Slide 109
  • General opponent modeling: estimate the likelihood that the other party will accept an offer or make an offer, and its expected average utility. The estimation is done separately for each possible agent type; the type of a negotiator is determined using a simple Bayes classifier. Use the estimates for decision making (a sketch follows below).
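A hedged sketch of what such density-estimation-based opponent modeling can look like; the feature encoding, training data, and Bayes combination below are illustrative assumptions, not the published KBAgent model:

```python
# Smooth past offers with a Gaussian kernel to estimate the probability
# that a new offer will be accepted. All training values are made up.
import numpy as np
from scipy.stats import gaussian_kde

# Utility (to the opponent) of past offers, split by opponent response.
accepted = np.array([0.55, 0.60, 0.62, 0.70, 0.75, 0.80])
rejected = np.array([0.20, 0.25, 0.30, 0.35, 0.40, 0.50])

kde_acc = gaussian_kde(accepted)
kde_rej = gaussian_kde(rejected)
p_accept_prior = len(accepted) / (len(accepted) + len(rejected))

def p_accept(offer_utility: float) -> float:
    """P(accept | offer) via Bayes' rule over the two KDEs."""
    a = kde_acc(offer_utility)[0] * p_accept_prior
    r = kde_rej(offer_utility)[0] * (1 - p_accept_prior)
    return a / (a + r)

print(round(p_accept(0.65), 2))  # high: close to past accepted offers
print(round(p_accept(0.30), 2))  # low: close to past rejected offers
```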
  • Slide 110
  • KBAgent as the job candidate. Best result: 20,000, project manager, with leased car, 20% pension funds, fast promotion, 8 hours. [Figure: offers exchanged between the KBAgent and the human: 20,000 / team manager / with leased car / pension 20% / slow promotion / 9 hours; 12,000 / programmer / without leased car / pension 10% / fast promotion / 10 hours; 20,000 / project manager / without leased car / pension 20% / slow promotion / 9 hours.]
  • Slide 111
  • KBAgent as the job candidate. Best agreement: 20,000, project manager, with leased car, 20% pension funds, fast promotion, 8 hours. [Figure: offers exchanged between the KBAgent and the human, with agreement at round 7: 20,000 / programmer / with leased car / pension 10% / slow promotion / 9 hours; also shown: 12,000 / programmer / without leased car / pension 10% / fast promotion / 10 hours; 20,000 / team manager / with leased car / pension 20% / slow promotion / 9 hours.]
  • Slide 112
  • Experiments: 172 grad and undergrad students in Computer Science. People were told they might be playing a computer agent or a person. Scenarios: employer vs. employee; tobacco convention: England vs. Zimbabwe. Learned from 20 games of human-human play.
  • Slide 113
  • Results, comparing the KBAgent to others; average utility value (std):
    Employer: KBAgent vs. people 468.9 (37.0); QOAgent vs. people 417.4 (135.9); people vs. people 408.9 (106.7); people vs. QOAgent 431.8 (80.8); people vs. KBAgent 380.4 (48.5).
    Job candidate: KBAgent 482.7 (57.5); QOAgent 397.8 (86.0); people vs. people 310.3 (143.6); people vs. QOAgent 320.5 (112.7); people vs. KBAgent 370.5 (58.9).
  • Slide 114
  • Main results. In comparison to the QOAgent: the KBAgent achieved higher utility values; more agreements were accepted by people; the sum of utility values (social welfare) was higher when the KBAgent was involved. The KBAgent achieved significantly higher utility values than people. The results demonstrate the proficient negotiation of the KBAgent. General opponent* modeling improves agent negotiation and bargaining.
  • Slide 115
  • Patient: "I will be too tired in the afternoon!!!" Agent: "I arranged for you to go to the physiotherapist in the afternoon." Agent: "How can I convince him? What argument should I give?"
  • Slide 116
  • How should I convince him to provide me with information?
  • Slide 117
  • Which information to reveal? Should I tell him that I will lose a project if I don't hire today? Should I tell him I was fired from my last job? Should I tell her that my leg hurts? Should I tell him that we are running out of antibiotics? Build a game that combines information revelation and bargaining.
  • Slide 120
  • An infrastructure for agent design, implementation, and evaluation for open environments. Designed with Barbara Grosz (AAMAS 2004). Implemented by the Harvard team and the BIU team.
  • Slide 121
  • Interesting for people to play: analogous to task settings; a vivid representation of the strategy space (not just a list of outcomes). Possible for computers to play. Can vary in complexity: repeated vs. one-shot setting; availability of information; communication protocol.
  • Slide 122
  • Learns the extent to which people are affected by social preferences such as social welfare and competitiveness. Designed for one-shot take-it-or-leave-it scenarios. Does not reason about the future ramifications of its actions. Y. Gal and A. Pfeffer: Predicting people's bidding behavior in negotiation. AAMAS 2006: 370-376
  • Slide 123
  • Agents for Revelation Games. Noam Peled, Kobi Gal, Sarit Kraus.
  • Slide 124
  • Introduction: revelation games combine two types of interaction. Signaling games (Spence 1974): players choose whether to convey private information to each other. Bargaining games (Osborne and Rubinstein 1990): players engage in multiple negotiation rounds. Example: a job interview.
  • Slide 125
  • Colored Trails (CT). [Board figures: an asymmetric board and a symmetric board.]
  • Slide 126
  • Results from the social sciences suggest people do not follow equilibrium strategies: equilibrium-based agents that played against people failed; people rarely design agents to follow equilibrium strategies (Sarne et al., AAMAS 2008); equilibrium strategies are usually not cooperative, so everyone loses.
  • Slide 127
  • Perfect Equilibrium (PE) agent: solved using backward induction; no signaling. Counter-proposal round (selfish): the second proposer finds the most beneficial proposal for which the responder's benefit remains positive; the second responder accepts any proposal which gives it a positive benefit.
  • Slide 128
  • PE agent, phase one. First proposal round (generous): the first proposer proposes the opponent's counter-proposal; the first responder accepts any proposal which gives it the same or higher benefit than its own counter-proposal. Revelation phase (revelation vs. non-revelation): on both boards, the PE with goal revelation yields lower or equal expected utility compared to the non-revelation PE.
  • Slide 129
  • Benefits diversity. [Chart: average proposed benefit to players from the first and second rounds.]
  • Slide 130
  • Performance of the PE agent. [Chart.]
  • Slide 131
  • Revelation effect: only 35% of the games played by humans included revelation. Revelation had a significant effect on human performance but not on agent performance; revelation didn't help the agent. People were deterred by the strategic machine-generated proposals.
  • Slide 132
  • Agent based on general opponent modeling: a genetic algorithm and logistic regression.
  • Slide 133
  • SIGAL agent: learns from previous games; predicts the acceptance probability of each proposal using logistic regression (a sketch follows below). Models the human as using a weighted utility function of: the human's benefit; the difference in benefits; the revelation decision; the benefits in the previous round.
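A sketch of a SIGAL-style acceptance predictor; the feature set mirrors the slide, but the training data and the use of scikit-learn's standard solver (rather than the genetic algorithm of the next slide) are illustrative assumptions:

```python
# Logistic regression over hand-crafted proposal features.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [responder_benefit, benefit_difference, revealed, prev_benefit]
X = np.array([
    [90, -10, 1, 50], [120, 30, 0, 60], [60, -40, 1, 40],
    [150, 60, 1, 80], [40, -70, 0, 30], [110, 10, 0, 70],
])
y = np.array([1, 1, 0, 1, 0, 1])  # 1 = proposal was accepted

model = LogisticRegression(max_iter=1000).fit(X, y)

proposal = np.array([[100, 0, 1, 55]])
print(model.predict_proba(proposal)[0, 1])  # P(accept | proposal)
```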
  • Slide 134
  • Logistic regression using a genetic algorithm.
  • Slide 135
  • Expected benefit maximization.
  • Slide 136
  • Maximization, round 2.
  • Slide 137
  • Strategy comparison. Strategies for the asymmetric board, where neither player has revealed, the human lacks 2 chips for reaching the goal, and the agent lacks 1. (* In the first round the agent was proposed a benefit of 90.)
  • Slide 138
  • Heuristics: tit for tat; never give more than you ask for in the counter-proposal; risk averseness (isoelastic utility).
  • Slide 139
  • Learned coefficients: responder benefit (0.96); benefits difference (-0.79); responder revelation (0.26); proposer revelation (0.03); responder benefit in the first round (0.45); proposer benefit in the first round (0.33).
  • Slide 140
  • Methodology: 10-fold cross-validation; over-fitting removal (stop learning at the minimum of the generalization error); error calculation on a held-out test set, using new human-human games; performance prediction criteria.
  • Slide 141
  • Performance: general opponent* modeling improves agent negotiation. [Chart.]
  • Slide 142
  • Slide 143
  • Agent based on general* opponent modeling: decision tree / naive Bayes, with AAT.
  • Slide 144
  • Aspiration Adaptation Theory (AAT): an economic theory of people's behavior (Selten). No utility function exists for decisions (!); relative decisions are used instead; retreat and urgency are used for goal variables. Avi Rosenfeld and Sarit Kraus. Modeling Agents through Bounded Rationality Theories. Proc. of IJCAI 2009; JAAMAS, 2010.
  • Slide 145
  • [Figure: shopping example, first store's price: 1000.]
  • Slide 146
  • [Figure: prices seen so far: 1000, 900.]
  • Slide 147
  • [Figure: prices seen so far: 1000, 900, 950.] Rule: if the price < 800, buy; otherwise visit 5 stores and buy at the cheapest.
  • Slide 148
  • Slide 149
  • General opponent* modeling in cooperative environments
  • Slide 150
  • Communication is not always possible: high communication costs; the need to act undetected; damaged communication devices; language incompatibilities. Goal: limited interruption of human activities. I. Zuckerman, S. Kraus, and J. S. Rosenschein. Using Focal Points Learning to Improve Human-Machine Tactic Coordination, JAAMAS, 2010.
  • Slide 151
  • Divide 100 into two piles; if your piles are identical to your coordination partner's, you get the 100. Otherwise, you get nothing. 101 equilibria.
  • Slide 152
  • [Coordination game figures: one with 9 equilibria, one with 16 equilibria.]
  • Slide 153
  • Thomas Schelling ('63): focal points = prominent solutions to tacit coordination games.
  • Slide 154
  • Domain-independent rules that could be used by automated agents to identify focal points (a scoring sketch follows below). Properties: centrality, firstness, extremeness, singularity. A logic-based model; a decision-theory-based model; algorithms for agents' coordination. Kraus and Rosenschein, MAAMAW 1992; Fenster et al., ICMAS 1995; Annals of Mathematics and Artificial Intelligence, 2000.
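A hedged sketch of focal-point scoring; the property functions and equal weights below are illustrative stand-ins for the published rules, not their actual definitions:

```python
# Score each candidate on the four focal-point properties and pick the
# candidate with the highest combined score.
def focal_point_scores(candidates):
    """Score candidates in a 'pick one option' coordination task."""
    n = len(candidates)
    scores = []
    for i, c in enumerate(candidates):
        centrality = 1.0 - abs(i - (n - 1) / 2) / (n / 2)  # middle options
        firstness = 1.0 if i == 0 else 0.0                  # first option
        extremeness = 1.0 if i in (0, n - 1) else 0.0       # edge options
        singularity = 1.0 if candidates.count(c) == 1 else 0.0  # unique value
        scores.append(centrality + firstness + extremeness + singularity)
    return scores

options = ["red", "blue", "blue", "blue", "blue"]
scores = focal_point_scores(options)
print(max(range(len(options)), key=scores.__getitem__))  # -> 0 ("red")
```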
  • Slide 155
  • Agent based on general* opponent modeling: decision tree / neural network, with focal points.
  • Slide 156
  • Agent based on general opponent modeling: decision tree / neural network, trained on the raw data vector and the FP (focal point) vector.
  • Slide 157
  • 3 experimental domains. [Figures.]
  • Slide 158
  • Very similar domain (VSD) vs. similar domain (SD) of the "pick the pile" game. General opponent* modeling improves agent coordination.
  • Slide 159
  • Experiments with people are a costly process.
  • Slide 160
  • Peer Designed Agents (PDAs): computer agents developed by humans. Experiment: 300 human subjects, 50 PDAs, 3 EDAs. Results: the EDAs outperformed the PDAs in the same situations in which they outperformed people; on average, the EDAs exhibited the same measure of generosity. R. Lin, S. Kraus, Y. Oshrat, and Y. Gal. Facilitating the Evaluation of Automated Negotiators using Peer Designed Agents, in AAAI 2010.
  • Slide 161
  • Negotiation and argumentation with people are required for many applications. General* opponent modeling is beneficial: machine learning plus behavioral models. Challenge: how to integrate machine learning and behavioral models.
  • Slide 162
  • 1. S.S. Fatima, M. Wooldridge, and N.R. Jennings. Multi-issue negotiation with deadlines. Journal of AI Research, 27:381-417, 2006. 2. R. Keeney and H. Raiffa. Decisions with Multiple Objectives: Preferences and Value Trade-offs. John Wiley, 1976. 3. S. Kraus. Strategic Negotiation in Multiagent Environments. The MIT Press, 2001. 4. S. Kraus and D. Lehmann. Designing and building a negotiating automated agent. Computational Intelligence, 11(1):132-171, 1995. 5. S. Kraus, K. Sycara, and A. Evenchik. Reaching agreements through argumentation: a logical model and implementation. Artificial Intelligence, 104(1-2):1-69, 1998. 6. R. Lin and S. Kraus. Can automated agents proficiently negotiate with humans? Communications of the ACM, 53(1):78-88, January 2010. 7. R. Lin, S. Kraus, Y. Oshrat, and Y. Gal. Facilitating the evaluation of automated negotiators using peer designed agents. In AAAI, 2010.
  • Slide 163
  • 8. R. Lin, S. Kraus, J. Wilkenfeld, and J. Barry. Negotiating with bounded rational agents in environments with incomplete information using an automated agent. Artificial Intelligence, 172(6-7):823-851, 2008. 9. A. Lomuscio, M. Wooldridge, and N.R. Jennings. A classification scheme for negotiation in electronic commerce. Int. Jnl. of Group Decision and Negotiation, 12(1):31-56, 2003. 10. M.J. Osborne and A. Rubinstein. A Course in Game Theory. The MIT Press, 1994. 11. M.J. Osborne and A. Rubinstein. Bargaining and Markets. Academic Press, 1990. 12. Y. Oshrat, R. Lin, and S. Kraus. Facing the challenge of human-agent negotiations via effective general opponent modeling. In AAMAS, 2009. 13. H. Raiffa. The Art and Science of Negotiation. Harvard University Press, 1982. 14. J.S. Rosenschein and G. Zlotkin. Rules of Encounter. The MIT Press, 1994. 15. I. Stahl. Bargaining Theory. Economics Research Institute, Stockholm School of Economics, 1972. 16. I. Zuckerman, S. Kraus, and J.S. Rosenschein. Using focal points learning to improve human-machine tactic coordination. JAAMAS, 2010.
  • Slide 164
  • 2nd annual competition of state-of-the-art negotiating agents, to be held at AAMAS-11. Do you want to participate? At least $2,000 for the winner! Contact us! [email protected]