The Laboratory for Intelligent Processes and Systems
Electrical and Computer Engineering
The University of Texas at Austin
http://www.lips.utexas.edu
Combining Job and Team Selection Heuristics
Chris L. D. Jones and K. Suzanne Barber
© 2008 THE UNIVERSITY OF TEXAS AT AUSTIN 2
Selfish Agents Making Strategic Decisions
Selfish Agents
Each agent's goal is to maximize profit by maximizing its own estimated payoff function
Defining the Environment
Selfish Agents
Dynamic Environments
Non-zero probability that subtasks will be added or subtracted from a job
Unenforceable Contracts
Agents face no penalty from de-committing from a team
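As a rough illustration of what a dynamic environment means here, the sketch below perturbs a job's subtask set once per round. The function name `perturb_job`, the per-round probability `p_change`, and the choice to add or remove exactly one subtask with equal likelihood are assumptions made for illustration; the slides only state that the probability of change is non-zero.

```python
import random

def perturb_job(active_tasks, all_tasks, p_change, rng):
    """Return a possibly modified copy of a job's active subtask set.

    Assumption: with probability p_change, exactly one subtask is either
    added (from tasks not yet in the job) or removed, chosen at random.
    """
    tasks = set(active_tasks)
    if rng.random() >= p_change:
        return tasks  # no change this round
    candidates_to_add = sorted(set(all_tasks) - tasks)
    can_add = bool(candidates_to_add)
    can_remove = len(tasks) > 1  # keep at least one subtask in the job
    if can_add and (not can_remove or rng.random() < 0.5):
        tasks.add(rng.choice(candidates_to_add))
    elif can_remove:
        tasks.remove(rng.choice(sorted(tasks)))
    return tasks

rng = random.Random(0)
job = {0, 1, 2}
static_job = perturb_job(job, range(20), p_change=0.0, rng=rng)   # never changes
dynamic_job = perturb_job(job, range(20), p_change=1.0, rng=rng)  # always changes
```

With `p_change=0.0` the job is returned unchanged; with `p_change=1.0` it always gains or loses one subtask.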
Scenario: Freelancers on the Internet
Selfish agents have the opportunity to create a lucrative website
Creating the website involves multiple elements: hardware, software, content and advertising
However, lack of a single needed skill can break the entire enterprise
In a dynamic environment, elements of the solution change during development
Because contracts are unenforceable, some freelancers quit without warning
Selfish Agents and Unenforceable Contracts
• Selfish agents exchange goods or services without an enforcement mechanism if exchange parameters are static [Sandholm and Lesser, 1995]
• Selfish agents utilize concepts such as the Core, Kernel, and Shapley value to navigate static coalitional games [Myerson, 1991; Davis and Maschler, 1963; Shapley, 1997]
• Bounded search may be used to search through static coalition space of selfish agents [Sandholm et al, 1999; Rahwan et al, 2007]
• Agents may form institutions or coalitions based on static common goals [Gaertner et al, 2008]
All approaches rely on valuations and planning associated with static problems
Dynamic Environments and Unenforceable Contracts
• Cooperative agents may be reassigned roles and tasks within a team as circumstances change [Tambe et al, 2000; Nair et al, 2003]
• Cooperative agents work in the highly dynamic RoboCup Rescue domain [Nazemi et al, 2005; Nair et al, 2001; Lau et al, 2005]
• Agents cooperate to solve distributed constraint satisfaction problems [Scerri et al, 2005; Modi et al, 2001]
All approaches rely on cooperative agents placing the team's utility above their own
Selfish Agents in a Dynamic Environment
• Selfish agents in dynamic environments utilize contingency contracts to guard against specific events [Raiffa, 1982; Faratin and Klein, 2001]
• Leveled commitment contracts allow agents to leave a team by paying a penalty [Sandholm and Lesser, 1996; Sandholm et al, 1999; Andersson and Sandholm, 2001]
• Central fault-tolerance frameworks can enforce penalties on agents that fail to provide contracted services [Smith, 1980; Dellarocas and Klein, 2000; Patel et al, 2005]
All approaches rely on some form of contract enforcement between agents
A Gap in the Prior Work
Selfish agents in a dynamic environment with unenforceable contracts
• Trust and/or reputation information may be used to find agents less likely to defect [Fullam, 2007]
• Agents may follow societal norms which minimize defection [Oren et al, 2008]
Strategies to Maximize Payoff
Selfish agents in a dynamic environment with unenforceable contracts
Agent needs to select a profitable job to work on
Agent needs to select a capable team to work with
Agent combines a job selection heuristic with a team selection heuristic to form a profit-maximizing strategy
Building Strategies by Combining Heuristics
Combining a job selection heuristic with a team selection heuristic produces a strategy
• Two job selection heuristics and five team selection heuristics give ten possible strategies
Previous simulation work created multiple classes of agents, each of which executed a different strategy
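The count of ten strategies is simply the Cartesian product of the two heuristic sets. A minimal sketch, using the heuristic names that appear elsewhere in this deck (the list and variable names are illustrative only):

```python
from itertools import product

JOB_HEURISTICS = ["Greedy", "Lean"]
TEAM_HEURISTICS = ["Null", "Fast", "Redundant", "Auxiliary", "MinPartner"]

# Each strategy is one (job heuristic, team heuristic) pair.
strategies = [job + team for job, team in product(JOB_HEURISTICS, TEAM_HEURISTICS)]

print(len(strategies))
print(strategies)
```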
Job Selection Heuristics
Greedy job heuristic
• Selects job most profitable to foreman while taking completed work into account
\[ \max_{J_j \in J} \sum_{T_{ij} \in AssignedInstances_{kj}} \left( TaskLength_{ij} - C(T_{ij}) \right) \]
Lean job heuristic
• Selects job which can be completed the quickest
\[ \min_{J_j \in J} \sum_{T_{ij} \in AssignedInstances_{kj}} \left( TaskLength_{ij} - C(T_{ij}) \right) \]
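The two job heuristics can be sketched as a max and a min over the same per-job score. This sketch assumes the score is the uncompleted work on the task instances assigned to the foreman (TaskLength minus C(T), summed), and it reads C(T) as the number of rounds already completed; both readings are assumptions, as are the names `TaskInstance`, `remaining_work`, `greedy_job` and `lean_job`.

```python
from dataclasses import dataclass

@dataclass
class TaskInstance:
    length: int     # TaskLength, in rounds
    completed: int  # C(T): rounds already completed (assumed meaning)

def remaining_work(instances):
    """Sum of (TaskLength - C(T)) over the foreman's assigned task instances."""
    return sum(t.length - t.completed for t in instances)

def greedy_job(jobs):
    """Greedy: pick the job with the largest remaining payoff for the foreman."""
    return max(jobs, key=lambda name: remaining_work(jobs[name]))

def lean_job(jobs):
    """Lean: pick the job with the least remaining work."""
    return min(jobs, key=lambda name: remaining_work(jobs[name]))

# Hypothetical jobs, keyed by name, listing the foreman's assigned instances.
jobs = {
    "site_a": [TaskInstance(10, 2), TaskInstance(6, 0)],  # 14 rounds left
    "site_b": [TaskInstance(4, 1), TaskInstance(3, 3)],   # 3 rounds left
}
greedy_choice = greedy_job(jobs)
lean_choice = lean_job(jobs)
```

Because one credit is earned per completed round, remaining rounds double as remaining payoff, which is why the same score can serve both heuristics.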
Team Selection Heuristics
Null team heuristic
• Randomly selects team from top-ranked job
Fast team heuristic
• Minimize time to completion
\[ \min_x \max_{A_k \in Team_x} \sum_{T_{iw} \in AssignedInstances_{kw}} \left( TaskLength_{iw} - C(T_{iw}) \right) \]
Redundant team heuristic
• Maximize number of duplicate skills
Auxiliary team heuristic
• Maximize number of unused skills
[Equations for the Redundant and Auxiliary heuristics: a \max_x over candidate teams of a sum of 0/1 indicators testing whether T_{iw} \in AgentSkills_k for A_k \in Team_x, with T_{iw} taken relative to ActiveTasks_w; exact form not recoverable]
MinPartner team heuristic
• Minimize number of partners
\[ \min_x |Team_x| \]
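The team heuristics can be sketched as scoring functions over a candidate team. The Fast and MinPartner scores follow the slide descriptions directly. The Redundant and Auxiliary scores below are assumed readings of "duplicate skills" and "unused skills", not the authors' formulas, and the dictionary layout of each team member is hypothetical.

```python
def fast_score(team):
    """Time to completion: the busiest member's remaining rounds (lower is better)."""
    return max(member["remaining"] for member in team)

def min_partner_score(team):
    """Number of partners on the team (lower is better)."""
    return len(team)

def redundant_score(team, active_tasks):
    """Assumed reading: holders beyond the first for each active task (higher is better)."""
    total = 0
    for task in active_tasks:
        holders = sum(1 for m in team if task in m["skills"])
        total += max(holders - 1, 0)
    return total

def auxiliary_score(team, active_tasks):
    """Assumed reading: distinct team skills the job does not currently need."""
    team_skills = set().union(*(m["skills"] for m in team))
    return len(team_skills - set(active_tasks))

active = {1, 2, 3}
team_x = [{"skills": {1, 2, 3}, "remaining": 5},
          {"skills": {2, 3, 4}, "remaining": 2}]
team_y = [{"skills": {1, 2}, "remaining": 3},
          {"skills": {3, 7}, "remaining": 3},
          {"skills": {8, 9}, "remaining": 1}]

fastest = min([team_x, team_y], key=fast_score)                             # team_y
smallest = min([team_x, team_y], key=min_partner_score)                     # team_x
most_redundant = max([team_x, team_y], key=lambda t: redundant_score(t, active))  # team_x
most_auxiliary = max([team_x, team_y], key=lambda t: auxiliary_score(t, active))  # team_y
```

Under these readings, the redundant team covers the current requirements several times over, while the auxiliary team holds spare skills that may become useful if requirements change.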
Benefits of Heuristic-based Strategies
Strategies allow selfish agents to make immediate estimates of how their actions will affect their utility
• Immediate estimates of job worth can be used since no decommitment penalties are possible
• Estimates of team worth are based on how well teams may adapt to dynamic circumstances
Use agent simulation to test relative utility of team formation strategies [Jones and Barber, 2007]
• Previous work did not explore different mechanisms for utilizing heuristic information
How should job and team selection heuristics be combined in a strategy?
Approach: Separate Job/Team (SJT) Formation
• Job selection heuristic selects the most attractive jobs
• Team selection heuristic ranks the most attractive teams for all selected jobs
• Foreman agent selects the top-ranked team and sends request to form team
• Agents respond to request based on job heuristic
Information about job heuristic value does not affect team selection process
Therefore, SJT may prefer more robust teams at the expense of agent profit
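A minimal sketch of the SJT selection step, assuming each candidate is a tuple of job, team, job score and team score with higher scores being better. The function name `sjt_select`, the tuple layout and the `top_jobs` cutoff are hypothetical; the slides do not say how many jobs count as "most attractive".

```python
def sjt_select(candidates, top_jobs=1):
    """candidates: list of (job, team, job_score, team_score), higher is better."""
    # Step 1: the job heuristic alone picks the most attractive jobs.
    ranked_jobs = sorted({(c[2], c[0]) for c in candidates}, reverse=True)
    selected = {job for _, job in ranked_jobs[:top_jobs]}
    # Step 2: the team heuristic alone ranks teams across the selected jobs;
    # the job score plays no further part.
    pool = [c for c in candidates if c[0] in selected]
    return max(pool, key=lambda c: c[3])

candidates = [
    ("job_a", "team_1", 0.9, 0.2),
    ("job_a", "team_2", 0.9, 0.4),
    ("job_b", "team_3", 0.5, 0.95),
]
choice = sjt_select(candidates, top_jobs=2)
```

Here the result is the `job_b` / `team_3` pairing: once both jobs pass the first step, the strongest team wins even though its job is worth less, which is the behavior the slide warns about.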
Approach: Combined Job/Team (CJT) Formation
• Job selection heuristic selects the most attractive jobs
• Combined job and team selection heuristics rank the best job/team assignments for all selected jobs
• Foreman agent selects the top-ranked job/team pairing and sends request to form team
• Agents respond to request based on combined job and team heuristics
Normalized heuristics are multiplied together, so that job heuristic information influences team selection process
Robust teams are therefore not selected at the expense of agent profit
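A minimal sketch of the CJT ranking step. The slides say the normalized heuristics are multiplied but do not give the normalization, so dividing by the largest observed value is an assumption here, as are the names `normalize` and `cjt_select` and the tuple layout of the candidates.

```python
def normalize(values):
    """Scale scores into [0, 1] by dividing by the largest value (assumed scheme)."""
    top = max(values)
    return [v / top if top > 0 else 0.0 for v in values]

def cjt_select(candidates):
    """candidates: list of (job, team, job_score, team_score), higher is better."""
    job_norm = normalize([c[2] for c in candidates])
    team_norm = normalize([c[3] for c in candidates])
    # Multiply the normalized scores so job value influences team choice.
    combined = [j * t for j, t in zip(job_norm, team_norm)]
    best = max(range(len(candidates)), key=combined.__getitem__)
    return candidates[best], combined[best]

candidates = [
    ("job_a", "team_1", 14, 2),
    ("job_a", "team_2", 14, 3),
    ("job_b", "team_3", 6, 4),
]
choice, score = cjt_select(candidates)
```

In this example the strongest team (`team_3`) is attached to the least valuable job, so the product favors `job_a` with `team_2`: a reasonably strong team on the more profitable job.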
Experimental Parameters
Parameter                                               Value
Heuristic usage mechanism                               SJT, CJT
Number of classes                                       10
Agents per class                                        250
Per-round chance of agent acting as foreman             1%
Jobs                                                    1000
|T|                                                     20
|AgentSkills|                                           5
Initial size of |ActiveTasks|                           10
Range of TaskLength                                     1 to 10 rounds
Credit received per round of completed task instance    1
Number of potential teams examined per top-ranked job   15
Dynamicism range                                        0% to 100%, 25% increment
Number of rounds per simulation                         2500
Number of simulations per dynamicism step               20
2500 agents in simulation
Simulation tests increasingly dynamic environments by changing required subtasks
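The parameters can be captured in a small configuration object, which also makes the derived figures easy to check. The class and field names below are hypothetical; the values are those listed in the table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentConfig:
    mechanisms: tuple = ("SJT", "CJT")
    num_classes: int = 10
    agents_per_class: int = 250
    foreman_chance_per_round: float = 0.01
    num_jobs: int = 1000
    num_task_types: int = 20         # |T|
    skills_per_agent: int = 5        # |AgentSkills|
    initial_active_tasks: int = 10   # initial |ActiveTasks|
    task_length_range: tuple = (1, 10)  # rounds
    credit_per_round: int = 1
    teams_examined_per_job: int = 15
    dynamicism_steps: tuple = (0, 25, 50, 75, 100)  # percent
    rounds_per_simulation: int = 2500
    simulations_per_step: int = 20

    @property
    def total_agents(self):
        return self.num_classes * self.agents_per_class

    @property
    def expected_foremen_per_round(self):
        return self.total_agents * self.foreman_chance_per_round

cfg = ExperimentConfig()
print(cfg.total_agents)                # 10 classes x 250 agents
print(cfg.expected_foremen_per_round)  # about 25 agents get the chance each round
```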
Credit Earned at 0% Dynamicism
[Figure: credit earned by each agent strategy at 0% dynamicism. Axis label: Agent Strategies. Legend: GreedyNull (GN), GreedyFast (GF), GreedyRedundant (GR), GreedyAuxiliary (GA), GreedyMinPartners (GMP), LeanNull (LN), LeanFast (LF), LeanRedundant (LR), LeanAuxiliary (LA), LeanMinPartners (LMP)]
CJT agents equal or exceed SJT agents by statistically significant margins
Credit Earned at 25% Dynamicism
[Figure: credit earned by each agent strategy at 25% dynamicism. Axis label: Agent Strategies. Legend: GN, GF, GR, GA, GMP, LN, LF, LR, LA, LMP]
GMP strategy works best in relatively static environments
Credit Earned at 50% Dynamicism
[Figure: credit earned by each agent strategy at 50% dynamicism. Axis label: Agent Strategies. Legend: GN, GF, GR, GA, GMP, LN, LF, LR, LA, LMP]
Credit Earned at 75% Dynamicism
[Figure: credit earned by each agent strategy at 75% dynamicism. Axis label: Agent Strategies. Legend: GN, GF, GR, GA, GMP, LN, LF, LR, LA, LMP]
GA strategy works best in relatively static environments
Credit Earned at 100% Dynamicism
[Figure: credit earned by each agent strategy at 100% dynamicism. Axis label: Agent Strategies. Legend: GN, GF, GR, GA, GMP, LN, LF, LR, LA, LMP]
The CJT advantage over SJT agents continues over all sampled dynamicism values
Jobs Completed at 0% Dynamicism
[Figure: jobs completed by each agent strategy at 0% dynamicism. Axis label: Agent Strategies. Legend: GN, GF, GR, GA, GMP, LN, LF, LR, LA, LMP]
CJT likewise has a statistically significant advantage over SJT in percentage of jobs successfully completed
Jobs Completed at 25% Dynamicism
[Figure: jobs completed by each agent strategy at 25% dynamicism. Axis label: Agent Strategies. Legend: GN, GF, GR, GA, GMP, LN, LF, LR, LA, LMP]
Jobs Completed at 50% Dynamicism
[Figure: jobs completed by each agent strategy at 50% dynamicism. Axis label: Agent Strategies. Legend: GN, GF, GR, GA, GMP, LN, LF, LR, LA, LMP]
Jobs Completed at 75% Dynamicism
[Figure: jobs completed by each agent strategy at 75% dynamicism. Axis label: Agent Strategies. Legend: GN, GF, GR, GA, GMP, LN, LF, LR, LA, LMP]
Jobs Completed at 100% Dynamicism
[Figure: jobs completed by each agent strategy at 100% dynamicism. Axis label: Agent Strategies. Legend: GN, GF, GR, GA, GMP, LN, LF, LR, LA, LMP]
The CJT advantage over SJT agents in jobs completed continues for all sampled dynamicism values
CJT LA job completion improves markedly
Conclusions and Future Work
Simultaneous usage of job and team selection heuristics improves credit earned and jobs completed
Mechanism works in dynamic environments where thousands of selfish agents work without enforceable contracts
Future work:
• Dynamic weighting of job and team selection heuristics
• Development of theoretical framework for determining job and team selections
References
Kraus, S., O. Shehory, et al. (2003). Coalition formation with uncertain heterogeneous information, ACM Press New York, NY, USA: 1-8.
Tambe, M., D. V. Pynadath, et al. (2000). Building dynamic agent organizations in cyberspace. 4: 65-73.
Sandholm, T. W. and V. R. Lesser (1995). Equilibrium Analysis of the Possibilities of Unenforced Exchange in Multiagent Systems, University of Massachusetts at Amherst, Computer Science Dept.
Myerson, R. B. (1991). Game theory: analysis of conflict, Harvard University Press.
Davis, M. and M. Maschler (1963). The Kernel of a Cooperative Game, DTIC Research Report AD0418434.
Shapley, L. S. (1997). A Value for n-Person Games, Princeton University Press.
Sandholm, T., S. Sikka, et al. (1999). Algorithms for optimizing leveled commitment contracts: 535-540.
Rahwan, T., S. D. Ramchurn, et al. (2007). Near-optimal anytime coalition structure generation: 2365-2371.
Gaertner, D., J. A. Rodrigez, et al. (2008). Agreeing on Institutional Goals for Multi-agent Societies.
Nair, R., M. Tambe, et al. (2003). Role allocation and reallocation in multiagent teams: towards a practical analysis, ACM Press New York, NY, USA: 552-559.
Nazemi, E., M. Faradad, et al. (2005). SBCe_Saviour Team Description. Tehran, Iran, Shahid Beheshti University: 6.
Nair, R., T. Ito, et al. (2001). Task Allocation in the RoboCup Rescue Simulation Domain: A Short Note, Springer.
Lau, N., L. P. Reis, et al. (2005). FC Portugal 2005 Rescue Team Description: Adapting Simulated Soccer Coordination Methodologies to the Search and Rescue Domain.
Scerri, P., A. Farinelli, et al. (2005). Allocating tasks in extreme teams, ACM Press New York, NY, USA: 727-734.
Modi, P. J., H. Jung, et al. (2001). A dynamic distributed constraint satisfaction approach to resource allocation, Springer.
Raiffa, H. (1982). The art and science of negotiation, Belknap Press of Harvard University Press Cambridge, Mass.
Faratin, P. and M. Klein (2001). Automated Contract Negotiation and Execution as a System of Constraints, MIT, Cambridge.
Sandholm, T. W. and V. R. Lesser (1996). Advantages of a leveled commitment contracting protocol: 126-133.
Andersson, M. R. and T. W. Sandholm (2001). Leveled commitment contracts with myopic and strategic agents, Elsevier. 25: 615-640.
Smith, R. G. (1980). The contract net protocol. 29: 1104-1113.
Dellarocas, C. and M. Klein (2000). An experimental evaluation of domain-independent fault handling services in open multi-agent systems: 95-102.
Patel, J., W. T. L. Teacy, et al. (2005). Agent-based virtual organisations for the Grid, IOS Press. 1: 237-249.
Fullam, K. (2007). Learning Trust Decision Strategies in Emerging Reputation Networks.
Oren, N., M. Luck, et al. (2008). An Argumentation Inspired Heuristic for Resolving Normative Conflict.
Jones, C. L. D. and K. S. Barber (2007). Bottom-up Team Formation Strategies in a Dynamic Environment: 60-74.
Jones, C. L. D. and K. S. Barber (2008). Combining Job and Team Selection Heuristics. 2008 AAMAS Workshop on Coordination, Organization, Institutions and Norms. Lisbon, Portugal, ACM.
Sutton, R. S. and A. G. Barto (1998). Reinforcement Learning: An Introduction, MIT Press.
Klusch, M. and A. Gerber (2002). Dynamic coalition formation among rational agents. 17: 42-47.
Lin, C., S. Hu, et al. (2007). An Anytime Coalition Restructuring Algorithm in an Open Environment, Springer. 4681: 80.
Causes of Agent and Job Dynamicism
Changes to job requirements
• Bounded rationality
• Incomplete information
• Inherent environmental dynamics
Changes to team membership
• Agent failure
• Agent defection
Changes to current job requirements make job less attractive
Changes to alternate job requirements make job more attractive
Teammate defection decreases robustness of current team and likelihood of expected payoff
Domain Assumptions
Agents are multiskilled
• Single-skilled agents would be unable to provide the reserve of redundant skills needed
Contractless environments feature non-transferable utility
• With transferable utility, contracts become possible
Quality and Timeliness are not represented
• Both could probably be represented as different types of subtasks, e.g. a subtask requiring 90% QoS is different from a subtask requiring 50% QoS
• Quality and Timeliness are both likely to be domain dependent: not a huge difference between 80% and 100% QoS in ditch digging, but a huge difference in brain surgery
Comparison to Trust Work
This work is complementary to trust
• Can be used when trust information is unavailable or unreliable
• Can be used when environmental dynamics make agents change their trustworthiness over time
Work is supplementary to trust
• Can be used in addition to trust, to compensate for non-trustworthy agents
• Can be used when trust system is bootstrapping, and agents need to explore space of likely untrustworthy agents
Foreman Flowchart
[Figure: foreman flowchart. Start state: Idle agent. Decision nodes (Yes/No): Opportunity to become foreman?; Team formation successful?; Agent failure or new requirement?; Can allocate (or recruit) needed skills?; Foreman's work complete?; Job complete? Action nodes: Determine optimal job and team selections; Send offer messages to worker agents; Work on job; Receive job credit]