
Robotics and Autonomous Systems 30 (2000) 65–84

The call of duty: Self-organised task allocation in a population of up to twelve mobile robots

Michael J.B. Krieger a,1, Jean-Bernard Billeter b,*,2

a Institute of Ecology, University of Lausanne, 1015 Lausanne, Switzerland

b Laboratoire de Micro-informatique, Swiss Federal Institute of Technology, 1015 Lausanne, Switzerland

Abstract

Teams with up to 12 real robots were given the mission to maintain the energy stocked in their nest by collecting food-items. To achieve this mission efficiently, we implemented a simple and decentralised task allocation mechanism based on individual activation-thresholds, i.e. the energy level of the nest under which a given robot decides to go collect food-items. The experiments show that such a mechanism — already studied among social insects — results in an efficient dynamical task allocation even under the noisy conditions prevailing in real experiments. Experiments with different team sizes were carried out to investigate the effect of team size on performance and risk of mission failure. ©2000 Elsevier Science B.V. All rights reserved.

Keywords: Collective robotics; Task allocation; Division of labour; Self-organisation

1. Introduction

In human and other social groups with advanced labour division, life is organised around a series of concurrent activities. For a society to function efficiently, the number of individuals (team size) involved in these activities has to be continuously adjusted so as to satisfy its changing needs. The process regulating team size — and thus modulating labour division — is called task allocation. It can be evident when centralised and embodied in a special agency (like a foreman dispatching men on a working site), or it can be less visible when decentralised (as with neighbours providing unsupervised help after an earthquake). Behind the diversity of possible task allocation mechanisms lies a common structure: they all act at the individual level, prompting individuals either to continue or to change their activities (Fig. 1). The condition that triggers the change to another activity may be a simple rule of thumb or a complex decision-making procedure.

Task or role 3 allocation has been extensively studied in social insects (e.g. [6,10,13,15,26–28,32,33]). The study of task allocation in social insects is particularly interesting since their labour division and

* Corresponding author. Tel.: +41-22-7400094; fax: +41-22-7400094. E-mail address: [email protected] (J.-B. Billeter).
1 Present address: Department of Entomology and Department of Microbiology, University of Georgia, Athens, GA 30602, USA.
2 Laboratoire de Micro-informatique, EPFL, http://diwww.epfl.ch/lami

3 The words task, activity and role may often be used one for the other, as in “My task, activity, role, is to sweep the yard”. Still, task specifies “what has to be done”, activity “what is being done”, and role “the task assigned to a specific individual within a set of responsibilities given to a group of individuals”. Caste defines a group of individuals specialised in the same role.

0921-8890/00/$ – see front matter ©2000 Elsevier Science B.V. All rights reserved. PII: S0921-8890(99)00065-2


Fig. 1. Individual choice between two activities.

its regulation are organised by surprisingly simple and robust means. Task allocation within insect colonies was considered a rigid process: the different activities were associated with different castes, and caste polymorphism was related to genetic or internal factors [11]. At the same time, other observations indicated that individuals could change activity during their life span [27,32], suggesting that factors other than genetic ones are relevant for task allocation. These findings have led to a reformulation of the caste definition, formerly based purely on morphological or genetic criteria, to incorporate age or simple behavioural differences [13,27]. Thus, recent research on task allocation in social insects concentrates on behavioural flexibility and stresses the importance of external and decentralised factors like pheromones or individual encounters [5,25]. One of the most inspiring models to explain this decentralised and flexible task allocation found in social insects is the activation-threshold model.

In the activation-threshold model, individuals react to stimuli that are intrinsically bound to the task to be accomplished. For instance, neglected brood or the corpses of dead ants diffuse an odour of increasing strength. When this stimulus reaches an individual’s threshold value, the individual reacts by adopting the relevant activity (in our example: grooming the brood or carrying the corpse out of the nest) or by increasing its likelihood to do so. This is a proximal mechanism: individuals closer to the work to be done are most likely to be recruited. Moreover, if the individuals do not have the same threshold value, the recruitment is gradual, which may allow regulation of the teams’ sizes [6,26,27]. Indeed, Bonabeau et al. [4] have shown that such a simple activation-threshold model “. . . can account for the workers’ behavioral flexibility”. 4 Similarly, models in sociology have shown that simple reaction-threshold differences among individuals may lead to complex social dynamics [9,30]. The purpose of the experiments described below was to test the efficiency of the activation-threshold mechanism for task allocation in practical robotics.

The regulation of a group of robots engaged in several concurrent activities involves regulating the team members’ activity in real time (dynamical task allocation). A variety of mechanisms may achieve task allocation; however, when working with real mobile robots, whose perceptions, communication and actions are reckoned to be limited, it is advisable to select a mechanism for its simplicity and its robustness. A good candidate for a robust and simple task allocation mechanism is the activation-threshold mechanism described above [6,26,27]. Its task-related recruiting stimuli increase when the tasks to be accomplished are neglected, acting as a feedback. In a team of N agents whose choice of activities is limited to two, neglecting the first activity (because too many individuals are engaged in the second activity) causes the stimulus for the first activity to increase, prompting individuals to change from the second to the first activity; and conversely (Fig. 2). Choosing appropriate activation-thresholds is crucial for the performance of the robot team, since individuals with the same activation-thresholds and exposed to the same stimuli switch activity at the same time, generally yielding poor regulation. Hence, one of our objectives was to show that simply implementing different activation-thresholds is sufficient for an effective task allocation mechanism in robotic experiments.

4 The idea of activation-threshold might seem a truism since any change can be brought back to the fact that a variable crossed a threshold! What the model wants to stress is the origin and the simplicity of the activation variable directly tied to the recruiting activity. In the case of corpse carrying witnessed among some ant species, the stimulus is a natural by-product of the corpse decomposition, namely oleic acid. We are all familiar with this type of mechanism: when there is a smell of gas, we go check the gas-range; when there is a smell of cheese, we put the cheese away into the refrigerator. The point, however, is that social insects seem to regulate all their colony activities in this way, whereas, among humans, the dynamical allocation of workforce between the different trades is a rather complex procedure including considerations of aptitudes, tradition, interest, openings, wages, etc.
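The threshold feedback described above can be illustrated with a minimal simulation. This is a hypothetical sketch, not the authors’ code: the team size, energy rates and food-return probability are invented for illustration; only the threshold rule and the 3/4-to-full threshold spread come from the paper.

```python
import random

random.seed(1)  # reproducible illustration

N = 6
INITIAL_NEST_ENERGY = 1000.0

# Activation-thresholds equidistributed between 3/4 of and full
# initial nest-energy, as in the paper's experiments.
thresholds = [INITIAL_NEST_ENERGY * (0.75 + 0.25 * i / (N - 1)) for i in range(N)]

nest_energy = INITIAL_NEST_ENERGY
active = [False] * N

for step in range(500):
    for i in range(N):
        # Threshold rule: forage when nest-energy falls below my personal
        # threshold, return to inactivity when it recovers. (In the real
        # experiments a robot only reads nest-energy while inside the
        # nest; that detail is ignored in this sketch.)
        active[i] = nest_energy < thresholds[i]
    for i in range(N):
        if active[i]:
            nest_energy -= 1.0            # assumed foraging cost per step
            if random.random() < 0.4:
                nest_energy += 4.0        # assumed yield of a delivered food-item
        else:
            nest_energy -= 0.1            # assumed idle cost per step
```

Because the thresholds differ, robots are recruited one by one as nest-energy falls, rather than all switching at once.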


Fig. 2. Individual choice between two activities with a fixed activation-threshold. Neglecting activity 1 causes the stimulus for activity 1 to increase, prompting individuals to change from activity 2 to activity 1; and conversely.

From the biologist’s point of view, the purpose of these experiments is to contribute a piece of heuristic evidence that complex social systems may be organised on decentralised principles. Central to sociality is labour division, with its correlates of specialisation, cooperation and task allocation. As we have seen above, social insects are thought to organise an important aspect of labour division, i.e. task allocation, in a decentralised manner where each individual’s decision is made according to a simple set of rules based on local information only. Many complex patterns and collective behaviours observed at the colony (macroscopic) level emerge as the aggregated result of the decentralised interactions at the individual (microscopic) level. This mode of organisation, termed self-organisation [19], could account for many collective phenomena found in social insects [8]. Other authors have successfully used mathematical models of self-organisation to model either specific behaviours (e.g. [35,37]) or whole insect colonies [3,20,21]. Yet it is difficult to prove unequivocally that self-organisation is the main mechanism operating in social insect societies and that all complex collective behaviours emerge from interactions among individuals with simple stereotyped behaviours.

Our experiments intend to bring heuristic evidence, and thus shed some light on two questions: first, can we imagine plausible mechanisms of automatic and decentralised control for insect societies; and secondly, do these mechanisms account for or lend themselves to the gradual evolution from solitary individual systems to sociality? Such a gradual evolution implies that the transition from the ancestral, solitary state to a social system is beneficial. Hence, individuals in simple aggregations have to have a better pay-off than solitary individuals, and individuals in groups with cooperative interactions have to outperform individuals in aggregations. Since not all animals live in social systems, it follows that these conditions are not always given. One set of factors which has proven to influence social organisation is environmental: among them, the distribution of food and its availability was identified as one of the key features [1,7,17,29]. Therefore, we also examined the influence of different food distributions on social organisation. However, it should be stressed that our robots do not mimic any specific social insect species and, therefore, no binding conclusion can be obtained by the comparative study of our robots’ behaviour and the behaviour of social insects.

2. The robots’ mission

2.1. The mission

The robots’ mission was to search for and collect “food-items” in a foraging area (Figs. 3 and 4) and bring them back to the “nest” (Figs. 3 and 4) in order to keep the nest-energy at a safe level. Their energy consumption was activity-related: it was low when they were inactive in the nest, increased when they moved around, and reached a maximum when they were carrying a food-item. For the robots to achieve their mission efficiently, i.e. using globally as little energy as possible, we dispersed their individual activation-thresholds to prevent the robots from leaving the nest simultaneously (for comments on the activation-threshold distribution, see Section 5.1). The activation-thresholds were equidistributed between 3/4 of and full initial nest-energy. This simple task allocation mechanism resulted in a good modulation of the number of individuals engaged in the two possible activities: staying in the nest (inactive) and foraging (active). All robots were able to execute either of the two tasks and, depending on the situation, were able to switch from one to the other. They had no prior knowledge about the number and the distribution pattern of the food-items, or about the robot team’s size.

Fig. 3. The experimental set-up. Experiments were carried out on a 9.24 m² surface with a nest and a foraging area. The entrance of the nest was signalled by a light beacon. Next to the entrance, the robots unloaded the collected food-items into a basket. Depending on the experiment, the food-items were either grouped in two patches of 12 food-items each (clumped food-items) or placed singly at six locations (isolated food-items).

2.2. Basic mission cycle

The basic mission (Fig. 5) can be described with the following pseudo-procedural instructions:

1. Wait in nest
• If a robot ahead leaves the waiting line, compact the waiting line by moving forward.
• Keep listening to the control-station’s radio messages that periodically update the nest-energy level.
• Radio to the control-station the energy you have been consuming while waiting in the line. At the same time, refill your personal energy store.
• Leave the nest when the radioed nest-energy level is lower than your personal activation-threshold.

2. Leave nest
• Leave the waiting line and move to the exit-lane.
• If there are other robots, slow down and give way.
• Follow the exit-lane; at its end turn left and leave the nest.

(Once the robot has left the nest, it is no longer updated about the current nest-energy level, nor can it inform the control-station about its energy consumption. During this time the robot depends entirely on the energy of its personal energy store. At the same time, the robot keeps track of its energy consumption.)

3. Look for food-items
• Start a random search for food-items. Or, if you already know where to find one, try to return to the spot using odometry (a helpful but not a very robust localisation method, since any free spinning of the wheels will cause the odometry


Fig. 4. An experiment under way (for schematic representation see Figs. 3 and 5).

Fig. 5. Mission cycle.


to be inaccurate). If you have reached the spot where you previously detected a food-item but you do not detect one, start a random search.
• If you have used up your personal energy store, return to the nest (ignoring steps 4 and 5).
• When you detect a food-item, load it.

4. Load food-item
• Turn toward the food-item, back up a bit, open the gripper, lower the arm, move forward until the gripper’s optical barrier is broken.
• Close the gripper, raise the arm.

5. Evaluate site
• Check the vicinity for the presence of more food-items.
• If you detect one, turn on your odometry.

6. Return to nest
• Head toward the nest using the light beacon at its entrance.
• When reaching the nest’s entrance, radio your energy consumption and recharge the personal energy store.

(Recharging is a data exchange with no visible behaviour associated.)

7. Unload food-item
• Go to the basket and unload your food-item.
• If the nest-energy level is higher than your activation-threshold, stay in the waiting line, otherwise exit the nest. If you stay in the nest, erase any information about locations of food-items.

Back to 1.
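The seven steps above can be summarised as a small state machine. The sketch below is a hypothetical reconstruction, not the robots’ actual program: the state names mirror the paper’s steps, but the transition conditions are simplified to the few variables named in the text.

```python
from enum import Enum, auto

class State(Enum):
    WAIT_IN_NEST = auto()
    LEAVE_NEST = auto()
    LOOK_FOR_FOOD = auto()
    LOAD_FOOD = auto()
    EVALUATE_SITE = auto()
    RETURN_TO_NEST = auto()
    UNLOAD_FOOD = auto()

def next_state(state, nest_energy, threshold, found_food, energy_store_empty):
    """One transition of the simplified mission cycle."""
    if state is State.WAIT_IN_NEST:
        # Step 1: leave when radioed nest-energy drops below my threshold.
        return State.LEAVE_NEST if nest_energy < threshold else state
    if state is State.LEAVE_NEST:
        return State.LOOK_FOR_FOOD
    if state is State.LOOK_FOR_FOOD:
        if energy_store_empty:
            return State.RETURN_TO_NEST   # skip steps 4 and 5
        return State.LOAD_FOOD if found_food else state
    if state is State.LOAD_FOOD:
        return State.EVALUATE_SITE
    if state is State.EVALUATE_SITE:
        return State.RETURN_TO_NEST
    if state is State.RETURN_TO_NEST:
        return State.UNLOAD_FOOD
    if state is State.UNLOAD_FOOD:
        # Step 7: rejoin the waiting line only if nest-energy has recovered.
        return State.WAIT_IN_NEST if nest_energy >= threshold else State.LEAVE_NEST
    return state
```

A robot loops over `next_state` until the experiment ends; “Back to 1” corresponds to the `UNLOAD_FOOD → WAIT_IN_NEST` transition.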

2.3. Mission cycle with information exchange

In a second series of experiments, we introduced a simple coordination among the robots. Robots that detected a food source were able to recruit and guide other robots to this food source (robots can record the position of food-item patches with their odometry). This was inspired by the tandem recruiting-behaviour observed in the ant Camponotus sericeus [12]. Two reasons led us to choose such a behaviour-based recruitment: first, it represents a type of communication without symbolisation, which probably established itself very early during social evolution, and secondly, such a mechanism scales up without problems (i.e. can be used with larger groups), whereas radio transmissions relayed by a central base create communication bottlenecks.

Recruitment was executed just before the robot was about to return to the food patch where it previously found food-items. The recruiting itself was rather simple; the recruiting robot approached the robot at the head of the waiting line, which was the signal for the waiting robot to follow (Fig. 6). To incorporate information exchange among the robots, the following additions to the mission cycle were made:

• To the robots waiting in the nest:
– If you are heading the waiting line, you are a potential follower: be ready to be asked to tandem.
– Once you have been asked to tandem, follow the leading robot until you lose it. When you have lost it, start a local search for the food-items.

• To the recruiting robots:
– If you know where to find more food-items, try recruiting the robot heading the waiting line, by approaching the one to be recruited to a close distance (about 5 mm).
– Go slowly toward the food source. When you have reached it, decouple the tandem by making a fast move forward, wait a while and go back to collect food-items. (By making a fast move forward, the leading robot gets out of the sensory field of the follower, which serves as a signal to indicate the arrival at the food patch.)

3. Experimental setup

3.1. Data acquisition

Radio communication was strictly used for controlling the experiment and for sending data from the robots to the control station for later analysis. The operator could initialise the robots from his computer, as well as start, suspend, resume or stop an experiment. The messages were not broadcast simultaneously to all robots: every robot was individually addressed, and a message was considered received only when echoed properly to the radio base and displayed on the control station screen. Every 10 to 20 seconds, a data request was initiated from the control station to the robots. The request signal was radioed together with the nest-energy, the only information used by the robots to allocate tasks. The robots were programmed to consider and update this external variable while inside the nest, but to ignore it when outside, reflecting the fact that they


Fig. 6. Tandem recruitment. (a) Recruiting robot (r) backs up toward the waiting robot (w) at the head of the waiting line. When w’s proximity sensors detect r, w goes into follower mode. (b) Recruited robot (w) follows recruiting robot (r) to the food patch.

do not know the evolution of nest-energy while working outside the nest. In return, the robots updated the control station on their current activity and energy consumption for the statistical analysis.

3.2. Experimental procedure

The experiments were carried out on a 9.24 m² surface tiled with 64 printed boards (0.38 m × 0.38 m) (Figs. 3 and 4) with copper strips alternately connected to the two poles of a direct current power supply. The floor was bordered by a wall. Along the wall, a black line was painted on the floor to facilitate the distinction between the wall and the food-items (for a detailed description see Section 3.5). The nest, whose entrance was signalled by a lamp, was also enclosed with black footed walls. Inside the nest and in its close proximity, black lines were used as tracks for easy navigation (Fig. 4). Next to the entrance, the robots unloaded the collected food-items into a basket (Figs. 3 and 4). The food-items were small plastic cylinders (3 cm diameter × 3 cm height) with narrow strips of infrared reflecting tape. Depending on the experiment, they were either grouped in two patches of 12 food-items each (clumped food distribution), or placed singly at six locations (dispersed food distribution). All food-items were replaced soon after they were seized by the robots.

We conducted three types of experiments (Table 1): food search in an environment with a dispersed food distribution, food search in an environment with a clumped food distribution (both without recruitment) and, finally, food search in an environment with a clumped food distribution with recruitment. Each type of experiment was carried out with groups of one, three, six, nine and twelve robots, with the exception of the experiments with recruitment, where the experiments with group size one were omitted since recruitment involves at least two robots. All experiments were repeated eight times (112 experiments in total) and had a duration of 30 minutes. Before the start of an experiment, all robots were positioned inside the nest. The only deliberate difference among them was their activation-threshold, whose values were equidistributed between 3/4 of and full initial nest-energy level. In order to allow comparisons between different team sizes, the initial nest-energy was proportional to the number of robots participating in the experiment. A typical experiment with six robots is shown in Fig. 7.

Table 1. Each type of experiment was repeated eight times (the experiment with group size one was dropped in Series C for an obvious reason: a single robot cannot recruit another robot).

          Recruitment   Food distribution   Group sizes
Series A  No            Dispersed           1, 3, 6, 9, 12
Series B  No            Clumped             1, 3, 6, 9, 12
Series C  Yes           Clumped             3, 6, 9, 12
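The two initialisation rules just described (initial nest-energy proportional to team size; activation-thresholds equidistributed between 3/4 of and full initial nest-energy) can be sketched as follows. The per-robot energy constant is an invented placeholder, not a value from the paper.

```python
def initial_setup(team_size, energy_per_robot=200.0):
    # energy_per_robot is a hypothetical constant; the paper only states
    # that initial nest-energy was proportional to the number of robots.
    nest_energy = team_size * energy_per_robot
    if team_size == 1:
        thresholds = [nest_energy]
    else:
        # Equidistributed between 3/4 of and full initial nest-energy.
        thresholds = [nest_energy * (0.75 + 0.25 * i / (team_size - 1))
                      for i in range(team_size)]
    return nest_energy, thresholds
```

Spreading the thresholds this way guarantees that no two robots decide to leave the nest at exactly the same nest-energy level.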

3.3. Statistical data analysis

Two series of analyses were carried out. First, the effect of the two different food distributions (clumped versus dispersed) was investigated using the two types of experiments without recruitment and, secondly, the effect of recruitment (recruitment versus non-recruitment) was investigated using the two types of experiments carried out in the environment with a


Fig. 7. A typical experiment with six robots. (a) Nest-energy; (b) number of active and inactive robots. At the beginning (a), none of the robots is active. As nest-energy decreases (b), the robots progressively leave the nest. At time 800 s (c), all six robots are active. Their harvest is good, which results in a nest-energy higher than the initial nest-energy (d). As a consequence, the robots switch to the inactive mode and stay in the nest, as all of them have done by time 1200 s (e). But as nest-energy decreases again (f), the robots resume their food search. At the end of the experiment, all but one robot are inactive in the nest (g).

clumped food distribution. Statistical tests were conducted with ANOVAs [34]. Performance was calculated as the inverse of the total energy used during the course of the experiment. Robustness was measured as the lowest nest-energy recorded during the experiment. Values closer to zero indicate a higher risk of system collapse, since the system was considered to have crashed when nest-energy dropped below zero. To assess the relative amount of work done by each team member, a skew measure [22] was used. This measure allows one to quantify the skew in individual contributions to a global task and ranges from zero to one. A skew of one indicates that one team member was doing all the work, whereas a skew of zero means that all team members contributed equally to the global task. The skew can be calculated with the following formula:

S = (N Σ p_i² − 1) / (N − 1),


where N is the number of individuals in the group and p_i the relative contribution of team member i.
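As a check of the two extremes of this skew index, here is a direct implementation (a sketch of the reconstructed formula, not the authors’ analysis code):

```python
def skew(contributions):
    """Skew index S = (N * sum(p_i^2) - 1) / (N - 1), where p_i is
    member i's share of the total contribution (the p_i sum to 1)."""
    total = sum(contributions)
    p = [c / total for c in contributions]
    n = len(p)
    return (n * sum(x * x for x in p) - 1.0) / (n - 1.0)
```

Equal shares give `skew([5, 5, 5]) == 0`, and one member doing all the work gives `skew([15, 0, 0]) == 1`, matching the stated range and interpretation.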

3.4. The robots

We used Khepera 5 robots with three additional modules: a gripper turret, a custom-made detection module, and a radio turret (Fig. 8). The Khepera’s basic module is a miniature mobile robot designed as a research tool at the Laboratoire de Micro-informatique of the Swiss Federal Institute of Technology at Lausanne [18], and now produced by K-Team SA. 6 For our experiments, we redesigned the Khepera’s basic module, adding four floor contacts plus a regulator for continuous power supply (allowing long experiments with additional turrets) [16], three floor-oriented IR sensors for floor readings and a castor wheel (Fig. 9). The main specifications of the Khepera’s basic module are:

Diameter: 55 mm
Motion: 2 DC motors with incremental encoders (about 10 pulses per mm of advance)
Power: rechargeable NiCd batteries or external supply
Autonomy: about 45 minutes (maximal activity, no additional module)
Sensors: eight infrared proximity and light sensors
Processor: Motorola 68331
RAM: 256 Kbytes
ROM: 128 or 512 Kbytes

3.5. Additional modules

The gripper turret is a standard Khepera module with an arm moving from the front to the back, plus the gripper, whose jaw can seize objects up to about 55 mm. An optical barrier detects the presence of objects between the jaws, and conductive surfaces (not used in our experiments) measure the electrical conductivity of the object seized.

The detection module was custom-made, featuring two detection units: an ambient light detector and an optical barrier (light modulation photo IC). The ambient light detector was used by the robots to navigate back to the nest, which was signalled by a light beacon. The optical light barrier was used to distinguish the three basic objects present in our experiments: walls, food-items and other robots. Distinction between the three basic objects was achieved in the following way: when the standard IR sensors detected the presence of an object, the robot made two additional readings: floor reflection and optical barrier status. If the floor reflection was low, signalling black painting, the robot was near a wall. An unbroken optical barrier indicated that another robot — the only object high enough to reflect its beam — was facing the robot. If the floor reflection was high (absence of black painting), and the optical barrier was broken (absence of other robots), the robot was facing a food-item. The programs used to detect the three basic objects ran continuously and in parallel; occasional misinterpretations were usually resolved by the frequent double-check procedures included in the program.

The radio module is a standard Khepera turret using a low power 418 MHz transceiver [16]. A communication protocol allows complete control of the robot’s functionality through an RS232 serial line.

5 Robot Khepera: http://lamiwww.epfl.ch/Khepera/#khepera.
6 K-team: http://diwww.epfl.ch/lami/robots/K-family/K-Team.html.
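The three-way object discrimination described above reduces to two boolean readings. A hypothetical sketch (function and argument names are ours, not the robots’ code):

```python
def classify_object(floor_reflection_high, optical_barrier_broken):
    # Discrimination rules from the text:
    #   low floor reflection (black paint near walls)  -> wall
    #   unbroken optical barrier (tall object)         -> another robot
    #   high reflection and broken barrier             -> food-item
    if not floor_reflection_high:
        return "wall"
    if not optical_barrier_broken:
        return "robot"
    return "food-item"
```

The two readings are only taken after the standard IR sensors have already detected the presence of some object, so the function never needs a “nothing there” case.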

3.6. Control architecture

The goal of the robots’ control architecture (Fig. 10) was to achieve the desired functionality by the proper connection of three units: the Sensory, Motor and Processing units. It can be considered as a black box and does not claim any biological relevance to social insects.

The Sensory unit took measurements in the environment and sent associated signals to the Processing unit. Apart from the sensors described in Sections 3.4 and 3.5, the robot also received sensory feedback from its own motor actions through incremental encoders for the wheel motors, and potentiometers for the arm and gripper. The Motor unit executed the commands sent by the Processing unit. Four different motors were addressed: left wheel, right wheel, gripper’s arm and the gripper itself. Motor actions were associated with an energetic cost — purely mathematical — which was subtracted from the robot’s energy


Fig. 8. (a) A Khepera robot with three additional modules: a gripper turret (bottom), a custom-made detection module (middle), and a radio turret (top). (b) A Khepera robot loads a “food-item”.


Fig. 9. Modified base. Contacts for continuous power supply (a), floor-oriented IR sensors (b) and castor wheel (c).

level (physiology). The Processing unit consisted of two parts: the Decision unit, which activated and deactivated small programs, and the programs themselves (Fig. 10). The Decision unit received inputs from three sources:
• the physiological variable set;
• the conceptual tags (= classified sensor inputs);
• the drives.
The only physiological variable considered in our experiments was the robot’s energy level, and the only drive used was hunger.

At any time, depending on the inputs, the Decision unit decided to either activate or deactivate a program. For instance, when a robot had left the nest, the program “Follow nest tracks” was deactivated and “Search for food” was activated; when a robot reached the nest, “Return to nest” was deactivated and “Follow nest tracks” was activated. The programs themselves had one or several inputs and generated a corresponding output (for details, see below). Once a program was activated, it ran concurrently with all other activated programs until deactivated (this was facilitated by the multi-tasking capabilities of the Kheperas). Some programs had a higher priority and under certain conditions could override other running programs. The programs were categorised into four distinct classes depending on which unit they received their inputs from and which unit they sent their outputs to.
1. Input: Sensory unit, output: Processing unit (“Perception”)
These programs classified the inputs they received from the Sensory unit into conceptual tags like “obstacle”, “dark floor” or “object in the gripper”. Each conceptual tag corresponded to a user pre-defined concept.

2. Input: Processing unit, output: Processing unit (“Thoughts”)


These programs received one or several conceptual tags as an input and generated another conceptual tag as an output. Example: When both the propositions “there is an obstacle” and “the floor is dark” are confirmed, the Processing unit generates the concept “wall”.

3. Input: Processing unit, output: Motor unit (“Intentional Actions”)
These programs used conceptual tags from the Processing unit to engage specific motor actions. For example, after having positively checked the proposition “there is a food-item in the gripper”, the robot closed its gripper and lifted its arm to the transport position.

4. Input: Sensory unit, output: Motor unit (“Reflexes”)
Reflexes were programs which had to be executed fast. They received inputs directly from the

Fig. 10. Control architecture of the robots. The control architecture is made of three main parts: the Sensory unit, the Processing unit (consisting of a Decision unit for activating and deactivating programs, and the programs themselves), and the Motor unit. The Sensory unit translates the outside world into internal variables; the Processing unit uses these variables to create more complex variables or to instruct the Motor unit to perform an action; and the Motor unit translates internal commands into physical action. Depending on the inputs (conceptual tags, drives, physiology), the Decision unit decided to either activate or deactivate programs. The programs were categorised into four distinct classes depending on which unit they received their inputs from and which unit they sent their outputs to (for more details see Section 3.6). SIS = sensory input signal, CT = conceptual tag, MAC = motor action command.

Sensory unit and generated immediate motor action. No concepts were generated. Example: the obstacle avoidance routine.
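In present-day terms, the four program classes and a single control step of the Processing unit could be sketched roughly as follows. This is a minimal Python sketch; every function name, tag string and sensor value is ours and purely illustrative, not the authors’ actual code:

```python
def perception(sensors, tags):
    # Perception: Sensory unit -> Processing unit (signals become conceptual tags).
    if sensors["front_ir"] > 800:
        tags.add("obstacle")
    if sensors["floor_ir"] < 100:
        tags.add("dark floor")

def thoughts(tags):
    # Thoughts: Processing unit -> Processing unit (tags combined into new tags).
    if "obstacle" in tags and "dark floor" in tags:
        tags.add("wall")

def intentional_actions(tags, commands):
    # Intentional Actions: Processing unit -> Motor unit (tags trigger motor commands).
    if "food-item in gripper" in tags:
        commands.append("close gripper and lift arm")

def reflexes(sensors, commands):
    # Reflexes: Sensory unit -> Motor unit directly, bypassing concepts;
    # highest priority, so the command is put at the head of the queue.
    if sensors["front_ir"] > 900:
        commands.insert(0, "avoid obstacle")

def control_step(sensors):
    tags, commands = set(), []
    perception(sensors, tags)
    thoughts(tags)
    intentional_actions(tags, commands)
    reflexes(sensors, commands)
    return tags, commands

tags, commands = control_step({"front_ir": 950, "floor_ir": 50})
# tags now holds "obstacle", "dark floor" and the derived concept "wall";
# the reflex command sits at the head of the command queue
```

On the real Kheperas the activated programs ran concurrently; the sequential loop above only illustrates the data flow between the units.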

4. Results

Our experiments showed that an artificial, complex system can be regulated using a simple activation-threshold as the only control parameter. Nest-energy, the variable to be regulated, was stable and, in the experiments with the three, six and nine robot teams, stayed well above the lowest individual activation-threshold (Fig. 11). Only in the experiments with the teams of 12 robots was a steady decrease in nest-energy observed (Fig. 11). Moreover, the activation-threshold mechanism allows shifting from a single-individual to a multi-individual system without any additional changes. This mechanism has proven well adapted to task allocation in robot teams.
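The core of this mechanism fits in a few lines. The sketch below follows our reading of the rule stated in the abstract (a robot goes out to collect food whenever nest-energy is below its personal activation-threshold); the threshold and energy values are illustrative, not those used in the experiments:

```python
def active_robots(nest_energy, thresholds):
    """A robot forages iff nest-energy is below its activation-threshold."""
    return [nest_energy < th for th in thresholds]

# Equidistributed thresholds for a team of six (illustrative values):
thresholds = [1000, 1200, 1400, 1600, 1800, 2000]

print(active_robots(1500, thresholds))
# only the three robots with the highest thresholds go out; the rest stay in the nest
```

Because each robot only compares a broadcast value against its own constant, the team size can change without any modification to the control program.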

4.1. Effect of group size and food distribution on performance

The effects of group size and food distribution were tested using a 2-way ANOVA. The size of the robot team had a significant effect on performance (2-way ANOVA, group size: df = 4,1, F = 4.72, P = 0.002). The relative performance, where the best robot team has a performance of 100%, was 88.9%, 99.5%, 100%, 91.5% and 91.6% for teams of one, three, six, nine and twelve robots, respectively. In both environments the one robot teams had the lowest performance, whereas the three and six robot teams had the highest performance. Performance among the groups of one, nine and twelve, as well as between the groups of three and six, was however not significantly different (Fisher’s PLSD, 5%). When the food-items were grouped in patches the robots had a significantly better performance (2-way ANOVA, food distribution: df = 4,1, F = 12.87, P < 0.001). The relative decrease in performance in the environment with dispersed food-items was 7.7%. Yet the difference in performance among the various group sizes was different for the two food distributions. The relative increase in performance for robot teams of three and six was more pronounced in the environment with dispersed food-item distributions (Fig. 12).


Fig. 12. Performance relative to the one robot teams in an environment with a clumped and a dispersed food distribution, with no information sharing. Performance was measured as the inverse of the total energy used and then normalised by the number of robots.

4.2. Effect of information sharing and group size on performance

When the location of the food patches was transmitted by recruiting and guiding a team member to the patch, the performance of the group increased significantly (2-way ANOVA, information sharing: df = 3,1, F = 39.93, P < 0.001). The increase in performance was 13%. Again, group size had a significant effect on performance (2-way ANOVA, group size: df = 3,1, F = 6.87, P < 0.001), with the best performance in the three and six robot teams.

4.3. Interferences in an environment with clumped food-item distribution

Interference among robots was defined as occurring whenever a robot tried to perform a task but was hindered by another robot. The proportion of time spent in such interferences increased significantly with increasing group size (2-way ANOVA, group size: df = 3,1, F = 6.33, P = 0.003). The mean proportion of time spent in interferences was 0.4%, 0.6%, 1.0% and 2.3% for teams of three, six, nine and twelve robots, respectively. However, a post hoc analysis (Fisher’s PLSD, 5%) revealed that only the twelve robot team spent significantly more time in interferences.

4.4. Effect of group size and food distribution on robustness

The size of the robot team had a significant effect on the minimal nest-energy recorded (2-way ANOVA, group size: df = 4,1, F = 5.41, P < 0.001). Groups with a larger number of robots had a higher minimal nest-energy (Fig. 13), except for the twelve robot team, which had a higher minimal nest-energy than the one robot team but a lower one than all other group sizes. The effect of the food distribution on the minimal nest-energy recorded was not significant (2-way ANOVA, food distribution: df = 4,1, F = 3.18, P = 0.079).


Fig. 13. Minimal nest-energy recorded during the experiments with no information sharing for teams of different group sizes. The minimal nest-energy was normalised by the number of robots. Error bars indicate the 95% confidence interval.

4.5. Effect of information sharing and group size on robustness

Information sharing had a positive effect on the robustness of the robot teams. The minimal nest-energy recorded during the experiment was higher in teams with information sharing, with an average of 2147 and 1870 in the experiments with and without information sharing, respectively. This difference was highly significant (2-way ANOVA, information sharing: df = 3,1, F = 11.77, P = 0.001). Again, groups with a larger number of robots had higher minimal nest-energies, with the exception of the twelve robot teams (2-way ANOVA, group size: df = 3,1, F = 3.10, P = 0.034).

4.6. Work skew among the robots

Due to their different activation-thresholds, the robots spent unequal amounts of time in the active foraging state. This skew in activity among the team members was expressed using the skew index. Larger groups shared their work load more evenly, as indicated by mean work skews of 0.370, 0.345, 0.197 and 0.172 for the three, six, nine and twelve robot teams, respectively. This difference in work skew among the teams of different group sizes was significant (2-way ANOVA, group size: df = 3,1, F = 13.03, P < 0.001). In contrast, information sharing had no significant effect on the work skew (2-way ANOVA, information sharing: df = 3,1, F = 0.421, P = 0.519).

4.7. Time active

Information sharing significantly decreased the total time the robots spent in the active (working) state (2-way ANOVA, information sharing: df = 3,1, F = 6.86, P = 0.009). Robots with information sharing spent on average 48% of their time in an active state, whereas robots without information sharing spent 56%. However, this was not true for all team members (Fig. 14). When the robots were categorised according to their activation-threshold into three threshold classes (high, middle and low), the robots with the lowest thresholds spent significantly more time in the active state than robots of the same threshold class with no information sharing (ANOVA, df = 1, F = 4.76, P = 0.031). This was due to recruitment by other team members while inactive in the nest.


Fig. 14. Proportion of time spent in an active (working) state depending on the threshold class and information sharing. The robots were categorised according to their activation-threshold into one of three classes (high, middle, low). Error bars indicate the 95% confidence interval.

5. Discussion

5.1. Threshold control

During our experiments the nest-energy never decreased below zero. This held under a variety of experimental settings: various group sizes, different food distributions, and presence or absence of information sharing. This result suggests that complex systems such as our artificial ant colony can be regulated with a single control parameter in a decentralised way; it also indicates that this mode of task allocation can be favourably used in practical robotics.

Apart from its efficiency and simplicity, the activation-threshold mechanism presents a second advantage: it makes it possible to control single robots as well as groups of robots. Simply choosing a different activation-threshold for each robot results in an automatic adjustment of the number of active robots to the current work load. This has positive implications for future robot applications. Imagine a system where several robots could work collectively on a given task. For financial or other reasons, only one robot is engaged at the beginning. Later, when more robots are added to the system, no special arrangements beside choosing different activation-thresholds have to be made to move from a single- to a multi-robot system. Likewise, when the number of robots is reduced, either by individual failures or by allocation of some robots to other work areas, no changes have to be made in the control program of the robots. This might even represent a more substantial advantage, since robot teams could be left unsupervised, individual breakdowns having little effect on such a coordination scheme.

As already mentioned, the individual activation-thresholds were equidistributed between a lower and an upper value. Using other distributions between these two limits might modify the regulatory characteristics of the mechanism. Although we did not pursue the experiments in this direction, we speculate that any reasonable distribution of activation-thresholds would yield equivalent results. There is no reason to consider equidistribution best; we chose it for its simplicity. Also, given the noisy context of practical robotics, the exact values of the activation-thresholds are of minor interest. Among other possible activation-threshold distributions, the random distribution is of special interest since there is no need for an operator to preset the thresholds. In our experiments, we had to preset a specific activation-threshold for each robot at the beginning of the experiments. In contrast, a random distribution of


activation-thresholds would make it possible to fix the same lower and upper limit for all robots, from which each robot would individually draw its own activation-threshold.
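Such a random draw amounts to a one-liner per robot; the sketch below is ours, with purely illustrative limits, not the experimental values:

```python
import random

# Every robot shares the same lower and upper limit and draws its own
# activation-threshold at start-up, so no operator has to preset anything.
LOW, HIGH = 1000.0, 2000.0   # illustrative limits, not the experimental values

def draw_threshold(rng=random):
    return rng.uniform(LOW, HIGH)

team = [draw_threshold() for _ in range(12)]
```

Each robot can run this line independently, which is what makes the scheme attractive for unsupervised teams.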

5.2. Performance

The size of the robot team had a significant effect on performance, with teams of three and six performing better than single robots or larger groups. This finding is explained by a trade-off between the positive and the negative effects of robot–robot interactions. Robots were programmed to avoid each other, which resulted in better coverage of the foraging arena when several robots were present, because each robot avoided areas where another robot was already foraging. This effect caused single robots to perform less efficiently than small groups of robots. However, as the number of robots increased, those robot–robot encounters also had negative consequences. First, robots more often avoided areas with food-items, because on average more robots were already present at locations with food-items; this was especially important in the environment where the food-items were grouped in patches. Secondly, robots more often seized the same food-item simultaneously, causing one of the robots to leave empty-handed. Finally, larger groups spent more time in interferences at the entrance of or inside the nest (see also Section 4.3). Thus, single robots could not benefit from the positive effect of robot–robot interactions found in small groups, whereas in large groups the negative effects of robot–robot interactions outweighed the positive ones.

A second result was that the optimal group size was different for each of the two food distributions. This suggests that under some environmental conditions the benefit of living in a group of a particular size is large, whereas under other conditions the benefit is modest. Observed animal group sizes, which are assumed to have been shaped and optimised by evolution, can be correlated with ecological factors (e.g. [7,14,17]). Furthermore, the relative increase or decrease in performance from one group size to another was different in the two environments (Fig. 12). For example, the decrease in performance from a six to a nine robot team was large with the dispersed food distribution but relatively small with the clumped food distribution. This indicates that

5.3. Robustness

The risk of mission failure was measured as the minimal nest-energy recorded during the experiments, since a nest-energy below zero would have ended the experiment. Our results show that the larger the team, the lower the risk of mission failure. This result holds for all group sizes with the exception of the twelve robot team, which had a lower risk than the one robot team but a higher risk than the three, six and nine robot teams (Fig. 13). The risk of mission failure, however, should not be confused with the performance of the system. Performance was measured as the inverse of the total energy used and hence measures the team’s efficiency in accomplishing the task (keeping the nest-energy at a safe level). In contrast, the risk of mission failure indicates how close the system came to a nest-energy of zero at any time during the experiment. The apparently contradictory result that the three and six robot teams had the best performance but not the lowest risk of mission failure stems from the fact that smaller teams’ nest energies fluctuate more widely.

In the experiments with groups of three, six and nine robots, the nest-energy usually stayed above the lowest activation-threshold (Fig. 11). As a result, at least one of the robots was still inactive in the nest and hence the system did not reach its full capacity. A different picture emerged in the experiments with twelve robots. Even if the final nest-energy never fell under half the initial nest-energy (Fig. 11(a)), in all three experimental set-ups the energy curve showed a tendency to decrease steadily. This suggests that in longer experiments the nest-energy would have reached zero and hence the system would have crashed. From our observations, it was the interferences between the robots that made the system less effective. Indeed, a detailed analysis of the robots’ behaviour in the patchy environment (with and without information sharing) revealed that the time spent in interferences increased with increasing group size. This shows that a given experimental set-up (size of the environment, number of food-items) supports only a limited number of robots, around 12 with the


mission we chose for our experiments. This is similar to a phenomenon found in ecology termed carrying capacity, which is defined as the maximum population size that can be supported by a given environment [2].

5.4. Information sharing

Sharing the information on the location of the food patches by recruitment had a positive effect on performance, resting time and robustness. The increased performance was due to the fact that not all robots had to search for their own food patch, which resulted on average in more efficient food retrievals. Thus, the robot teams needed less time to maintain the nest-energy. However, this was not the case for all team members. The robots with a low activation-threshold had to work significantly more under this scheme, since they were more likely to be recruited for work when inactive in the nest. This phenomenon is useful since it allows teams organised with graduated activation-thresholds to distribute the work load more evenly among their members.

The increased robustness originated from the early recruitment of inactive robots. In the experiments without recruitment, the robots left the nest only when the nest-energy was below their personal activation-threshold. This mechanism caused a large delay between the signal (low nest-energy) and the response (retrieving a food-item), which in turn resulted in greater fluctuations of the nest-energy. This delay was much smaller in the experiments with recruitment, since some robots became active (by recruitment) at nest energies at which they would normally have stayed inactive in the nest.

5.5. Scaling up

As mentioned above (Section 2.3), information exchange by tandem recruitment scales up easily with the number of robots. The same holds for the fixed-threshold dynamical task allocation, especially if the activation-thresholds are drawn randomly between two preset limits. Whether the number of activities scales up as easily as the tandem recruitment and the activation-thresholds remains to be investigated.

5.6. Why not settle for a simulation?

Demonstrating by simulation that a fixed activation-threshold mechanism can properly regulate teams of simulated robots is not enough for practical robotics. Computer simulation is a powerful and essential tool for robotics, yet there is no guarantee that an algorithm tested in simulation will work in practice with robots. First, real robot–robot and robot–environment interactions are more complex and unpredictable than their simulation. Secondly, a real environment, however simple, is subject to its own dynamics and subjects the robots to a set of constraints that is usually impossible to list and model. Sometimes, especially when the robots’ activities are restricted to moving on a smooth surface, simulation yields excellent results. But as soon as the interactions between robots and their environment increase, for instance when the robots seize objects or when the lighting conditions are not constant, practical experiments rapidly diverge from their simulations. Even if not mentioned in technical articles, nor seen on videos, autonomous robotics experiments are frequently marked by technical problems. Only mechanisms proven effective in such difficult environments should be considered. The ultimate test, in experimental robotics, is the practical test. In addition, experimentation is rich in unanticipated interactions which will not occur in simulations, most of them negative, some of them positive. The latter can prove useful enough to be integrated into the mission’s strategy. 7 However, it should be mentioned that exploitable unexpected phenomena remain rare. Emergences, their noblest variety, do not pop up on request.

5.7. Related work

In her seminal work, Parker [23] studied dynamical task allocation in a group of robots whose mission was to collect pucks in an arena. To choose between the

7 A classical example is a robot programmed to aim at a light source and devoid of an obstacle avoidance mechanism. Versino and Gambardella [36] tell how, after having inadvertently left an object between the robot and the light source, they saw the robot turn around it without bumping into it! Explanation: entering the shadow cone, the robot automatically turned toward a more luminous zone to the right or to the left. This emergence would not have taken place in a shadow-less simulation.


various tasks involved in their mission, the robots consulted their individual “motivation”, which integrated five factors: sensory feedback, inter-robot communication, inhibitory feedback from other active behaviours, robot impatience and robot acquiescence. This general-purpose architecture, called ALLIANCE, lends itself to many variations, since every motivation factor may be further developed, including through learning [24]. In our research, we took a more restrictive approach, using the simplest possible mechanism to achieve task allocation. The mechanism we propose functions with a single parameter, the activation-threshold. The only broadcast information used by our robots was the periodic update of the nest’s energy.

The question of labour division in collective

robotics 8 was also studied by Schneider-Fontan and Mataric [31], who engaged two, three and four robots in an arena divided into as many equally-sized contiguous regions. Each robot is territorially attached to region i, where it collects and carries the pucks to region i−1, where robot i−1 will then transport them to region i−2, and so on. Eventually, the pucks reach region 1, where robot 1 carries them to their final destination, a spot in region 1. The goal of this mission was to clean all regions of their pucks and move them all to this final destination. The authors compare the efficiency of accomplishing this task for the different group sizes used in their experiments. Their best results were achieved with three robots, which shows “that increased group sizes can, in embodied agents, negatively impact the effectiveness of the territorial solution”. We witnessed a similar phenomenon in our experiments: the best results were achieved with intermediate team sizes. Territoriality, which “produces a physical division of space and all associated resources”, seems to be an effective way to reduce interference between robots [31].
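The territorial relay of [31] can be sketched as a simple bucket brigade. The code below is our illustration of that handoff pattern, not the authors’ implementation, and it abstracts away all robot motion:

```python
# Territorial relay: robot i clears region i by passing its pucks to region
# i-1; the robot of region 1 carries them to the final destination.
def relay_step(pucks):
    """pucks[0] is region 1. In one step, each region hands its pucks one
    region down, and region 1 delivers to the destination. Returns the new
    region contents and the number of pucks delivered this step."""
    delivered = pucks[0]
    return pucks[1:] + [0], delivered

pucks = [2, 3, 1]            # puck counts in regions 1..3
total = 0
while sum(pucks) > 0:
    pucks, delivered = relay_step(pucks)
    total += delivered
# all six pucks eventually reach the final destination
```

The fixed territories keep the robots out of each other’s way, which is exactly the interference-reduction effect discussed above.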

6. Conclusions

From the robotics point of view, we demonstrated that dispersing the individual activation-thresholds of robots is an efficient way of allocating tasks in teams of real robots.

8 A 15-minute video entitled “Task allocation in collective robotics” depicts and comments on our experiments. For copies (private use only), contact [email protected]

From the biological point of view, we demonstrated that complex social systems can be regulated in a decentralised way. This adds further evidence to the hypothesis that social insect colonies are regulated in a self-organised manner. Moreover, teams of three and six robots had a better performance than a single robot, illustrating that a transition from a solitary individual to a social group would have been favoured in our experimental set-up. Even though the relative advantage of a given group size depends entirely on the system, we demonstrated that there are favourable conditions under which such a transition is possible.

Acknowledgements

We thank Professor J.D. Nicoud’s LAMI (Microprocessor Systems Laboratory, EPFL) for its sustained help, and K-Team SA for providing part of the hardware. We thank Laurent Keller, Alcherio Martinoli and Cristina Versino for comments on the manuscript. We are especially grateful to Edo Franzi, André Guignard, Alcherio Martinoli, Philip Maechler and Christian König. This work was funded by grants from the Fonds UNIL-EPFL 1997 (to J.D. Nicoud and L. Keller) and the “Stiftung für wissenschaftliche Forschung” (to M.J.B. Krieger).

References

[1] R.D. Alexander, The evolution of social behavior, Annual Review of Ecology and Systematics 5 (1974) 325–383.

[2] M. Begon, J.L. Harper, C.R. Townsend, Ecology: Individuals, Populations and Communities, 2nd ed., Blackwell Scientific Publications, MA, 1990.

[3] E. Bonabeau, G. Theraulaz, J.-L. Deneubourg, Quantitative study of the fixed threshold model for the regulation of division of labour in insect societies, Proceedings of the Royal Society of London B 263 (1996) 1565–1569.

[4] E. Bonabeau, G. Theraulaz, J.-L. Deneubourg, Fixed response thresholds and the regulation of division of labour in insect societies, Bulletin of Mathematical Biology 60 (1998) 753–807.

[5] A.F.G. Bourke, N.R. Franks, Social Evolution in Ants, Princeton University Press, Princeton, NJ, 1995.

[6] M.D. Breed, G.E. Robinson, R.E. Page, Division of labour during honey bee colony defense, Behavioral Ecology and Sociobiology 27 (1990) 395–401.

[7] J.H. Crook, The evolution of social organisation and visual communication in the weaver birds (Ploceinae), Behaviour 10 (Suppl.) (1964) 1–178.


[8] J.-L. Deneubourg, S. Goss, Collective patterns and decision-making, Ethology Ecology and Evolution 1 (1989) 295–311.

[9] J.M. Epstein, R. Axtell, Growing Artificial Societies: Social Science from the Bottom Up, MIT Press, Cambridge, MA, 1996.

[10] D.M. Gordon, Dynamics of task switching in harvester ants, Animal Behaviour 38 (1989) 194–204.

[11] D.M. Gordon, The organization of work in social insect colonies, Nature 380 (1996) 121–124.

[12] B. Hölldobler, M. Möglich, U. Maschwitz, Communication by tandem running in the ant Camponotus sericeus, Journal of Comparative Physiology 90 (1974) 105–127.

[13] Z.Y. Huang, G.E. Robinson, Honeybee colony integration: Worker–worker interactions mediate hormonally regulated plasticity in division of labor, Proceedings of the National Academy of Sciences USA 89 (1992) 11726–11729.

[14] P.J. Jarman, The social organisation of antelope in relation to their ecology, Behaviour 48 (1974) 215–267.

[15] A. Lenoir, Feeding behaviour in young societies of the ant Tapinoma erraticum L.: Trophallaxis and polyethism, Insectes Sociaux 26 (1979) 19–37.

[16] A. Martinoli, E. Franzi, O. Matthey, Towards a reliable set-up for bio-inspired collective experiments with real robots, in: Proceedings of ISER’97, Barcelona, Spain, 1997.

[17] T. Maruhashi, C. Saito, N. Agetsuma, Home range structure and inter-group competition for land of Japanese macaques in evergreen and deciduous forests, Primates 39 (1998) 291–301.

[18] F. Mondada, E. Franzi, P. Ienne, Mobile robot miniaturization: A tool for investigation in control algorithms, in: Proceedings of ISER’93, Kyoto, Japan, 1993.

[19] G. Nicolis, I. Prigogine, Self-Organisation in Non-Equilibrium Systems, Wiley, New York, 1977.

[20] S.W. Pacala, D.M. Gordon, H.C.J. Godfray, Effects of social group size on information transfer and task allocation, Evolutionary Ecology 10 (1996) 127–165.

[21] R.E. Page, S.D. Mitchell, Self-organization and the evolution of division of labor, Apidologie 29 (1998) 171–190.

[22] P. Pamilo, R.H. Crozier, Reproductive skew simplified, Oikos 75 (1996) 533–535.

[23] L.E. Parker, Heterogeneous multi-robot cooperation, Ph.D. Thesis, MIT, 1994.

[24] L.E. Parker, L-ALLIANCE: Task-oriented multi-robot learning in behavior-based systems, Advanced Robotics 11 (1997) 305–322.

[25] L. Passera, E. Roncin, B. Kaufmann, L. Keller, Increased soldier production in ant colonies exposed to interspecific competition, Nature 379 (1996) 630–631.

[26] G.E. Robinson, Modulation of alarm pheromone perception in the honey bee: Evidence for division of labour based on hormonally regulated response thresholds, Journal of Comparative Physiology 160 (1987) 613–619.

[27] G.E. Robinson, Regulation of division of labor in insect societies, Annual Review of Entomology 37 (1992) 637–665.

[28] G.E. Robinson, R.E. Page, Genetic determination of nectar foraging, pollen foraging, and nest-site scouting in honey bee colonies, Behavioral Ecology and Sociobiology 24 (1989) 317–323.

[29] C.H. Ryer, B.L. Olla, Social behavior of juvenile chum salmon, Oncorhynchus keta, under risk of predation: The influence of food distribution, Environmental Biology of Fishes 45 (1996) 75–83.

[30] T.C. Schelling, Micromotives and Macrobehavior, Norton, New York, 1978.

[31] M. Schneider-Fontan, M. Mataric, A study of territoriality: The role of critical mass in adaptive task division, in: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior: From Animals to Animats (SAB96), Cape Cod, MA, MIT Press, Cambridge, MA, 1996.

[32] T.D. Seeley, Adaptive significance of the age polyethism schedule in honeybee colonies, Behavioral Ecology and Sociobiology 11 (1982) 287–293.

[33] A.B. Sendova-Franks, N.R. Franks, Social resilience in individual worker ants, and its role in division of labour, Proceedings of the Royal Society of London B 256 (1994) 305–309.

[34] R.R. Sokal, F.J. Rohlf, Biometry, 2nd ed., Freeman, New York, 1981.

[35] G. Theraulaz, E. Bonabeau, Coordination in distributed building, Science 269 (1995) 686–688.

[36] C. Versino, L.M. Gambardella, Learning the visuomotor coordination of a mobile robot by using the invertible Kohonen map, in: J. Mira, F. Sandoval (Eds.), From Natural to Artificial Neural Computation: International Workshop on Artificial Neural Networks, Lecture Notes in Computer Science, Vol. 930, Springer, Berlin, 1995, pp. 1084–1091.

[37] J. Watmough, L. Edelstein-Keshet, Modelling the formation of trail networks by foraging ants, Journal of Theoretical Biology 176 (1995) 357–371.

Michael J.B. Krieger obtained his Master in Biology from the University of Basel, Switzerland. He received his Ph.D. from the University of Lausanne, Switzerland in 1999. He is currently a Post-Doctoral Associate in the Department of Entomology and Department of Microbiology, University of Georgia, USA, where he continues to work on aspects of the social organisation of ants.

Jean-Bernard Billeter obtained his diploma in Electrical Engineering at the Swiss Federal Institute of Technology in Zurich (ETHZ). After numerous years out of the academic world, he joined EPFL from 1994 to 1998 to work on collective intelligence. He is currently working on exhibition robotics projects.


Efficiency and Robustness of Threshold-Based Distributed Allocation Algorithms in Multi-Agent Systems

William Agassounon
Collective Robotics Group
California Institute of Technology, Pasadena, CA 91125, USA
+1 (626) 395 2243
[email protected]

Alcherio Martinoli
Collective Robotics Group
California Institute of Technology, Pasadena, CA 91125, USA
+1 (626) 395 2208
[email protected]

ABSTRACT

In this paper we present three scalable, fully distributed, threshold-based algorithms for allocating autonomous embodied workers to a given task whose demand evolves dynamically over time. Individuals estimate the availability of work based solely on local perceptions. The differences among the algorithms lie in the threshold distribution among teammates (homogeneous or heterogeneous team), in the mechanism used for establishing threshold values (fixed, parameter-based or variable, rule-based), and in the sharing (public) or not sharing (private) of demand estimations through local peer-to-peer communication. We tested the algorithms’ efficiency and robustness in a collective manipulation case study concerned with the clustering of initially scattered small objects. The aggregation experiment has been studied at two different experimental levels using a microscopic model and embodied simulations. Results show that teams using a number of active workers dynamically controlled by one of the allocation algorithms achieve similar or better performances in aggregation than those characterized by a constant team size while using on average a considerably reduced number of agents over the whole aggregation process. While differences in efficiency among the algorithms are small, differences in robustness are much more apparent. Threshold variability and peer-to-peer communication appear to be two key mechanisms for improving worker allocation robustness against environmental perturbations.

Categories and Subject Descriptors
J.2 [Computer Applications]: Physical Science and Engineering – engineering, electronics, mathematics and statistics.

General Terms: Algorithms, Performance, Reliability, Experimentation.

Keywords: Swarm intelligence, division of labor, response threshold, probabilistic modeling, embodied multi-agent systems.

1. INTRODUCTION
Swarm Intelligence (SI, first introduced in [2]) is a new computational metaphor for solving distributed problems using the principles guiding the collectively complex and intelligent behavior of natural systems consisting of many agents, such as ant colonies and bird flocks. The abilities of such systems appear to transcend the abilities of each constituent agent. In all the biological cases studied so far, the emergence of high-level control has been found to be mediated by nothing more than a small set of simple low-level interactions among individuals, and between individuals and the environment [5,7].

Generally speaking, our research focuses on the application of the SI approach to the control of embedded systems that consist of many autonomous decision-making entities endowed with local perception and possibly communication capabilities. In particular, we are interested in understanding the task allocation and labor division mechanisms exploited in social insect societies that are suitable for artificial embedded systems such as multiple mobile robot platforms. One of the most appealing principles of the SI approach to a roboticist is the minimalism [3] at the individual agent's level. This characteristic prompts the roboticist to carefully evaluate each additional capability the single agent should be endowed with, which in turn should lead to an overall increase in system robustness and cost effectiveness in mass production.

Recently, several macroscopic models, some of them based on threshold responses [4,16], others focusing only on task-switching probabilities [14], have been proposed to explain these mechanisms in natural colonies. However, none of these theoretical approaches has focused on how workers gather the information necessary to decide whether to switch tasks or to engage in performing a task. More specifically, they have not taken into consideration the agents' partial perception of the demand in time and space, nor their embodiment. For instance, partial perception of the demand combined with real-world uncertainties could strongly influence the optimal distribution of thresholds among teammates or the switching mechanism itself (e.g. probabilistic vs. deterministic).

In the collective robotics literature, we find threshold-based [9,15], market-based, and publish/subscribe messaging approaches [6] that take into account the embodiment of the agents but which are not scalable, because of their extensive communication requirements over a finite bandwidth or their need for an external supervisor. For instance, in the pioneering approach proposed by Parker [15], each robot at every instant of time and in every position is aware of the progress in task accomplishment of its teammates, based on global radio networking and an absolute positioning system. In Krieger and Billeter's experiment [9], the demand related to the nest energy is assessed by an external supervisor and globally transmitted to all the robots. Using this method, the team of robots has to be heterogeneous and each agent has to be characterized by a different threshold in order to regulate the activity of the team. This in turn results in a different exploitation of the teammates, the one endowed with the lowest threshold systematically being more active than the one with the highest.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. AAMAS ’02, July 15-19, 2002, Bologna, Italy. Copyright 2000 ACM 1-58113-480-0/02/0007…$5.00.

In [1] we proposed a threshold-based, distributed, scalable worker allocation algorithm that is based exclusively on the local estimation of the demand by the individuals. The individuals were all characterized by the same threshold, but since the agents did not perceive the demand globally but rather estimated it locally, they did not all work or rest at the same time, a behavior that would have arisen if the demand had been broadcast by an external supervisor. In this paper we propose two new algorithms of the same family and we compare their efficiency and robustness with those of the one previously presented. Consistent with the SI approach, the two newly proposed algorithms slightly extend the individual capabilities in order to overcome some of the limitations of the first algorithm, in particular in the case of environmental perturbations. The first new algorithm endows each agent with the ability to calibrate its own response threshold before starting to adapt its activity based on this threshold. We therefore replace an a priori fixed parameter by a rule that can adapt the value of this parameter according to locally sensed environmental constraints. The second new algorithm allows the team of embodied agents to exchange individual estimations of the demand through local peer-to-peer communication, while still relying on an a priori fixed threshold. Since the local estimation of the demand is noisy, sharing this information among the teammates is a way to increase the update rate of this estimation and to reduce the corresponding error without resorting to an external supervisor.

Collective embedded systems can be studied at several implementation levels, from macroscopic analytical models [1] to units in the real world [10,11], through microscopic, numerical models and embodied simulations. Models allow for a better understanding of the experiment dynamics and for a generalization to other tasks, environmental constraints, and embedded platforms. When using quantitatively accurate models, optimal parameters of the control algorithms can be investigated much more quickly at more abstract levels, and the effectiveness of the devised solution can then be verified using embodied simulations and/or real embedded systems. In this paper we present results gathered at two implementation levels, using a microscopic model and embodied simulations. Both levels are well suited for studying noisy demand estimations and heterogeneous teams of agents without having to deal, in this exploratory phase, with all sorts of problems arising in real-robot experiments. The qualitative and quantitative reliability of both implementation levels for this type of experiment has been shown in previous work [1,8,10,11].

2. EXPERIMENTAL SETUP
2.1 The Aggregation Experiment
The case study used for assessing the efficiency of the worker allocation algorithm is concerned with the gathering and clustering of small objects scattered in an enclosed arena. In most of the work done so far [7], and more specifically [10,11], the size of the working team was kept constant during the whole aggregation process. These experiments define our baseline for an efficiency comparison with and without a worker allocation algorithm. In this paper, we use three primary team performance measurements: the average cluster size, the average number of clusters, and the average number of active workers in the environment. We then integrate all three primary team performances into a combined metric that represents the cost of aggregation over a certain time period.

2.2 The Embodied Simulator
We implemented the aggregation experiment in Webots 2.01, a 3D sensor-based, kinematic simulator [12] of Khepera robots [13]. The simulator computes the trajectories and sensory inputs of the embodied agents in an arena corresponding to the physical setup (see Figure 1).

The mean comparative speed ratio for this experiment with 10 robots between Webots and real time is about 7 on a PC Pentium III 800 MHz workstation. The environment is represented by two square arenas of different sizes. Each of them contains a working zone of a corresponding size (80×80 cm and 178×178 cm) where twenty small seeds are randomly scattered at the beginning of the experiment. A resting zone surrounds the working zone, where non-active agents go to rest (or stay in an idle state to save energy). Agents are endowed with sensor capabilities to distinguish the border between resting and working zones. Without considering the mode-switching behavior (explained in Section 3), we can summarize each agent's behavior with the following simple rules. In its default behavior the agent moves straight ahead within the working zone looking for seeds. When at least one of its six frontal proximity sensors is activated, the agent starts a discrimination procedure. Two cases can occur: if the agent is in front of a large object (a wall, another agent, or the body side of a cluster of seeds), the object is considered an obstacle and the agent avoids it. In the second case, the small object is considered a seed. If the agent is not already carrying a seed, it grasps it with its gripper; otherwise it drops the seed it is carrying next to the one it has found. In both cases, the agent then resumes searching for seeds. With this simple individual behavior, the team is able to gather objects in clusters of increasing size. A cluster is defined as a group of seeds whose neighboring elements are separated by at most one seed diameter. Note that, because agents identify only the two extreme seeds of a cluster as seeds (as opposed to obstacles), clusters are built in line.

Figure 1a. Close-up of a simulated robot (5.5 cm in diameter) in Webots, equipped with a gripper turret, in front of a seed.

As shown in [11], if the agents drop a seed only next to another seed and never pick up an internal seed of a cluster, the number of clusters decreases monotonically and a single cluster eventually always arises.
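The cluster definition above (seeds belong to the same cluster when neighboring elements are separated by at most one seed diameter) can be sketched as a simple union-find grouping. This is an illustrative reconstruction, not the authors' code; the names `count_clusters` and `SEED_DIAMETER` (and its value) are assumptions.

```python
import math

SEED_DIAMETER = 1.0  # cm; illustrative value, not taken from the paper


def count_clusters(seeds):
    """Group seed positions (x, y) into clusters: two seeds join the same
    cluster if their gap is at most one seed diameter (center-to-center
    distance <= 2 * SEED_DIAMETER), directly or through a chain of seeds."""
    n = len(seeds)
    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(seeds[i], seeds[j]) <= 2 * SEED_DIAMETER:
                parent[find(i)] = find(j)  # merge the two clusters

    return len({find(i) for i in range(n)})
```

For example, with two seeds 1.5 cm apart and one far away, the first two form a single cluster and the isolated seed its own, giving two clusters in total.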

2.3 The Microscopic, Probabilistic Model
The central idea of the microscopic, probabilistic model is to describe the experiment as a series of stochastic events with probabilities based on simple geometrical considerations and on systematic interaction experiments with a single real or embodied agent. The probability for any agent to encounter any other object present in the arena (e.g. a seed, a teammate, the border between the working field and the resting zone) is given by the ratio of the extended area occupied by that object to the total arena area in which the agent is moving. The extended area occupied by each object is computed by considering the detection range of that object by an active agent, taken from the center of that agent. In this specific collective manipulation case study, seed picking-up and dropping probabilities also have to be taken into account once a cluster is found; they depend on the angle of approach of the agent to the cluster (clusters can be modified only at their tips).

In the numerical, probabilistic model a finite state machine defines the states of the agents but, instead of computing the detailed sensory information and trajectories of the agents, the change of states is determined randomly by rolling dice. The overall behavior is then computed by averaging the results of several runs of the same experiment. A more detailed description of this microscopic modeling methodology can be found in [8,10,11].
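The geometric encounter probability and the dice-roll state update can be sketched as follows. This is a simplified illustration under our own assumptions (circular objects, a single searching-to-manipulating transition); the function names are hypothetical and the model in [8,10,11] handles many more states and events.

```python
import math
import random


def encounter_probability(object_radius, detection_range, arena_area):
    """Per-time-step probability that an agent encounters a given object,
    approximated as the ratio of the object's extended area (object plus
    the agent's detection range around it) to the total arena area."""
    extended_radius = object_radius + detection_range
    return math.pi * extended_radius ** 2 / arena_area


def step(state, p_encounter_seed, rng=random.random):
    """One 'dice roll' of the finite state machine: a searching agent
    finds a seed with probability p_encounter_seed; otherwise (or in any
    other state) it stays where it is for this step."""
    if state == "searching" and rng() < p_encounter_seed:
        return "manipulating"
    return state
```

Averaging many independent runs of such stochastic updates yields the model's prediction of the team-level behavior.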

Working with models also brings an additional time saving in comparison to embodied simulations. The mean speed ratio for this experiment with 10 agents between the microscopic probabilistic model and Webots is about 700 on a PC Pentium III 800 MHz workstation.

3. THE DISTRIBUTED WORKER ALLOCATION MECHANISM
The main objective of this case study is to show that the introduction of worker allocation mechanisms allows the team of agents to increase its efficiency as a whole by allocating the right number of workers as a function of the demand intrinsically defined by the aggregation process. Intuitively, we can imagine that at the beginning of the aggregation there are several possible manipulation sites (i.e. several scattered seeds) that allow several agents to work in parallel. As the aggregation process goes on, the number of these sites is reduced, and having more agents competing for the same manipulation sites decreases their efficiency.

In threshold-based systems, the ‘propensity’ of any agent to act is given by a response threshold. If the demand is above the agent’s threshold then that agent continues to perform the task, conversely if the demand is below its threshold then the agent stops performing that particular task. In all the algorithms presented in this paper the time an agent spends before finding some work to accomplish (i.e. to pick and drop a seed) represents the agent’s estimation of the demand stimulus associated with the aggregation task.

Our current worker allocation mechanism is as follows. When an agent has not been able to work (i.e. to pick up and drop a seed) for a reasonable amount of time, its propensity to accomplish that particular task is decreased. If the stimulus goes below a certain threshold (i.e. if the amount of time spent in the search for work to accomplish is above a given Tsearch time-out), a deterministic switching mechanism prompts the agent to leave the working zone for resting in the adjacent parking space. An agent carrying a seed that decides to become inactive cannot do so until it finds an appropriate spot (i.e. one tip of a cluster) to drop the seed. Thus, with this simple algorithm characterized by a single threshold, each agent is able to estimate the aggregation demand locally and to decide whether to work or rest.
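The deterministic switching rule described above can be sketched as a small state holder: the time elapsed since the last successful pick-and-drop is the agent's demand estimate, and exceeding the Tsearch time-out triggers withdrawal. This is a minimal sketch under our own naming (`Worker`, `record_work`, `tick` are not from the paper).

```python
class Worker:
    """Minimal sketch of the fixed-threshold switching rule: an agent that
    has searched longer than its Tsearch time-out without picking up and
    dropping a seed withdraws to the resting zone."""

    def __init__(self, t_search):
        self.t_search = t_search    # response threshold (time-out)
        self.since_last_work = 0.0  # time since last pick-and-drop
        self.active = True

    def record_work(self):
        """Called when the agent picks up and drops a seed: the local
        demand estimate is refreshed."""
        self.since_last_work = 0.0

    def tick(self, dt):
        """Advance time by dt; withdraw deterministically once the search
        time exceeds the threshold."""
        if not self.active:
            return
        self.since_last_work += dt
        if self.since_last_work > self.t_search:
            self.active = False  # deterministic switch to resting
```

With the optimized Tsearch of 25 minutes (Table 1), an agent that finds no work for more than 25 minutes of searching leaves the working zone.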

In the following, ‘public’ refers to the existence of an explicit, collaborative information flow between the teammates, and ‘private’, to no information sharing at all. Thus, in what follows we present three different worker allocation algorithms: a private, fixed-threshold algorithm, a private, variable-threshold algorithm, and a public, fixed-threshold algorithm.

3.1 The Private, Fixed-Threshold Worker Allocation Algorithm (PrFT)
The PrFT algorithm is the same as that reported in [1]. Following the worker allocation mechanism described previously, we assign the same response threshold to all the agents. The team of agents is therefore homogeneous from a control point of view. The resulting agents' behavior (rhythm of activity) is nevertheless not identical, since it is based on the local, private assessment of the current status of the shared resource, i.e. the environment. In other words, diversity in activity is created by exploiting the intrinsic noise of the system as well as local perceptions and interactions.

Figure 1b. Experimental setup: the inner area corresponds to the working zone and the outer area to the resting zone. Aggregation in progress with 10 agents in a 178×178 cm arena.

3.2 The Private, Variable-Threshold Worker Allocation Algorithm (PrVT)
Fixed-threshold algorithms do not endow the team of workers with the robustness required to face external perturbations brought to the system. For instance, if some key control parameters such as the response thresholds have to assume different values as a function of the characteristics of the environment (e.g. arena size) for an optimal team performance, an algorithm is required that automatically calculates the optimal parameters under different environmental constraints.

In this paper, we propose a variable-threshold algorithm based on a threshold self-calibration rule that works in two steps: a threshold estimation phase followed by a worker allocation phase. During the estimation phase, each autonomous agent evaluates the spatial density of the demand and then sets its response threshold based on that individual estimation. During the allocation phase, the algorithm works as explained above. Two parameters govern the self-calibration mechanism: the Estimation Steps (ES) and the Estimation Factor (EF). Each autonomous agent estimates the availability of work in the environment by averaging the amount of time it spends to find some work to accomplish over its first ES successful attempts. The agent then computes its response threshold by multiplying that average amount of time by EF. Notice that even if all the agents are characterized by the same values of ES and EF, and therefore the same calibration rule (homogeneous team), due to their partial perceptions in time and space the agents will end up with different thresholds in the allocation phase. In addition, since PrVT is a private algorithm, the transition time from one phase to the other is also determined by each agent individually and asynchronously. Equation 1 summarizes how each autonomous agent computes its response threshold; T_{W_k} represents the amount of time an agent spent finding work to accomplish at its kth successful attempt.

3.3 The Public, Fixed-Threshold Worker Allocation Algorithm (PuFT)
In order to allow this multi-agent system to react to dynamic external perturbations (e.g. the sudden introduction of additional seeds) more quickly than by relying on the slow implicit communication through environmental modifications, we endow all the agents with peer-to-peer communication capabilities. An additional advantage of explicit communication is that each agent is able to gather information about the work demand both from its individual experience and from the experience of other teammates that it may encounter. Our current collaborative scheme is as follows. When two agents are within a certain Communication Range (CR), they exchange their estimations of the demand; their individual estimations are then set to the average of their original values. In the PuFT algorithm presented in this paper, the distribution of CR and Tsearch is homogeneous and fixed a priori among the teammates.
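The pairwise averaging step of the collaborative scheme can be sketched as follows; the representation of an agent as a dictionary and the function name are our own assumptions, not the paper's implementation.

```python
def exchange_estimates(agent_a, agent_b, distance, comm_range):
    """PuFT collaborative scheme sketch: when two agents come within the
    Communication Range, both replace their private demand estimates
    (time spent searching for work) with the average of the two values.
    Out of range, nothing happens."""
    if distance <= comm_range:
        avg = (agent_a["estimate"] + agent_b["estimate"]) / 2.0
        agent_a["estimate"] = avg
        agent_b["estimate"] = avg
```

Since each local estimate is noisy, repeated pairwise averaging raises the effective update rate of the demand estimate without any external supervisor.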

Figure 2. Example of two agents within communication range.

4. RESULTS AND DISCUSSION
In this section we present and compare results collected at two different experimental levels using microscopic modeling and embodied simulations. Unless otherwise stated, each aggregation run lasted 10 hours (simulated time); for each team size, 100 and 30 runs were carried out using the microscopic model and the embodied simulator, respectively; and error bars represent the standard deviations among runs. All results reported below were obtained without using any free parameters. All the parameters introduced in the model (e.g. mean obstacle avoidance duration, mean time to pick up/drop a seed, mean time to leave the working zone) were measured from a single embodied agent.

4.1 Aggregation Without Worker Allocation
Figure 3 presents the model predictions and the embodied simulation results of the aggregation experiment, without the use of any worker allocation algorithm, for a group of 10 agents in an 80×80 cm arena. In that plot, the upper set of curves represents the (increasing) average size of the clusters over time, while the other set shows the (decreasing) average number of clusters over time. Figure 3 shows a first phase during which the average cluster size increases steadily from 1 seed to about 15 seeds, and a second phase during which the average cluster size remains on average constant around 15 seeds. Similarly, during the first phase the average number of clusters decreases asymptotically from 20 to about 1, then remains close to 1 during the second phase of the aggregation process. This can be explained by the fact that, once a single cluster arises, only two manipulation sites remain in the environment (i.e. the two end tips of that cluster). The probabilities of picking up and dropping a seed are empirically very close; therefore, at any given time during the last phase of the aggregation process, on average half of the active workers will be carrying a seed and the other half will not. A similar aggregation evolution was recorded when using an arena of size 178×178 cm. The main difference lies in the time needed to reach a single cluster, which takes on average twice as long in the larger arena. Note that the latter arena has five times the surface of the 80×80 cm arena.

T_search = EF · (1/ES) · Σ_{k=1..ES} T_{W_k}    (1)
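The self-calibration rule of Equation 1 amounts to averaging the agent's first ES search times and scaling the result by EF. A minimal sketch (the function name is ours, not the paper's):

```python
def calibrate_threshold(search_times, es, ef):
    """Self-calibration rule of Equation (1): the response threshold is the
    Estimation Factor EF times the average time the agent needed to find
    work over its first ES successful attempts (search_times in minutes)."""
    if len(search_times) < es:
        raise ValueError("still in the estimation phase: need ES samples")
    return ef * sum(search_times[:es]) / es
```

With the optimized values ES = 10 and EF = 8 (Table 1), an agent that on average needed 3 minutes to find work over its first 10 attempts would set its threshold to 24 minutes.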



The good agreement between the results collected at both implementation levels shows how reliable the microscopic model’s predictions are. Additional results for the same case study have been reported in [1]. Therefore, in the following, we will be presenting results obtained using the microscopic model exclusively.

4.2 Integrated Cost and Optimization of Algorithmic Parameters
To be able to optimize the different algorithmic parameters and to assess the cost effectiveness of these different algorithms, we introduced a cost function whose value integrated over the total observation time, named the Integrated Cost (IC), corresponds to the total cost of the experiment. The IC represents an efficient combined metric for comparing the influence of each parameter of any given worker allocation algorithm, as well as the performances of different algorithms. The cost function is defined in Equation 2.

Where:

• x_t, y_t and z_t represent the average cluster size, the average number of clusters, and the average number of active workers at time step t, respectively.

• X_opt, Y_opt and Z_opt are the optimal values of the variables above (e.g. 20, 1, and 0, respectively).

• α_x, α_y and α_z are coefficients selected to weight the contribution of each of the variables accordingly.

In the right-hand side of Equation 2, the first two terms can be considered as the 'penalty cost' (e.g. the cost of unfinished work) and the third term as the 'worker cost' (e.g. the cost of hiring workers). The Integrated Cost is computed using a discrete Riemann integration over any observation time. Results presented in Table 1 were obtained for an observation time of 10 hours. We chose α_x = 10^-2 and α_x = α_y = 4α_z in order to obtain a higher penalty cost for unsuccessful aggregation results (cluster size and number of clusters) than worker cost. In Table 1 we optimized PuFT for a fixed communication range of 20 cm.
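The cost function of Equation 2 and its discrete Riemann integration can be sketched as follows; the function names and the sampling representation are our own assumptions, with the paper's optimal values X_opt = 20, Y_opt = 1, Z_opt = 0 as defaults.

```python
def cost(x, y, z, ax, ay, az, x_opt=20, y_opt=1, z_opt=0):
    """Cost function of Equation (2): quadratic penalties for unfinished
    work (average cluster size x, average number of clusters y) plus a
    worker cost term for the number of active workers z."""
    return (ax * (x_opt - x) ** 2
            + ay * (y_opt - y) ** 2
            + az * (z_opt - z) ** 2)


def integrated_cost(series, ax, ay, az, dt=1.0):
    """Integrated Cost (IC): discrete Riemann sum of the cost over the
    observation time; series is a list of (x_t, y_t, z_t) samples spaced
    dt time units apart."""
    return dt * sum(cost(x, y, z, ax, ay, az) for x, y, z in series)
```

With the paper's weighting (α_x = α_y = 10^-2 and α_z = α_x / 4), a fully aggregated, fully resting team contributes zero cost per time step.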

In this paper, we use the IC for two purposes: first, as a metric to optimize parameters of the worker allocation algorithms; second, as a metric to compare the performances of the worker allocation algorithms under different environmental constraints. Additionally, the following criterion based on central tendencies (mean and standard error) is used in order to assess the quality of a given set of parameters on the IC-axis.

Assume that (E_i, σ_i) represents the pair of mean and standard error of the IC associated with the set of simulation runs i. We consider the IC achieved by set i as better (more efficient) than that of set j if E_i + (1+ε)σ_i < E_j − (1+ε)σ_j, where ε is a criterion parameter. For the results reported here we used ε = 0. When neither set satisfies the inequality above, we consider them as equivalent. Considering the acceleration factor of the probabilistic model and the limited number of design points in a given parameter space, we performed a systematic test of all possible parameter combinations for a given algorithm.
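The central-tendency criterion can be written as a one-line comparator; the function name is hypothetical.

```python
def is_better(e_i, s_i, e_j, s_j, eps=0.0):
    """Central-tendency criterion: parameter set i beats set j on the IC
    axis (lower IC is better) if its mean plus (1+eps) standard errors
    stays below set j's mean minus (1+eps) standard errors."""
    return e_i + (1 + eps) * s_i < e_j - (1 + eps) * s_j
```

When neither `is_better(i, j)` nor `is_better(j, i)` holds, the two sets are treated as equivalent.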

Table 1 summarizes the algorithmic parameters and the optimized values for the three worker allocation algorithms in an 80×80 cm arena. Note that in Equation 2 we do not take into account the extra power consumption cost introduced by the communication capability each agent is endowed with in PuFT.

Table 1. Algorithmic parameters

Algorithm   Parameter   Range (min–max)   Granularity   Optimal value
PrFT        Tsearch     10–35 min         5 min         25 min
PrVT        ES          5–40              5             10
            EF          0.5–10            0.5           8
PuFT        Tsearch     10–35 min         5 min         15 min

4.3 Aggregation with Worker Allocation
In the following, we present a set of comparative results obtained by applying the worker allocation algorithms detailed in Section 3 to the same aggregation experiment. We started by optimizing their respective parameters (as explained in Section 4.2) for the same experimental setup (an 80×80 cm arena, 20 seeds initially scattered in the arena, and 10 active agents at the beginning of the experiment). We then studied the influence of external perturbations on cluster formation (e.g. a different arena size, or 20% additional seeds introduced during the aggregation process) for the different algorithms. Table 2 presents the IC values and their standard errors for the three worker allocation algorithms in the different manipulation experiments introduced previously (values in bold correspond to the best performance, i.e. minimal cost, determined by applying the criterion mentioned above). In Table 2, Arena1 corresponds to the original setup (an 80×80 cm arena and 20 seeds), Arena2 to the case with a static perturbation (20 seeds in a larger 178×178 cm arena), and Arena3 to the case with dynamic perturbations (an 80×80 cm arena, the 20 original seeds, and 5 additional seeds introduced 2 hours after the start of the aggregation process).

Figure 3. Results of the aggregation experiment without worker allocation using a team of 10 agents.

F_cost(x_t, y_t, z_t) = α_x (X_opt - x_t)^2 + α_y (Y_opt - y_t)^2 + α_z (Z_opt - z_t)^2    (2)

Table 2. Integrated cost

Algorithm   Arena1         Arena2         Arena3
PrFT        138.9 ± 7.0    324.9 ± 10.8   154.5 ± 7.9
PrVT        155.1 ± 8.0    231.9 ± 10.7   152.2 ± 8.7
PuFT        138.2 ± 6.9    337.6 ± 10.7   122.4 ± 6.4
W/o WA      227.4 ± 4.8    310.8 ± 8.8    197.2 ± 5.9

The PuFT and PrFT algorithms appear to be the most efficient in Arena1 (i.e. equivalent performances following the criterion), PrVT in Arena2, and PuFT in Arena3.

4.3.1. Private, Fixed-Threshold Worker Allocation
Figure 4 shows the outcome of the aggregation experiment using the worker allocation algorithms with a team of 10 agents in an 80×80 cm arena. Conversely to the case without worker allocation, during the last phase of the aggregation the average cluster size remains an increasing function of time, eventually reaching 20 seeds, the largest value possible, while the number of active workers in the environment decreases. Intuitively, this can be explained by the fact that, with only two manipulation sites remaining in the arena, and on average half of the active agents always carrying a seed and the other half not, reducing the number of active agents increases the size of the single cluster.

However, due to the a priori fixed response threshold value, the agents behave sub-optimally in a different environment. For instance, when performing the same aggregation task in a 178×178 cm arena, the clusters they create are smaller on average than those created by the team using the PrVT algorithm (with similar standard deviations), because the agents withdraw too soon. This is illustrated in Figure 5, where after 120 minutes the size of the clusters created using the PrFT algorithm becomes, and remains for the rest of the experiment, distinctly smaller on average than that of the clusters created using the PrVT algorithm. As a consequence, the aggregation efficiency of the PrFT algorithm deteriorates considerably in Arena2, as shown in Table 2.

4.3.2. Private, Variable-Threshold Worker Allocation

The density of manipulation sites (seeds that can be manipulated) is higher in the smaller arena, and the robots are more likely to encounter them than in the larger arena. In response to this difference in the density of manipulation sites, variable-threshold workers autonomously set their response thresholds higher in Arena2. They therefore stay active longer in the larger arena than in the smaller one, which in turn allows them to continue performing the task, as most seeds are not yet gathered into a single cluster. This is illustrated in Figure 5, where it clearly appears that PrFT and PuFT under-perform due to a homogeneous threshold value that is too low for the larger arena and to their inability to adapt to a new environment.

However, the PrVT algorithm is not appropriate for an optimal response of the agents to a dynamic change in the number of objects to manipulate. For instance, results in Table 2 show that when additional seeds are dropped in the arena 2 h into the aggregation process, the efficiency of the PrVT algorithm deteriorates. This results from the lack of a continuously adapting threshold mechanism that would allow the agents to update their activity thresholds when facing a sudden increase in the availability of work.

Figure 4a. Average cluster size for the aggregation experiment with worker allocation algorithms in an 80×80 cm arena.

Figure 4b. Average number of active workers for the aggregation experiment with worker allocation algorithms in an 80×80 cm arena.



4.3.3. Public, Fixed-Threshold Worker Allocation
PuFT can efficiently deal with a dynamic change in the number of objects to manipulate in a distributed way. However, because the threshold is fixed, the PuFT algorithm does not allow the agents to respond efficiently to all modifications of the arena surface, as shown in Figure 5 and Table 2. Nevertheless, this algorithm has the advantage of providing the team of autonomous agents with the ability to quickly access information about dynamic changes brought to the working environment through its peer-to-peer communication scheme. As shown in Figure 6, the sudden increase in the availability of work (i.e. 20% additional seeds dropped in the environment) was quickly sensed by the teammates. This results in an appropriate number of agents staying active to accomplish the task throughout the aggregation process. This in turn (based on the explanation in Section 4.1) contributes to the collaborative team achieving larger clusters (see Figure 6), as well as to a faster decrease in the activity of the robots as soon as a single cluster arises. Consequently, the PuFT algorithm offers the best efficiency in Arena3 (see Table 2 and Figure 7). Figure 7 presents the IC for the different allocation algorithms as a function of the observation time, i.e. the integration time interval, in Arena3. Error bars represent the standard errors among sets of simulation runs. From Figure 7, it appears that despite the dynamic change in the workload, the algorithms with worker allocation are still more efficient than the one without any worker allocation. The PuFT algorithm offers the best efficiency after the introduction of the additional seeds by exploiting teammates' collaboration via peer-to-peer communication.

5. CONCLUSION
In this paper, we have presented a comparative study of three scalable, distributed, threshold-based worker allocation algorithms that allow a team of autonomous, embodied agents to dynamically allocate an appropriate number of workers to a given task based solely on their individual estimations of the progress in the execution of the task. We compared their efficiency and robustness in a collective manipulation case study concerned with the gathering and clustering of seeds.

Teams consisting of active workers dynamically controlled by one of the allocation algorithms achieve similar or better aggregation performances than those characterized by a constant team size, while using on average a considerably reduced number of agents over the whole aggregation process. Moreover, after a systematic optimization process in a given environment common to all the algorithms, the quantitative differences in efficiency among these allocation algorithms turned out to be small. Although PuFT appears to be one of the two most efficient algorithms in the smaller arena, its energy cost due to communication was not included in these experiments. In addition, even if the demand can be estimated more accurately by sharing information and the collective reaction to external perturbations is faster, the PuFT algorithm still suffers from the same drawback as the PrFT algorithm in facing environmental changes for which its threshold was not optimized a priori.

Figure 5. Average cluster size for the aggregation experiment with worker allocation algorithms in a 178 x 178 cm arena.

Figure 6. Average cluster size for the aggregation experiment with worker allocation algorithms in an 80 x 80 cm arena; 2 and 3 more seeds were dropped into the arena after 2 h and 4 h into the experiment, respectively.

Figure 7. Integrated cost as a function of observation time for the aggregation experiment in Arena3.



We believe that combining the characteristics of PuFT and PrVT (threshold variability and information sharing) will allow us in the near future to further improve the robustness, and perhaps the efficiency, of the resulting worker allocation algorithm. In particular, transforming the calibration rule into a continuously adapting algorithm, as suggested in [16], seems to be a promising solution to static and dynamic external perturbations. Finally, it is worth noting that an agent that withdraws from a task can allocate itself to a different task following a similar response-to-stimulus mechanism, making this set of algorithms readily applicable to multi-task problems and more complex labor division schemes.
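The adaptation suggested in [16] is a threshold-reinforcement rule: an agent's response threshold decreases while it performs the task and increases while it is idle, on top of the usual sigmoidal response to the task stimulus (see [4]). The sketch below is illustrative only; the coefficients XI and PHI and all names are our assumptions, not values from [16]:

```python
import random

# Sketch of threshold reinforcement in the spirit of [16]: thresholds are
# lowered by task engagement and raised by idleness, so agents gradually
# specialize. XI and PHI are illustrative rates, not values from the paper.
XI, PHI = 0.10, 0.05   # learning and forgetting coefficients (assumed)

def update_threshold(theta, engaged, dt=1.0):
    """Reinforce or relax the response threshold, clamped to [0, 1]."""
    theta = theta - XI * dt if engaged else theta + PHI * dt
    return min(1.0, max(0.0, theta))

def engages(theta, stimulus, n=2):
    """Standard sigmoidal response-threshold rule: the engagement
    probability rises with the stimulus and falls with the threshold."""
    p = stimulus**n / (stimulus**n + theta**n)
    return random.random() < p
```

An agent repeatedly exposed to a strong stimulus keeps engaging, its threshold drops, and it becomes ever more likely to respond; this is the specialization effect the reinforcement rule is meant to produce.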

6. ACKNOWLEDGEMENTS Special thanks to Xiaofeng Li for helping to collect systematic simulation data. This work is supported in part by the TRW Foundation and the TRW Space and Technology Division. Further funding was received from the Caltech Center for Neuromorphic Systems Engineering as part of the NSF Engineering Research Center program under grant EEC-9402726.

7. REFERENCES
[1] Agassounon, W., Martinoli, A., and Goodman, R. A Scalable, Distributed Algorithm for Allocating Workers in Embedded Systems. In Proceedings of IEEE SMC '01 (Tucson, AZ, October 2001), 3367-3373.

[2] Beni, G., and Wang, J. Swarm Intelligence. In Proceedings of the 7th Annual Meeting, Robotics Society of Japan, 1989, 425-428.

[3] Böhringer, K., Brown, R., Donald, B., Jennings, J., and Rus, D. Distributed Robotic Manipulation: Experiments in Minimalism. Proceedings of ISER ’95 (Stanford, CA, June 1995), 11-25.

[4] Bonabeau, E., Theraulaz, G., and Deneubourg J.-L. Fixed Response Thresholds and Regulation of Division of Labour in Insect Societies. Bulletin of Mathematical Biology, 1998, Vol. 60, 753-807.

[5] Bonabeau, E., Dorigo, M., and Theraulaz, G. Swarm Intelligence: From Natural to Artificial Systems. Santa Fe Institute Studies in the Sciences of Complexity. Oxford University Press, New York NY, 1999.

[6] Gerkey, B. P., and Mataric, M. J. MURDOCH: Publish/Subscribe Task Allocation for Heterogeneous Agents. In Proceedings of Autonomous Agents '00 (Barcelona, Spain, June 2000), 203-204.

[7] Holland, O., and Melhuish, C. R. Stigmergy, Self-organization, and Sorting in Collective Robotics. Artificial Life, 1999, Vol. 5, 173-202.

[8] Ijspeert, A. J., Martinoli, A., Billard, A., and Gambardella, L. M. Collaboration through the Exploitation of Local Interactions in Autonomous Collective Robotics: The Stick Pulling Experiment. Autonomous Robots, 2001, Vol. 11, No. 2, 149-171.

[9] Krieger, M. J. B., and Billeter, J.-B. The Call of Duty: Self-Organised Task Allocation in a Population of up to Twelve Mobile Robots. Robotics and Autonomous Systems, 2000, Vol. 30, No. 1-2, 65-84.

[10] Martinoli, A., Ijspeert, A. J., and Mondada, F. Understanding Collective Aggregation Mechanisms: From Probabilistic Modeling to Experiments with Real Robots. Robotics and Autonomous Systems, 1999, Vol. 29, 51-63.

[11] Martinoli, A., Ijspeert, A. J., and Gambardella, L. M. A Probabilistic Model for Understanding and Comparing Collective Aggregation Mechanisms. In Proceedings of ECAL '99 (Lausanne, Switzerland, September 1999), 575-584.

[12] Michel, O. Webots: Symbiosis Between Virtual and Real Mobile Robots. Proceedings of ICVW ‘98 (Paris, France, 1998), 254-263.

[13] Mondada, F., Franzi, E., and Ienne, P. Mobile Robot Miniaturization: A Tool for Investigation in Control Algorithms. In Proceedings of ISER '93 (Kyoto, Japan, October 1993), 501-513.

[14] Pacala, S. W., Gordon, D. M., and Godfray, H. C. J. Effects of Social Group Size on Information Transfer and Task Allocation. Evolutionary Ecology, 1996, Vol. 10, 127-165.

[15] Parker, L. E. ALLIANCE: An Architecture for Fault Tolerant Multi-robot Cooperation. IEEE Transactions on Robotics and Automation, 1998, Vol. 14, No. 2, 220-240.

[16] Theraulaz, G., Bonabeau, E., and Deneubourg, J.-L. Response Threshold Reinforcement and Division of Labour in Insect Societies. Proceedings of the Royal Society of London, Series B, 1998, Vol. 265, 327-332.



Submitted to Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-03), 2003

Adaptive Division of Labor in Large-Scale Minimalist Multi-Robot Systems

Chris Jones and Maja J. Mataric
Computer Science Department
University of Southern California
941 West 37th Place, Mailcode 0781
Los Angeles, CA 90089-0781
{cvjones, maja}@robotics.usc.edu

Abstract

A Large-Scale Minimalist Multi-Robot System (LMMS) is one composed of a group of robots each with limited capabilities in terms of sensing, computation, and communication. Such systems have received increased attention due to their empirically demonstrated range of capabilities and beneficial characteristics, such as their robustness to environmental perturbations and individual robot failure and their scalability to large numbers of robots. However, little work has been done in investigating ways to endow such an LMMS with the capability to achieve a desired division of labor over a set of dynamically evolving concurrent tasks, a necessity in any task-achieving LMMS. Such a capability can help to increase the efficiency and robustness of overall task performance as well as open new domains in which LMMS can be seen as a viable alternative to more complex control solutions. In this paper we present a method for achieving a desired division of labor in an LMMS, experimentally validate it in a realistic simulation, and demonstrate its potential to scale to large numbers of robots and its ability to adapt to environmental perturbations.

I. Introduction and Motivation

A Large-Scale Minimalist Multi-Robot System (LMMS) is a multi-robot system composed of a large number of robots, each having limited capabilities in terms of sensing, computational power, and communication range and bandwidth. We define a minimalist robot as one which maintains little or no state information, extracts limited, local, and noisy information from its available sensors, and lacks the capability for active communication with other robots. Due to these limited capabilities, the world in which a minimalist robot is situated is formally partially observable and highly non-stationary, and it is therefore not practical to assume that such a robot is capable of reliably knowing a significant portion of the current global state of the environment or of overall task progress.

These limitations in sensing, communication, and computation preclude a minimalist robot from performing tasks requiring significant computation or communication capabilities. Nonetheless, minimalist robots have been shown to be highly effective at a number of collective tasks, such as multi-robot formation control (Fredslund & Mataric 2002), collection tasks (Goldberg & Mataric 2002), and robotic soccer (Werger 1999). A system composed of a large number of such minimalist robots has the potential of conferring advantages including increased robustness to individual robot failure, as no single robot is critical to task performance; the prospect of scaling to increasingly larger numbers of robots, as there are few bottlenecks in terms of complex communication, planning, or coordination requirements; and increased adaptability to changes in the environment, since individuals act based on local information and are not tied to globally coordinated plans.

The aim of this work is to investigate a method by which to endow an LMMS with the capability to achieve a desired division of labor over a set of dynamically evolving concurrent tasks, a critical requirement of any task-achieving large-scale multi-robot system. We define division of labor as the phenomenon in which individuals in a multi-robot system concurrently execute a set of tasks. The division of labor may need to be continuously adjusted in response to changes in the task environment or group performance. The broader scope of this work is in understanding ways in which to achieve robust, scalable, and efficient coordination in an LMMS.

This paper is organized as follows. In Section II we provide the relevant related work. In Section III we give a detailed description of the concurrent foraging task domain we use as validation of our division of labor mechanism. In Section IV we describe our experimental setup used for empirical evaluation of our work. In Section V we present the robot controller we use to produce a division of labor in an LMMS. In Section VI we describe and analyze experimental results, and in Section VII we draw conclusions.


II. Related Work

Here we summarize briefly the related work in physical LMMS using robots with similar capabilities to those on which our system is based. Mataric (1995) provides early work on group coordination in LMMS using a collection of simple basis behaviors. Theraulaz, Goss, Gervet & Deneubourg (1991) and Agassounon & Martinoli (2002) present minimalist methodologies for coordination in robot groups. Beckers, Holland & Deneubourg (1994) demonstrate the capabilities of minimalist multi-robot systems in object clustering and sorting. Kube & Zhang (1996) present an approach to box-pushing using a group of robots with simple sensors and reactive control. Werger & Mataric (1996) present a minimalist solution in the multi-robot foraging domain. Martinoli, Ijspeert & Mondada (1999) present work on the probabilistic modeling of robot behavior in the task regulation domain, demonstrating its performance as compared to experiments on physical and simulated robots. Werger (1999) presents coordinated behavior in a robot soccer team using a minimalist behavior-based control system. Krieger & Billeter (2000) present a decentralized task allocation mechanism for large mobile robot groups based on individual task-associated response thresholds in a collection domain. Holland & Melhuish (2000) use probabilistic behavior selection in minimalist robotic clustering and sorting. Goldberg & Mataric (2002) precisely define the foraging task for LMMS and provide a collection of general distributed behavior-based algorithms and their empirical evaluation. Fredslund & Mataric (2002) present work on the problem of achieving coordinated behavior in the context of formations using a distributed group of physical robots using only local sensing and minimal communication.

In the multi-robot literature, there is work on more communication- and computationally complex forms of task regulation in multi-robot systems through the use of publish/subscribe and market-based methods (Gerkey & Mataric 2002) and systems in which significant global state is made known to all robots (Parker 1998).

There is related work in the area of research that studies and simulates insect colonies and their behaviors. Theraulaz, Bonabeau & Deneubourg (1998) describe how the adaptability of complex social insect societies is increased by allowing members of the society to dynamically change tasks (behaviors) when necessary. Giving that ability to robots allows an LMMS to operate in domains requiring the simultaneous regulation of many tasks. Bonabeau, Theraulaz & Deneubourg (1996) describe a model of a task regulation mechanism in insect societies through the use of response thresholds for task-related stimuli. Theraulaz et al. (1998) extended that model by introducing an adaptive threshold that changes over time based on individual task performance.

The division of labor mechanism we present can be considered an instance of a response threshold model as presented in Bonabeau et al. (1996), Krieger & Billeter (2000), Theraulaz et al. (1998), and Agassounon & Martinoli (2002). However, our task domain and division of labor mechanism differ in that the task-related stimuli are perceived locally by the individual robots and are not altered as a result of task performance. Furthermore, the individual robots are initially homogeneous, as opposed to Krieger & Billeter (2000), in which robots are initially assigned different response thresholds, and the robots do not learn or become specialized through adaptive response thresholds as is the case in Theraulaz et al. (1998) and Agassounon & Martinoli (2002).

III. Concurrent Foraging Task Domain

In order to experimentally study a mechanism for providing an LMMS with division of labor capabilities, we investigated division of labor in a concurrent foraging task domain. Concurrent foraging, a variation on traditional foraging, consists of an arena populated by multiple types of objects to be collected. Each robot is equally capable of foraging all object types, but can only be allocated to foraging for one type at any given time. Additionally, all robots are engaged in foraging at all times (i.e., a robot cannot be idle). A robot may switch the object type it forages for, according to its control policy, when it determines it is appropriate to do so. It is desirable for a robot to avoid thrashing (i.e., wasting time and energy) by needlessly switching the object type for which it is foraging.

A. Task Description

Our experimental domain of concurrent foraging requires multiple object (puck) types to be foraged from a circular arena. Initially, the arena is randomly populated by two types of pucks, Puck_Red and Puck_Green, which are distinguishable by their color.

In this task, the robots move around an enclosed arena and pick up encountered pucks. When a robot picks up a puck, the puck is consumed (i.e., it is immediately removed from the environment, not transported to another region) and the robot carries on foraging for other pucks. Immediately after a puck is consumed, another puck of the same type is placed in the arena at a random location. The reason for this replacement is to maintain constant puck density in the arena through the course of an experiment. In some situations, the density of pucks can have an effect on the division of labor performance. This is an important consideration in mechanisms for division of labor in LMMS for many domains; however, in this work


we want to limit the number of experimental variables impacting system performance. Therefore, we reserve the investigation of the impact of varying puck densities on division of labor in LMMS for future work.

The division of labor portion of the task requires the robots to split their numbers by having some forage for Puck_Red pucks and others for Puck_Green pucks. For the purpose of our experiments, we desire a division of labor such that the proportion of robots foraging for Puck_Red pucks is equal to the proportion of Puck_Red pucks present in the foraging arena (e.g., if Puck_Red pucks make up 30% of the pucks present in the foraging arena, then 30% of the robots should be foraging for Puck_Red pucks). In general, the desired division of labor could take other forms. For example, it could be related to the relative reward or cost of foraging each puck type without change to our approach.

As was stated earlier, due to their minimalist capabilities, individual robots do not have direct access to global information such as the size and shape of the foraging arena, the initial or current number of pucks to be foraged (total or by type), or the initial or current number of foraging robots (total or by foraging type). Also, it cannot be assumed that any robot or subset of robots will always be operational, or that the proportion of pucks will remain constant over time.
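The target allocation described above is a global quantity that no individual robot ever computes; for illustration only, it can be written as a simple proportion (the function and variable names are ours, not from the paper):

```python
def desired_allocation(num_robots, puck_counts):
    """Global view of the desired division of labor: the fraction of robots
    foraging for each puck type should equal that type's fraction of the
    pucks in the arena. Purely illustrative; the robots approach this
    split using only local observations."""
    total = sum(puck_counts.values())
    return {ptype: round(num_robots * count / total)
            for ptype, count in puck_counts.items()}

# e.g. 20 robots and a 30%/70% red/green puck mix
print(desired_allocation(20, {"red": 15, "green": 35}))  # {'red': 6, 'green': 14}
```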

IV. Simulation Environment

All simulations were performed using Player and Stage. Player (Gerkey, Vaughan, Støy, Howard, Sukhatme & Mataric 2001) is a server that connects robots, sensors, and control programs over the network. Stage (Vaughan 2000) simulates a set of Player devices. Together, the two represent a high-fidelity simulation tool for individual robots and robot teams which has been validated in a collection of real-world robot experiments using Player and Stage programs transferred directly to physical Pioneer 2DX mobile robots.

The experimental arena (a close-up is shown in Figure 1) was used in all experiments presented in this paper. The arena, shown populated with robots and pucks, is circular and has an area of approximately 315 square meters.

V. The Robots

The robots used in the experimental simulations are realistic models of the ActivMedia Pioneer 2DX mobile robot. Each robot, approximately 30 cm in diameter, is equipped with a differential drive, an odometry system using wheel rotation encoders, 8 evenly spaced sonars covering the front 180 degrees used for obstacle avoidance, and a forward-looking Sony color camera with a 60-degree field of view and a color blob detection system (used for puck and robot detection and classification through color). Each robot is also equipped with a 2-DOF gripper on the front, capable of picking up a single puck at a time. There is no capability available for explicit, direct communication between robots, nor can pucks and other robots be uniquely identified.

Fig. 1. Close-up of the arena used in experiments; example pucks and robots are shown.

A. Behavior-Based Controller

All robots have identical behavior-based controllers consisting of the following mutually exclusive behaviors: Avoiding, Wandering, Visual Servoing, Grasping, and Observing. Descriptions of the behaviors used in the division of labor implementation are given below.

- The Avoiding behavior causes the robot to turn to avoid obstacles in its path.
- The Wandering behavior causes the robot to move forward and, after a random length of elapsed time, turn left or right through a random arc for a random period of time.
- The Visual Servoing behavior causes the robot to move toward a detected puck of the desired type. If the robot's current foraging state is Robot_Red the desired puck type is Puck_Red, and if the robot's current foraging state is Robot_Green the desired puck type is Puck_Green.
- The Grasping behavior causes the robot to use the gripper to pick up and consume a puck within the gripper's grasp.
- The Observing behavior causes the robot to take an image from its camera and record the detected pucks and robots in their respective histories. The robot then updates its foraging state based on the puck and robot histories. A description of the histories is given in Section V-B and a description of the foraging state update procedure is given in Section V-C.

Each behavior listed above has a set of activation conditions based on relevant sensor inputs and state values. When met, the conditions cause the behavior to become active. A description of when each activation condition is true is given below. The activation conditions of all behaviors are shown in Table I.

- The Obstacle Detected activation condition is true when an obstacle is detected by the sonar within a distance of 1 meter. Pucks are not detectable by the sonar and are therefore not considered obstacles.
- The Puck_Det Detected activation condition is true if the robot's current foraging state is Robot_Det and a puck of type Puck_Det (where Det is Red or Green) is detected by the color camera within a distance of approximately 5 meters and within 30 degrees of the robot's direction of travel.
- The Gripper Break-Beam On activation condition is true if the break-beam sensor between the gripper jaws detects an object.
- The Observation Signal activation condition is true if the distance traveled by the robot according to odometry since the last time the Observing behavior was activated is greater than 2 meters.
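Taken together, the activation conditions and the behavior ranking of Table I amount to a fixed-priority selector. The sketch below is our rendering of that arbitration (the boolean arguments correspond to the four activation conditions), not code from the paper:

```python
def active_behavior(obstacle, puck, breakbeam, obs_signal):
    """Fixed-priority arbitration following Table I: conditions are checked
    in decreasing behavior rank, and 'X' (don't-care) entries are simply
    not tested."""
    if obs_signal:                  # X X X 1 -> Observing
        return "Observing"
    if obstacle:                    # 1 X X X -> Avoiding
        return "Avoiding"
    if puck and not breakbeam:      # 0 1 0 0 -> Visual Servoing
        return "Visual Servoing"
    if breakbeam:                   # 0 X 1 0 -> Grasping
        return "Grasping"
    return "Wandering"              # 0 X X X -> Wandering
```

With no condition active the robot defaults to Wandering, matching the lowest-ranked row of the table.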

B. State Information

The robots maintain three types of state information: foraging state, observed puck history, and observed robot history. The foraging state identifies the type of puck the robot is currently involved in foraging: a foraging state of Robot_Red refers to a robot engaged in foraging Puck_Red pucks, and a foraging state of Robot_Green refers to a robot engaged in foraging Puck_Green pucks.

Each robot is outfitted with a colored beacon observable by nearby robots which indicates the robot's current foraging state. The color of the beacon changes to reflect the current state: a red beacon for a foraging state of Robot_Red and a green beacon for Robot_Green. Thus, the colored beacon acts as a form of local, passive communication conveying the robot's current foraging state. All robots maintain a limited, constant-sized history storing the most recently observed puck types and another constant-sized history storing the foraging states of the most recently observed robots. Neither of these histories contains the unique identity or location of detected pucks or robots, nor do they store a time stamp of when any given observation was made. The history of observed pucks is limited to the last MAX-PUCK-HISTORY pucks observed, and the history of the foraging states of observed robots is limited to the last MAX-ROBOT-HISTORY robots observed.

While moving about the arena, each robot keeps track of the approximate distance it has traveled by using odometry measurements. At every interval of 2 meters traveled, the robot makes an observation. An observation consists of the robot taking the current image from its color camera and, using simple color blob detection, classifying all currently visible pucks and robots through their respective colors and adding them to their respective histories. This procedure is nearly instantaneous; therefore, the robot's behavior is not outwardly affected. The area in which pucks and other robots are visible is within 5 meters and 30 degrees of the robot's direction of travel. Observations are only made after traveling 2 meters because updating too frequently leads to over-convergence of the estimated puck and robot type proportions due to repeated observations of the same pucks and/or robots. On average, during our experiments, a robot detected 2 pucks and robots per observation.
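The bounded histories can be sketched as fixed-length FIFOs. MAX-PUCK-HISTORY and MAX-ROBOT-HISTORY are the paper's parameters; the class and method names below are our own:

```python
from collections import deque

MAX_PUCK_HISTORY = 10    # history lengths studied in the paper: 2, 10, 50
MAX_ROBOT_HISTORY = 10

class ObservationHistories:
    """Fixed-length FIFOs of recently seen puck types and robot foraging
    states; no identities, positions, or time stamps are stored."""
    def __init__(self):
        self.pucks = deque(maxlen=MAX_PUCK_HISTORY)
        self.robots = deque(maxlen=MAX_ROBOT_HISTORY)

    def observe(self, puck_types, robot_states):
        """Called once per 2 m travelled; old entries fall off the back."""
        self.pucks.extend(puck_types)
        self.robots.extend(robot_states)

    def estimates(self):
        """Return (RP, RR): the observed fractions of Red pucks and Red
        robots used by the transition functions (0.5 if nothing seen)."""
        rp = self.pucks.count("red") / len(self.pucks) if self.pucks else 0.5
        rr = self.robots.count("red") / len(self.robots) if self.robots else 0.5
        return rp, rr
```

The `maxlen` argument of `deque` gives exactly the forgetting behavior described: appending beyond the limit silently discards the oldest observations.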

C. Foraging State Transition Functions

After it makes an observation, the robot re-evaluates its current foraging state given the newly updated puck and robot histories. We have experimented with several functions, given below, to determine the conditions under which a robot should change its current foraging state.

The first method is a simple step transition function. The condition under which a robot with a current foraging state of Robot_Green will change its foraging state to Robot_Red is given by Change(Green-Red), shown in Equation 1. Similarly, the condition under which a robot with a current foraging state of Robot_Red will change its foraging state to Robot_Green is given by Change(Red-Green), shown in Equation 2. In Equations 1 and 2, RR is the proportion of Robot_Red entries in the Robot History and RP is the proportion of Puck_Red entries in the Puck History.

Change(Green-Red) = 1 if RR < RP, 0 otherwise    (1)

Change(Red-Green) = 1 if RR > RP, 0 otherwise    (2)
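In code, the step rule reads as follows; the inequality directions follow from the desired allocation (a Green robot switches when Red robots are under-represented relative to Red pucks, and vice versa), and the function name is ours:

```python
def step_transition(state, rr, rp):
    """Step transition function in the sense of Equations 1 and 2: switch
    deterministically whenever the observed Red-robot fraction (RR) and
    Red-puck fraction (RP) disagree in the relevant direction."""
    if state == "green" and rr < rp:   # too few Red robots: Change(Green-Red)
        return "red"
    if state == "red" and rr > rp:     # too many Red robots: Change(Red-Green)
        return "green"
    return state
```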

The second method we explored uses a probabilistic transition function. The probability that a robot with a current foraging state of Robot_Green will change its foraging state to Robot_Red is given by the probability P(Green-Red), shown in Equation 3. Similarly, the probability that a robot with a current foraging state of Robot_Red will change its foraging state to Robot_Green is given by


Obstacle Detected | Puck Detected | Gripper Break-Beam On | Observation Signal | Active Behavior
X | X | X | 1 | Observing
1 | X | X | X | Avoiding
0 | 1 | 0 | 0 | Visual Servoing
0 | X | 1 | 0 | Grasping
0 | X | X | X | Wandering

TABLE I. Behavior activation conditions. Behaviors are listed in order of decreasing rank; higher-ranking behaviors preempt lower-ranking behaviors in the event multiple are active. X denotes that the activation condition is irrelevant for the behavior.

the probability P(Red-Green), shown in Equation 4. In Equations 3 and 4, RR is the proportion of Robot_Red entries in the Robot History and RP is the proportion of Puck_Red entries in the Puck History.

(3)

(4)
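The bodies of Equations 3 and 4 did not survive in this copy, so the sketch below is only a plausible stand-in: it makes the switching probability proportional to the mismatch between the observed robot and puck fractions. The functional form and the gain parameter are our assumptions, not the paper's actual equations:

```python
import random

def probabilistic_transition(state, rr, rp, gain=1.0):
    """Hypothetical probabilistic transition: the further the Red-robot
    fraction (RR) deviates from the Red-puck fraction (RP) in the
    relevant direction, the more likely a switch becomes."""
    if state == "green":
        p = gain * max(0.0, rp - rr)    # Red robots under-represented
        return "red" if random.random() < p else "green"
    p = gain * max(0.0, rr - rp)        # Red robots over-represented
    return "green" if random.random() < p else "red"
```

Because switches are probabilistic, only a fraction of the robots react in any interval, which is consistent with the smoother convergence and fewer state changes reported for this method in Section VI.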

VI. Experimental Results

We experimentally validated our LMMS division of labor mechanism using both the step and probabilistic transition functions presented in Section V-C in the realistic simulated environment described in Section IV. All experiments used 20 robots and 50 pucks, and all presented results have been averaged over 20 experimental runs.

To test the adaptability of the division of labor mechanism to external perturbations in puck type proportions, they were dynamically changed at various times during the experimental trials. The experiments begin with 30% Puck_Red and 70% Puck_Green pucks. At time 6000 seconds, the relative proportions of pucks are changed to 80% Puck_Red and 20% Puck_Green pucks, and at time 12000 seconds the relative proportions are changed to 50% Puck_Red and 50% Puck_Green pucks. The total number of pucks remains constant throughout the experiment.

The plots in Figures 2 and 3 show a comparison between the performance of the step and the probabilistic transition functions using MAX-PUCK-HISTORY and MAX-ROBOT-HISTORY values of 2, 10, and 50. Both the step and probabilistic transition functions converge to a stable division of labor. The use of the step transition function leads to faster convergence to a stable division of labor at the expense of more high-frequency oscillation.

When the puck proportions are equal, both transition functions converge to the desired division of labor. As the puck type proportions become more skewed, with one

Fig. 2. The proportion of Puck_Red pucks and robots foraging for Puck_Red pucks over time when using the step transition function with different puck and robot history lengths (one panel each for history lengths 2, 10, and 50; x-axis: simulation time in seconds; y-axis: proportion of puck or robot type; legend: Red Pucks, Red Robots).

puck type becoming significantly more prevalent than the other, the systems using shorter history lengths tend to converge slightly off of the desired division of labor. This is in part due to the insufficient granularity in their puck and robot histories.

Another factor in evaluating the efficiency of these methods is the frequency with which individual robots switch between tasks. In some task domains, switching between tasks can be very expensive and therefore should be avoided. Figures 4 and 5 show the cumulative number of times the robots change state during the course of the experiments. The data points are obtained by summing the total number of foraging state changes over the course of the previous 50 seconds of the experiment (it is possible that a single robot could change foraging state


Fig. 3. The proportion of Puck_Red pucks and robots foraging for Puck_Red pucks over time when using the probabilistic transition function with different puck and robot history lengths (one panel each for history lengths 2, 10, and 50; x-axis: simulation time in seconds; y-axis: proportion of puck or robot type; legend: Red Pucks, Red Robots).

more than once during this interval). As the plots show, the shorter the puck and robot history lengths, the more foraging state changes occur. Also, for puck and robot histories of the same length, the use of the probabilistic transition function leads to fewer foraging state changes than the step transition function.

In general, shorter puck and robot history lengths result in faster convergence to the desired division of labor but lead to higher-frequency oscillations due to more frequent changes in individual robot foraging state. For increasingly small histories, convergence to a division of labor short of the desired one is exhibited as the puck type proportions become skewed. The probabilistic transition function results in more stable convergence with fewer robot state changes and fewer high-frequency oscillations in the division of labor. Which transition function and history lengths are to be used for an arbitrary task environment depends on factors such as how quickly the environment changes, how often the environment changes, and the expense of changing tasks.
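The interplay between noisy local estimates, history length, and the step transition can be made concrete with a toy model that replaces simulated vision and motion with random sampling; every name and parameter value below is illustrative, and the model is far coarser than the Player/Stage experiments:

```python
import random

HISTORY_LEN = 10   # one of the history lengths studied (2, 10, 50)

def run(num_robots=20, red_puck_fraction=0.3, ticks=500, seed=1):
    """Toy model: each tick, every robot samples 2 puck observations and
    2 robot observations (the paper's per-observation average), updates
    its bounded histories, and applies the step transition. Returns the
    Red-robot fraction averaged over the final 200 ticks."""
    random.seed(seed)
    states = ["green"] * num_robots
    puck_hist = [[] for _ in range(num_robots)]
    robot_hist = [[] for _ in range(num_robots)]
    trace = []
    for _ in range(ticks):
        snapshot = list(states)   # observations see the pre-update states
        for i in range(num_robots):
            seen_pucks = ["red" if random.random() < red_puck_fraction
                          else "green" for _ in range(2)]
            puck_hist[i] = (puck_hist[i] + seen_pucks)[-HISTORY_LEN:]
            robot_hist[i] = (robot_hist[i]
                             + random.sample(snapshot, 2))[-HISTORY_LEN:]
            rp = puck_hist[i].count("red") / len(puck_hist[i])
            rr = robot_hist[i].count("red") / len(robot_hist[i])
            if states[i] == "green" and rr < rp:
                states[i] = "red"
            elif states[i] == "red" and rr > rp:
                states[i] = "green"
        trace.append(states.count("red") / num_robots)
    return sum(trace[-200:]) / 200
```

Even in this crude model the Red-robot fraction settles around the Red-puck fraction while oscillating, echoing the convergence-versus-oscillation trade-off discussed above.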

VII. Conclusions

We have presented a Large-Scale Minimalist Multi-Robot System (LMMS), composed of 20 simulated mobile robots, in which the individual robots maintain a minimal amount of state information, extract a limited amount of information from available sensors, and cannot actively

Fig. 4. The number of foraging state changes when using the step transition function with different puck and robot history lengths (one panel each for history lengths 2, 10, and 50; x-axis: simulation time in seconds; y-axis: foraging state changes).

or directly communicate with other robots in the system. Using this LMMS, we have demonstrated a method by which to achieve a desired division of labor in a concurrent foraging task domain, experimentally validated it in a realistic simulation, and demonstrated its robustness and adaptability to environmental perturbations.

VIII. Acknowledgments

This work is supported in part by DARPA TASK GrantF30602-00-2-0573 and in part by NSF ITR Grant EIA-0121141.

IX. REFERENCES

Agassounon, W. & Martinoli, A. (2002), Efficiency and robustness of threshold-based distributed allocation algorithms in multi-agent systems, in 'First Int. Joint Conf. on Autonomous Agents and Multi-Agent Systems', ACM Press, Bologna, Italy, pp. 1090–1097.

Beckers, R., Holland, O. & Deneubourg, J. (1994), From local actions to global tasks: Stigmergy and collective robotics, in R. Brooks & P. Maes, eds, 'Artificial Life IV: Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems', MIT Press, Cambridge, MA, pp. 181–189.

Bonabeau, E., Theraulaz, G. & Deneubourg, J. (1996), 'Quantitative study of the fixed threshold model for the regulation of division of labour in insect societies', Proceedings Royal Society of London B 263, 1565–1569.

Fredslund, J. & Mataric, M. J. (2002), 'A general, local algorithm for robot formations', IEEE Transactions on Robotics and Automation, Special Issue on Multi-Robot Systems 18(5), 837–846.


Fig. 5. The number of foraging state changes when using the probabilistic transition function with different puck and robot history lengths (one panel each for history lengths 2, 10, and 50; x-axis: simulation time in seconds; y-axis: foraging state changes).

Gerkey, B. P. & Mataric, M. J. (2002), 'Sold!: Auction methods for multi-robot coordination', IEEE Transactions on Robotics and Automation, Special Issue on Multi-Robot Systems 18(5), 758–768.

Gerkey, B., Vaughan, R., Støy, K., Howard, A., Sukhatme, G. & Mataric, M. (2001), Most valuable player: A robot device server for distributed control, in 'IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-01)', Maui, Hawaii, pp. 1226–1231.

Goldberg, D. & Mataric, M. J. (2002), Design and evaluation of robust behavior-based controllers for distributed multi-robot collection tasks, in T. Balch & L. E. Parker, eds, 'Robot Teams: From Diversity to Polymorphism', AK Peters, pp. 315–344.

Holland, O. & Melhuish, C. (2000), ‘Stigmergy, self-organization, and sorting in collective robotics’, Artifi cialLife 5(2), 173–202.

Krieger, M. B. & Billeter, J.-B. (2000), ‘The call for duty: Self-organized task allocation in a population of up to twelvemobile robots’, Robotics and Autonomous Systems 30(1-2), 65–84.

Kube, C. & Zhang, H. (1996), The use of perceptual cuesin multi-robot box-pushing, in ‘IEEE International Con-ference on Robotics and Automation’, Minneapolis, Min-nesota, pp. 2085–2090.

Martinoli, A., Ijspeert, A. J. & Mondada, F. (1999), ‘Understand-ing collective aggregation mechanisms: From probabiliticmodeling to experiments with real robots’, Robotics andAutonomous Systems 29, 51–63.

Matari´c, M. (1995), ‘Designing and understanding adaptivegroup behavior’, Adaptive Behavior 4:1, 51–80.

Parker, E. L. (1998), ‘Alliance: An architecture for fault tolerantmulti-robot cooperation’, IEEE Transactions on Roboticsand Automation 14(2), 220–240.

Th´eraulaz, G., E., B. & Deneubourg, J. (1998), ‘Thresholdreinforcement and the regulation of division of labour ininsect societies’, Proceedings Royal Society of London B265, 327–335.

Th´eraulaz, G., Goss, S., Gervet, J. & Deneubourg, J. (1991),Task differentiation in polistes wasp colonies: A model forself-organizing groups of robots, in J. Meyer & S. Wilson,eds, ‘International Conference on Simulation of Adaptive

Behavior: From Animals to Animats’, MIT Press, Paris,France, pp. 346–355.

Vaughan, R. (2000), ‘Stage: A multiple robot simulator’, Institutefor Robotics and Intelligent Systems Technical Report IRIS-00-394, Univ. of Southern California .

Werger, B. B. (1999), ‘Cooperation without deliberation: Aminimal behavior-based approach to multi-robot teams’,Artifi cial Intelligence 110, 293–320.

Werger, B. & Matari´c, M. (1996), Robotic ”food” chains: Exter-nalization of state and program for minimal-agent foraging,in P. Maes, M. Matari´c, J. Meyer, J. Pollack & S. Wilson,eds, ‘From Animals to Animats 4, Fourth InternationalConference on the Simulation of Adaptive Behavior (SAB-96)’, Cape Cod, Massachusetts, pp. 625–634.


To appear in the Intl. J. of Robotics Research. Also Technical Report CRES-03-013, Center for Robotics and Embedded Systems, USC, July 2003.

A formal analysis and taxonomy of task allocation in multi-robot systems

Brian P. Gerkey, Artificial Intelligence Lab, Stanford University, Stanford, CA 94305-9010, USA ([email protected])

Maja J Matarić, Computer Science Department, University of Southern California, Los Angeles, CA 90089-0781, USA ([email protected])

Abstract

Despite more than a decade of experimental work in multi-robot systems, important theoretical aspects of multi-robot coordination mechanisms have, to date, been largely untreated. To address this issue, we focus on the problem of multi-robot task allocation (MRTA). Most work on MRTA has been ad hoc and empirical, with many coordination architectures having been proposed and validated in a proof-of-concept fashion, but infrequently analyzed. With the goal of bringing objective grounding to this important area of research, we present a formal study of MRTA problems. A domain-independent taxonomy of MRTA problems is given, and it is shown how many such problems can be viewed as instances of other, well-studied, optimization problems. We demonstrate how relevant theory from operations research and combinatorial optimization can be used for analysis and greater understanding of existing approaches to task allocation, and show how the same theory can be used in the synthesis of new approaches.

1 Introduction

Over the past decade, a significant shift of focus has occurred in the field of mobile robotics as researchers have begun to investigate problems involving multiple, rather than single, robots. From early work on loosely-coupled tasks such as homogeneous foraging (Matarić 1992) to more recent work on team coordination for robot soccer (Stone & Veloso 1999), the complexity of the multi-robot systems being studied has increased. This complexity has two primary sources: larger team sizes and greater heterogeneity of robots and tasks. As significant achievements have been made along these axes, it is no longer sufficient to show, for example, a pair of robots observing targets or a large group of robots flocking as examples of coordinated robot behavior. Today we reasonably expect to see increasingly larger robot teams engaged in concurrent and diverse tasks over extended periods of time.

1.1 Multi-Robot Task Allocation (MRTA)

As a result of the growing focus on multi-robot systems, multi-robot coordination has received significant attention. In particular, multi-robot task allocation (MRTA) has recently risen to prominence and become a key research topic in its own right. As researchers design, build, and use cooperative multi-robot systems, they invariably encounter the fundamental question: "which robot should execute which task?" in order to cooperatively achieve the global goal. By "task," we mean a subgoal that is necessary for achieving the overall goal of the system, and that can be achieved independently of other subgoals (i.e., tasks). Tasks can be discrete (e.g., deliver this package to room 101) or continuous (e.g., monitor the building entrance for intruders) and can also vary in a number of other ways, including timescale, complexity, and specificity. We do not categorize or distinguish between different kinds of robotic tasks, though others have done so (see Section 2). Task independence is a strong assumption, and one that clearly limits the scope of our study. For example, we do not allow ordering constraints on a set of tasks; in general we require that individual tasks can be considered and assigned independently of each other. This issue is addressed further in Section 7.2.

In this work, we are concerned with methods for intentional cooperation (Parker 1998). In this model, robots cooperate explicitly and with purpose, often through task-related communication and negotiation. Intentional cooperation is clearly not a prerequisite for a multi-robot system to exhibit coordinated behavior, as demonstrated by minimalist or emergent approaches (Deneubourg, Théraulaz & Beckers 1991). In such systems, individuals coordinate their actions through their interactions with each other and with the environment, but without explicit negotiation or allocation of tasks. An open question is which tasks (if any) require intentional cooperation. For example, cooperative box-pushing has been demonstrated using both emergent (Kube & Zhang 1993) and intentional (Parker 1998) techniques, and there remains significant debate as to the relative value of the two approaches. However, emergent systems tend not to be amenable to analysis, with their exact behavior difficult, if not impossible, to predict. We assert that, as compared with emergent cooperation, intentional cooperation is usually better suited to the kinds of real-world tasks that humans might want robots to do. If the robots are deliberately cooperating with each other, then, intuitively, we expect that humans can deliberately cooperate with them, which is a long-term goal of multi-robot research. Furthermore, intentional cooperation has the potential to better exploit the capabilities of heterogeneous robot teams. In this work, the use of intentional cooperation is at the level of task allocation, and need not propagate to the level of task execution. Importantly, we do not prescribe or proscribe any particular method for implementing the details of a task. For example, if a foraging task is assigned to a team of robots because they are best fit for the job, they can execute the task in any way they wish, from probabilistic swarming to classical planning.

1.2 Toward formal analysis

The question of task allocation must be answered, even for relatively simple multi-robot systems, and its importance grows with the complexity, in size and capability, of the system under study. The empirically validated methods demonstrated to date remain primarily ad hoc in nature, and relatively little has been written about the general properties of cooperative multi-robot systems. After a decade of research, while cooperative architectures have been proposed, the field still lacks a prescription for how to design an MRTA system. Similarly, there has been little attempt to evaluate or compare the proposed architectures, either analytically or empirically.

In this paper we present a particular taxonomy for studying MRTA, based on organizational theory from several fields, including operations research, economics, scheduling, network flows, and combinatorial optimization. We show how this taxonomy can be used to analyze and classify MRTA problems, and evaluate and compare proposed solutions. For the simpler (and most widely studied) problems, we provide a complete analysis and prescribe provably optimal, yet tractable, algorithms for their solution. For more difficult problems, we suggest candidate approximation algorithms that have enjoyed success in other application domains. There are also some extremely difficult MRTA problems for which there do not currently exist good approximations; in such cases we provide formal characterizations of the problems but do not suggest how they should be solved.

Our approach is not meant to be final or exhaustive and indeed it has limitations. However, we believe that the ideas we present constitute a starting point on a path toward a more complete understanding of problems involving MRTA, as well as other aspects of multi-robot coordination.

2 Related work

Research in multi-robot systems has focused primarily on construction and validation of working systems, rather than more general analysis of problems and solutions. As a result, in the literature, one can find many architectures for multi-robot coordination, but relatively few formal models of multi-robot coordination. We do not attempt here to cover the various proposed and demonstrated architectures. For a thorough treatment of implemented multi-robot systems, consult Cao, Fukunaga & Kahng (1997) or the more recent work of Dudek, Jenkin & Milios (2002). Each provides a taxonomy that categorizes the bulk of existing multi-robot systems along various axes, including team organization (e.g., centralized vs. distributed), communication topology (e.g., broadcast vs. unicast), and team composition (e.g., homogeneous vs. heterogeneous). Rather than characterize architectures, we seek instead to categorize the underlying problems, although we do analyze and discuss several key architectures that solve those problems; see Section 6.

Formal models of coordination in multi-robot systems tend to target medium- to large-scale systems composed of simple, homogeneous robots, such as the CEBOTS (Fukuda, Nakagawa, Kawauchi & Buss 1988). Agassounon & Martinoli (2002) explored the tradeoffs between using a coarse, macroscopic model of such systems and using detailed, microscopic models of the individuals. Lerman & Galstyan (2002) presented a physics-inspired macroscopic model of a cooperative multi-robot system and showed that it accurately described the behavior of physical robots engaged in stick-pulling and foraging tasks. That kind of model is descriptive but not prescriptive, in that it does not guide the design of control or coordination mechanisms.

Though simple and elegant, such models are insufficient for domains involving complex tasks or requiring precise control. To study complex tasks, Donald, Jennings & Rus (1997) proposed the formalism of information invariants, which models the information requirements of a coordination algorithm and provides a mechanism to perform reductions between algorithms. Spletzer & Taylor (2001) developed a prescriptive control-theoretic model


of multi-robot coordination and showed that it can be used to produce precise multi-robot box-pushing. Mason (1986) had earlier applied a similar control-theoretic model to box-pushing with dexterous manipulators.

Relatively little work has been done on formal modeling, analysis, or comparison of multi-robot coordination at the level of task allocation. Chien, Barrett, Estlin & Rabideau (2000) developed a baseline geological scenario and used it to compare three different planning approaches to coordinating teams of planetary rovers. Klavins (2003) showed how to apply the theory of communication complexity to the study of multi-robot coordination algorithms. Finally, Jennings & Kirkwood-Watts (1998) described the method of dynamic teams, concentrating on programmatic structures that enable the specification of multi-robot tasks.

Multi-robot systems can also be formally described by process models, such as Petri nets (Murata 1989) and Partially Observable Markov Decision Processes (Kaelbling, Littman & Cassandra 1998), both of which are highly expressive. Unfortunately, such models tend to be too complex to be directly analyzed or solved, even for modest-sized systems. Another formal model is that of the hybrid system (Alur, Courcoubetis, Halbwachs, Henzinger, Ho, Nicollin, Olivero, Sifakis & Yovine 1995), which characterizes discrete systems operating in an analog environment. Hybrid systems can also become complex and are usually used to describe or control the behavior of a single robot, via a so-called three-layer architecture (Gat 1998).

Our goal in this paper is to fill a gap in the existing literature on multi-robot coordination. We neither construct a formal model in support of a particular coordination architecture, nor compare different architectures in a particular task domain. Rather, we develop a task- and architecture-independent taxonomy, based on optimization theory, in which to study task allocation problems.

3 Utility

To treat task allocation in an optimization context, one must decide what exactly is to be optimized. Ideally the goal is to directly optimize overall system performance, but that quantity is often difficult to measure during system execution. Furthermore, when selecting among alternative task allocations, the impact on system performance of each option is usually not known. Consequently, some kind of performance estimate, such as utility, is needed.

Utility is a unifying, if sometimes implicit, concept in economics, game theory, and operations research, as well as in multi-robot coordination. It is based on the notion that each individual can internally estimate the value (or the cost) of executing an action. Depending on the context, utility is also called fitness, valuation, and cost. Within multi-robot research, the formulation of utility can vary from sophisticated planner-based methods (Botelho & Alami 1999) to simple sensor-based metrics (Gerkey & Matarić 2002b). We posit that utility estimation of this kind is carried out somewhere in every autonomous task allocation system, for the heart of any task allocation problem is comparison and selection among a set of available alternatives. Since each system uses a different method to calculate utility, we give the following generic and practical definition of utility for multi-robot systems.

It is assumed that each robot is capable of estimating its fitness for every task it can perform. This estimation includes two factors, which are both task- and robot-dependent:

• expected quality of task execution, given the method and equipment to be used (e.g., the accuracy of the map that will be produced using a laser range-finder),

• expected resource cost, given the spatio-temporal requirements of the task (e.g., the power that will be required to drive the motors and laser range-finder in order to map the building).

Given a robot R and a task T, if R is capable of executing T, then one can define, on some standardized scale, QRT and CRT as the quality and cost, respectively, expected to result from the execution of T by R. This results in a combined, nonnegative utility measure:

URT = QRT − CRT   if R is capable of executing T and QRT > CRT
URT = 0           otherwise

For example, given a robot A that can achieve a task T with quality QAT = 20 at cost CAT = 10 and a robot B that can achieve the same task with quality QBT = 15 at cost CBT = 5, there should be no preference between them when searching for efficient assignments, for:

UAT = 20 − 10 = 10 = 15 − 5 = UBT.
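As an illustration, the combined utility measure and the example above can be written in a few lines of Python (a sketch only; the function name and the `capable` flag are hypothetical):

```python
def utility(quality, cost, capable=True):
    """Combined nonnegative utility U_RT for a robot-task pair.

    Returns Q_RT - C_RT when the robot is capable of executing the
    task and the expected quality exceeds the expected cost;
    returns 0 otherwise.
    """
    if capable and quality > cost:
        return quality - cost
    return 0

# The example from the text: robots A and B are equally preferred,
# since U_AT = 20 - 10 = 10 = 15 - 5 = U_BT.
assert utility(20, 10) == utility(15, 5) == 10
```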

Regardless of the method used for calculation, the robots' utility estimates will be inexact due to sensor noise, general uncertainty, and environmental change. These unavoidable characteristics of the multi-robot domain will necessarily limit the efficiency with which coordination can be achieved. We treat this limit as exogenous, on the assumption that lower-level robot control has already been made as reliable, robust, and precise as possible and thus that we are incapable of improving it at the task allocation level. When we discuss "optimal" allocations, we mean "optimal" in the sense that, given the union of all information available in the system (with the concomitant noise, uncertainty, and inaccuracy), it is impossible to construct a solution with higher overall utility.


This notion of optimality is analogous to that used in optimal scheduling (Dertouzos & Mok 1983).

It is important to note that utility is an extremely flexible measure of fitness that can encompass arbitrary computation. The only constraint on utility estimators is that they must each produce a single scalar value that can be compared for the purpose of ordering candidates for tasks. For example, if the metric for a particular task is distance to a location and a candidate robot employs a probabilistic localization mechanism, then a reasonable utility estimate might be to calculate the distance to the target using the center of mass of the current probability distribution. Other mechanisms, such as planning and learning, can likewise be incorporated into utility estimation. Regardless of the domain, it is vital that all relevant aspects of the state of the robots and their environment be included in the utility calculation. Signals that are left out of this calculation but are taken into consideration when evaluating overall system performance are what economists refer to as externalities (Simon 2001) and their effects can be detrimental, if not catastrophic.

4 Combinatorial optimization

Before entering into a discussion of task allocation problems as being primarily concerned with optimization, it will be necessary to provide some theoretical background. The field of combinatorial optimization provides a set-theoretic framework, based on subset systems, for describing a wide variety of optimization problems (Ahuja, Magnanti & Orlin 1993):

Definition (Subset System). A subset system (E, F) is a finite set of objects E and a nonempty collection F of subsets, called independent sets, of E that satisfies the property that if X ∈ F and Y ⊆ X then Y ∈ F.

That is, any subset of an independent set is also an independent set. A general maximization problem can be defined in the following way:

Definition (Subset Maximization). Given a subset system (E, F) and a utility function u : E → R+, find an X ∈ F that maximizes the total utility:

u(X) = Σ_{e∈X} u(e)    (1)

The elements of F are usually not given directly, or at least are inconvenient to represent explicitly. Instead, it is assumed that an oracle is available that, given a candidate set X, can decide whether X ∈ F. The job of such an oracle, given a proposed solution, is to verify the feasibility of that solution. For many problems, this verification is computationally trivial when compared to the complexity of the optimization problem.

Given a maximization problem over a subset system, one can define algorithms that attempt to solve it. Of particular interest is the canonical Greedy algorithm (Ahuja et al. 1993):

Algorithm (The Greedy algorithm).

1. Reorder the elements of E = {e1, e2, . . . , en} such that u(e1) ≥ u(e2) ≥ . . . ≥ u(en).

2. Set X := ∅.

3. For j = 1 to n: if X ∪ {ej} ∈ F then X := X ∪ {ej}.
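A minimal Python sketch of these steps (names hypothetical; the independence oracle for F is passed in as a predicate):

```python
def greedy(elements, u, independent):
    """Canonical Greedy algorithm over a subset system (E, F).

    u maps an element to its nonnegative utility; independent(S) is
    the oracle deciding whether the candidate set S belongs to F.
    """
    X = set()
    # Step 1: consider elements in order of non-increasing utility.
    for e in sorted(elements, key=u, reverse=True):
        # Step 3: keep e only if the grown set remains independent.
        if independent(X | {e}):
            X = X | {e}
    return X

# Toy example: subsets of size <= 2 are independent (a uniform
# matroid), so Greedy is optimal and keeps the two best elements.
best = greedy([1, 4, 2, 3], u=lambda e: e, independent=lambda s: len(s) <= 2)
assert best == {3, 4}
```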

This algorithm is an abstraction of the familiar and intuitive greedy algorithm for solving a problem: repeatedly take the best valid option. While the Greedy algorithm performs well on some optimization problems, it can do quite poorly on others. In particular, it performs well on certain subset systems that can be further classified as matroids:

Definition (Matroid). A subset system (E, F) is a matroid if, for each X, Y ∈ F with |X| > |Y|, there exists an x ∈ X \ Y such that Y ∪ {x} ∈ F.

That is, given two independent sets X and Y, with X larger than Y, Y can be "grown" by adding to it some element from X. With respect to the current discussion, an equivalent definition of a matroid is that a subset system (E, F) is a matroid if and only if the Greedy algorithm optimally solves the associated maximization problem (Korte & Vygen 2000). In the parlance of algorithmic analysis, matroids satisfy the greedy-choice property, which is a prerequisite for a greedy algorithm to produce an optimal solution (Cormen, Leiserson & Rivest 1997). Matroids are of particular interest precisely because their associated optimization problems are amenable to greedy solution.

While the Greedy algorithm does not optimally solve every maximization problem, it is useful to know how poor the greedy solution can be. For such purposes it is common to report a competitive factor for the sub-optimal algorithm. For a maximization problem, an algorithm is called α-competitive if, for any input, it finds a solution whose total utility is never less than 1/α of the optimal utility.

5 A taxonomy of MRTA problems

We propose a taxonomy of MRTA problems based on axes laid out below. Our goals here are two-fold: 1) to show


how various MRTA problems can be positioned in the resulting problem space; and 2) to explain how organizational theory relates to those problems and to proposed solutions from the robotics literature. In some cases, it will be possible to construct provably optimal solutions, while in others only approximate solutions are available. There are also some difficult MRTA problems for which there do not currently exist good approximations. When designing a multi-robot system, it is essential to understand what kind of task allocation problem is present in order to solve it in a principled manner.

We propose the following three axes for use in describing MRTA problems:

• single-task robots (ST) vs. multi-task robots (MT): ST means that each robot is capable of executing at most one task at a time, while MT means that some robots can execute multiple tasks simultaneously.

• single-robot tasks (SR) vs. multi-robot tasks (MR): SR means that each task requires exactly one robot to achieve it, while MR means that some tasks can require multiple robots.

• instantaneous assignment (IA) vs. time-extended assignment (TA): IA means that the available information concerning the robots, the tasks, and the environment permits only an instantaneous allocation of tasks to robots, with no planning for future allocations. TA means that more information is available, such as the set of all tasks that will need to be assigned, or a model of how tasks are expected to arrive over time.

We denote a particular MRTA problem by a triple of two-letter abbreviations drawn from this list. For example, a problem in which multi-robot tasks must be allocated once to single-task robots is designated ST-MR-IA.

These axes are not meant to be exhaustive, but to allow for a taxonomy that is both broad enough and detailed enough to meaningfully characterize many practical MRTA problems. Furthermore, this taxonomy will often allow for a prescription of solutions. The following sections present the combinations allowed by these axes, discussing for each which MRTA problem(s) it represents and what organizational theory pertains. Section 7 treats some important MRTA problems that are not captured by this taxonomy.
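As a trivial sketch (all names hypothetical, not part of the paper), the triple designation can be generated mechanically, which also makes plain that the three binary axes define eight problem classes:

```python
from itertools import product

# The three axes of the taxonomy, abbreviated as in the text.
AXES = (("ST", "MT"), ("SR", "MR"), ("IA", "TA"))

def mrta_label(robots, tasks, assignment):
    """Designate an MRTA problem by its triple, e.g. 'ST-MR-IA'."""
    assert robots in AXES[0] and tasks in AXES[1] and assignment in AXES[2]
    return "-".join((robots, tasks, assignment))

# Enumerating the axes yields 2 * 2 * 2 = 8 problem classes.
all_problems = ["-".join(p) for p in product(*AXES)]
assert mrta_label("ST", "MR", "IA") in all_problems and len(all_problems) == 8
```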

5.1 ST-SR-IA: Single-task robots, single-robot tasks, instantaneous assignment

This problem is the simplest, as it is actually an instance of the Optimal Assignment Problem (OAP) (Gale 1960), which is a well-known problem that was originally studied in game theory and then in operations research, in the context of personnel assignment. A recurring special case of particular interest in several fields of study, this problem can be formulated in many ways. Given the application domain of MRTA, it is fitting to describe the problem in terms of jobs and workers.

Definition (Optimal Assignment Problem). Given m workers, each looking for one job, and n prioritized jobs, each requiring one worker. Also given for each worker is a nonnegative skill rating (i.e., utility estimate) that predicts his/her performance for each job; if a worker is incapable of undertaking a job, then the worker is assigned a rating of zero for that job. The goal is to assign workers to jobs so as to maximize overall expected performance, taking into account the priorities of the jobs and the skill ratings of the workers.

The OAP can be cast in many ways, including as an integral linear program (Gale 1960): find mn nonnegative integers αij that maximize

U = Σ_{i=1..m} Σ_{j=1..n} αij Uij wj    (2)

subject to

Σ_{i=1..m} αij = 1,  1 ≤ j ≤ n
Σ_{j=1..n} αij = 1,  1 ≤ i ≤ m    (3)

The sum (2) is the overall system utility, while (3) enforces the constraint of working with single-worker jobs and single-job workers (note that since the αij are integers they must all be either 0 or 1). Given an optimal solution to this problem (i.e., a set of integers αij that maximizes (2) subject to (3)), an optimal assignment is constructed by assigning worker i to job j only when αij = 1.

The ST-SR-IA problem can be posed as an OAP in the

following way: given m robots, n prioritized tasks, and utility estimates for each of the mn possible robot-task pairs, assign at most one task to each robot. If the robots' utilities can be collected at one machine (or distributed to all machines), then a centralized linear programming approach (e.g., Kuhn's (1955) Hungarian method) will find the optimal allocation in O(mn²) time.

Alternatively, a distributed auction-based approach (e.g., Bertsekas's (1990) Auction algorithm) will find the optimal allocation, usually requiring time proportional to the maximum utility and inversely proportional to the minimum bidding increment. In order to understand such


economically-inspired algorithms, it is necessary to consider the concept of linear programming duality. As do all maximum linear programs, the OAP has a dual minimum linear program, which can be stated as follows: find m integers ui and n integers vj that minimize

P = Σ_{i=1..m} ui + Σ_{j=1..n} vj    (4)

subject to

ui + vj ≥ Uij,  ∀i, j.    (5)

The Duality Theorem states that the original problem (called the primal) and its dual are equivalent, and that the total utility of their respective optimal solutions are the same (Gale 1960).

Optimal auction algorithms for task allocation usually work in the following way. Construct a price-based task market, in which tasks are sold by brokers to robots. Each task j is for sale by a broker, which places a value cj on the task. Also, robot i places a value hij on task j. The problem then is to establish task prices pj, which will in turn determine the allocation of tasks to robots. To be feasible, the price pj for task j must be greater than or equal to the broker's valuation cj; otherwise, the broker would refuse to sell. Assuming that the robots are acting selfishly, each robot i will elect to buy a task ti for which its profit is maximized:

ti = argmax_j {hij − pj}.    (6)

Such a market is said to be at equilibrium when prices are such that no two robots select the same task. At equilibrium, each individual's profit in this market is maximized. Furthermore, the profits made by the robots and the profits made by the brokers form an optimal solution to the dual of the OAP:

ui = hi,ti − pti,  ∀i
vj = pj − cj,  ∀j.    (7)

Thus, the allocation produced by the market at equilibrium is optimal (Gale 1960).

In MRTA problems, separate valuations are not given in this manner, but only combined utility estimates for robot-task pairs. However, task valuations can be defined for the robots and brokers as follows:

hij = Uij
cj = 0.    (8)

The solution to the corresponding dual problem then becomes:

ui = Ui,ti − pti
vj = pj.    (9)

Note that setting cj to 0 implicitly states that the brokers always prefer to sell their tasks, regardless of how much they are paid. In other words, it is always better to execute a task than not execute it, regardless of the expected performance. In economic terminology, those are lexicographic preferences with regard to the tasks (Pearce 1999). Such preferences violate important assumptions concerning the nature of utility values that are made when building or analyzing general economic systems. Fortunately, in constructing the market corresponding to the ST-SR-IA problem, no assumptions are made concerning the robots' preferences, and so lexicographic preferences do not present a problem. On the other hand, the behavior of more complex, long-lived economies (such as the markets suggested by Dias & Stentz (2001) and Gerkey & Matarić (2002a)) may depend strongly on the nature of the robots' preferences, especially if the synthetic economies are meant to interact with the human economy.

The two approaches (i.e., centralized and distributed) to solving the OAP represent a tradeoff between solution time and communication overhead. Centralized approaches generally run faster than distributed approaches, but incur a higher communication overhead. To implement a centralized assignment algorithm, n² messages are required to transmit the utility of each robot for each task; an auction-based solution usually requires far fewer (sometimes fewer than n) messages to reach equilibrium. With the addition of simple optimizations, such as buffering multiple utility values and transmitting them in one message, this gap in communication overhead will only become apparent in large-scale systems. Furthermore, the time required to transmit a message cannot be ignored, especially in wireless networks, which can induce significant latency. Thus, for small- to medium-scale systems, say n < 200, a broadcast-based centralized assignment solution is likely the better choice. Not surprisingly, many MRTA architectures implement some form of this approach (Parker 1998, Werger & Matarić 2001, Castelpietra, Iocchi, Nardi, Piaggio, Scalzo & Sgorbissa 2001, Weigel, Auerback, Dietl, Dumler, Gutmann, Marko, Muller, Nebel, Szerbakowski & Thiel 2001, Østergard, Matarić & Sukhatme 2001).
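To make the bidding mechanism concrete, the loop can be sketched in Python (a minimal, untuned illustration of an auction of this family, not Bertsekas's implementation; all names are hypothetical):

```python
def auction_assignment(U, eps=0.01):
    """Auction-style solver for a symmetric assignment problem.

    U[i][j] is robot i's utility for task j. Each unassigned robot bids
    for its most profitable task at current prices, raising that task's
    price by its profit margin plus eps; outbid robots re-enter the
    queue. For integer utilities and eps < 1/n the result is optimal.
    """
    n = len(U)
    prices = [0.0] * n        # p_j: one price per task
    owner = [None] * n        # owner[j]: robot currently holding task j
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        # Profit h_ij - p_j of each task for robot i (here h_ij = U_ij).
        profit = [U[i][j] - prices[j] for j in range(n)]
        best = max(range(n), key=lambda j: profit[j])
        second = max((profit[j] for j in range(n) if j != best),
                     default=profit[best])
        prices[best] += profit[best] - second + eps
        if owner[best] is not None:
            unassigned.append(owner[best])  # previous owner is outbid
        owner[best] = i
    return owner

# Tiny example: the optimal allocation gives task 0 to robot 1 and
# task 1 to robot 0 (total utility 4 + 5 = 9).
assert auction_assignment([[1, 5], [4, 2]]) == [1, 0]
```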

problems, we implemented, in ANSI C, the Hungarian method¹ (Kuhn 1955). We tested on randomly generated symmetric assignment problems (i.e., problems where m = n) with uniformly distributed utilities, and found that the Hungarian method is easily fast enough to be used in the control loop in real-world MRTA domains. Using a Pentium III-700MHz, problems with tens of robots and

¹The code for our implementation is available from: http://robotics.stanford.edu/~gerkey.


tasks can be solved in less than 1 ms, and problems with 300 robots and tasks can be solved in less than 1 s.
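For small instances, the optimal assignment can also be checked by exhaustive search over all permutations; a minimal Python sketch for intuition (not the authors' ANSI C implementation; names hypothetical):

```python
from itertools import permutations

def optimal_assignment(U):
    """Exhaustively solve a symmetric OAP.

    U[i][j] is robot i's utility for task j. Returns the maximum total
    utility and the assignment, where assignment[i] is the task given
    to robot i. Runs in O(n!) time, so it is usable only for small n.
    """
    n = len(U)
    best = max(permutations(range(n)),
               key=lambda perm: sum(U[i][perm[i]] for i in range(n)))
    return sum(U[i][best[i]] for i in range(n)), list(best)

# Each robot keeps its own best task here: total utility 7 + 8 + 9.
assert optimal_assignment([[7, 1, 3], [2, 8, 5], [4, 6, 9]]) == (24, [0, 1, 2])
```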

5.1.1 Variant: iterated assignment

Few MRTA problems exhibit exactly the above one-time assignment structure. However, many problems can be framed as iterated instances of ST-SR-IA. Consider the cooperative multi-object tracking problem known as CMOMMT, studied by Parker (1999) and Werger & Matarić (2001), which consists of coordinating robots to observe multiple unpredictably moving targets. When presented with new sensor inputs (e.g., camera images) and consequent utility estimates (e.g., perceived distance to each target), the system must decide which robot should track which target.

Werger & Matarić's (2001) MRTA architecture, Broadcast of Local Eligibility (BLE), solves this iterated assignment problem using the following algorithm:

Algorithm (BLE assignment algorithm).

1. If any robot remains unassigned, find the robot-task pair (i, j) with the highest utility. Otherwise, quit.

2. Assign robot i to task j and remove them from consideration.

3. Go to step 1.
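The three steps above amount to repeatedly extracting the maximum remaining entry of the utility matrix. A minimal sketch, with hypothetical robots, tasks, and utilities (not Werger & Mataric's code):

```python
# Sketch of the BLE-style Greedy assignment: repeatedly pick the highest-utility
# (robot, task) pair and remove both from consideration. Data is hypothetical.
def greedy_assign(utility):
    """utility: dict mapping (robot, task) -> number. Returns {robot: task}."""
    pairs = dict(utility)
    assignment = {}
    while pairs:
        robot, task = max(pairs, key=pairs.get)     # step 1: best remaining pair
        assignment[robot] = task                    # step 2: assign and remove both
        pairs = {(r, t): u for (r, t), u in pairs.items()
                 if r != robot and t != task}
    return assignment                               # step 3: loop until done

utility = {("r1", "track_A"): 0.9, ("r1", "track_B"): 0.4,
           ("r2", "track_A"): 0.8, ("r2", "track_B"): 0.7}
print(greedy_assign(utility))   # r1 takes track_A first, leaving track_B for r2
```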

This algorithm is an instance of the canonical Greedy algorithm. The OAP is not a matroid (see Section 4) and so the Greedy algorithm will not necessarily produce an optimal solution. The Greedy algorithm is known to be 2-competitive for the OAP (Avis 1983), and thus so is BLE. That is, in the worst case, BLE will produce a solution whose benefit is 1/2 of the optimal benefit. Exactly this algorithm, operating on a global blackboard, has been used in a study of the impact of communication and coordination on MRTA (Østergard et al. 2001). A very similar assignment algorithm is also used by Botelho & Alami's (1999) M+ architecture.

Parker's (1998) MRTA architecture L-ALLIANCE, which can also perform iterated allocation, learns its assignment algorithm from experience. The resulting algorithm is similar to, but potentially more sophisticated than, the Greedy algorithm. If well-trained, the L-ALLIANCE assignment algorithm can outperform the Greedy algorithm (Parker 1994), but is not guaranteed to be optimal.

Another domain in which the iterated OAP arises is robot soccer. Since many of the robots are interchangeable, it is often advantageous to allow any player to take on any role within the team, according to the current situation in the game. The resulting coordination problem can be cast as an iterated assignment problem in which the robots' roles are periodically reevaluated, usually at a frequency on the order of 10 Hz. This utility-based dynamic role assignment problem has been studied by many (Stone & Veloso 1999, Weigel et al. 2001, Castelpietra et al. 2001, Emery, Sikorski & Balch 2002, Vail & Veloso 2003).

It is common in the robot soccer domain for each robot to calculate its utility for each role and periodically broadcast these values to its teammates. The robots can then execute, in parallel, some centralized assignment algorithm. For example, Castelpietra et al.'s (2001) assignment algorithm consists of ordering the roles in descending priority and then assigning each to the available robot with the highest utility. This algorithm is yet another instance of the Greedy algorithm. Vail & Veloso (2003) also employ the Greedy algorithm with fixed priority roles. Weigel et al. (2001) employ a similar but slightly more sophisticated algorithm that tries to address the problem of excessive role-swapping by imposing stricter prerequisites for reassignment. Among other things, the algorithm requires that both robots "want" to exchange roles in order to maximize their respective utilities, recalling the conditions for equilibrium in markets (see Section 5.1). However, Weigel et al.'s (2001) algorithm is not guaranteed to produce optimal assignments of roles, a fact that can easily be shown by counterexample.

Since the number of robots involved in many iterated MRTA problems today is small (n ≤ 11 for robot soccer, which is more than for most current multi-robot systems), O(n³) optimal assignment algorithms could easily replace the suboptimal ad hoc assignment algorithms that are typically used. As the performance results mentioned in the previous section show, the Hungarian method can be used to solve typical problems in less than 1 ms per iteration with the moderately powerful computers found on today's robots.

Since there is some additional cost for running an optimal algorithm (if only in the work involved in the implementation), one might ask whether the optimal solution provides a sufficient benefit. For example, it is known that for arbitrary assignment problems, the Greedy algorithm's worst-case behavior is to produce a solution with half of the optimal utility. However, it is not known how the algorithm can be expected to perform on typical MRTA problems, which exhibit some structure and are unlikely to present pathological utility combinations. Anecdotal evidence suggests that the Greedy algorithm works extremely well on such problems. An interesting avenue of research would be to analytically determine how well the Greedy algorithm will perform on the kinds of utility landscapes that are encountered in MRTA problems.


5.1.2 Variant: online assignment

In some MRTA problems, the set of tasks is not revealed at once; rather, the tasks are introduced one at a time. If robots that have already been assigned cannot be reassigned, then this problem is a variant of ST-SR-IA, known as online assignment (Kalyanasundaram & Pruhs 1993). Instead of being initially given, the robot-task utility matrix is revealed one column (or row) at a time. If previously assigned robots can be reassigned, then the problem reduces to an instance of the iterated ST-SR-IA problem, which can be optimally solved with standard assignment algorithms.

The MRTA problems solved by Gerkey & Mataric's (2002b) MURDOCH system, in which tasks are randomly injected into the system over time, are instances of the online assignment problem. The MURDOCH assignment algorithm can be stated as follows:

Algorithm (MURDOCH assignment algorithm).

1. When a new task is introduced, assign it to the most fit robot that is currently available.

This simple algorithm is yet another instance of the Greedy algorithm, and is known in the context of network flows as the Farthest Neighbor algorithm. Not surprisingly, the online assignment problem is not a matroid; the Greedy algorithm is known to be 3-competitive with respect to the optimal post hoc offline solution. Furthermore, this performance bound is the best possible for any online assignment algorithm (Kalyanasundaram & Pruhs 1993). Thus, without a model of the tasks that are to be introduced, and without the option of reassigning robots that have already been assigned, it is impossible to construct a better task allocator than MURDOCH.
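A minimal sketch of this greedy online rule, with hypothetical robot names and fitness values (not the MURDOCH implementation):

```python
# Sketch of online greedy assignment in the MURDOCH style: each arriving task
# goes to the fittest currently-available robot, with no later reassignment.
# Robot names and fitness values are hypothetical.
def assign_online(task, fitness, available):
    """fitness: dict robot -> utility for this task; available: set of free robots."""
    candidates = {r: fitness[r] for r in available}
    if not candidates:
        return None                     # no robot is free: the task goes unassigned
    winner = max(candidates, key=candidates.get)
    available.discard(winner)           # committed: no reassignment afterwards
    return winner

available = {"r1", "r2", "r3"}
print(assign_online("clean_lab", {"r1": 0.2, "r2": 0.9, "r3": 0.5}, available))   # r2
print(assign_online("guard_door", {"r1": 0.7, "r2": 0.1, "r3": 0.6}, available))  # r1
```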

5.2 ST-SR-TA: Single-task robots, single-robot tasks, time-extended assignment

When the system consists of more tasks than robots, or if there is a model of how tasks will arrive, then the robots' future utilities for the tasks can be predicted with some accuracy, and the problem is an instance of ST-SR-TA. This problem is one of building a time-extended schedule of tasks for each robot, with the goal of minimizing total weighted cost. Using Brucker's (1998) terminology, this problem is an instance of the class of scheduling problems

R || Σ w_j C_j.

That is, the robots execute tasks in parallel (R) and the optimization criterion is the weighted sum of execution costs (Σ w_j C_j). Problems in this class are strongly NP-hard (Bruno, Coffman & Sethi 1974). Even for relatively small problems, the exponential space of possible schedules precludes enumerative solutions.

A means of treating ST-SR-TA is to ignore the time-extended component and approximate the problem as an instance of the ST-SR-IA problem (Section 5.1), followed by an instance of the online assignment problem (Section 5.1.2). For example, given m robots and n tasks, with n > m, the following approximation algorithm can be used:

Algorithm (ST-SR-TA approximation algorithm).

1. Optimally solve the initial m × n assignment problem.

2. Use the Greedy algorithm to assign the remaining tasks in an online fashion, as the robots become available.
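The two steps can be sketched as follows. The utilities are hypothetical; for brevity the optimal first step is solved by brute force rather than the Hungarian method, and each leftover task simply goes to the robot with the highest utility for it, standing in for "as the robots become available":

```python
# Sketch of the two-step ST-SR-TA approximation: optimally assign each robot an
# initial task, then hand out leftover tasks greedily. Utilities hypothetical.
from itertools import permutations

def st_sr_ta_approx(utility, m):
    """utility[i][j]: utility of robot i for task j, with n = len(utility[0]) > m."""
    n = len(utility[0])
    # Step 1: optimal initial assignment of m robots to n tasks
    # (brute force over injective assignments; fine for small m, n).
    best = max(permutations(range(n), m),
               key=lambda p: sum(utility[i][p[i]] for i in range(m)))
    schedule = {i: [best[i]] for i in range(m)}
    # Step 2: greedy online assignment of the remaining tasks.
    for j in (t for t in range(n) if t not in best):
        i = max(range(m), key=lambda i: utility[i][j])
        schedule[i].append(j)
    return schedule

utility = [[3, 1, 5, 2],
           [2, 4, 1, 6]]
print(st_sr_ta_approx(utility, m=2))
```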

The performance of this algorithm is bounded below by the normal Greedy algorithm, which is 3-competitive for online assignment. The more tasks that are assigned in the first step, the better this algorithm will perform. As the difference between the number of robots and the number of tasks that are initially presented decreases (i.e., (n − m) → 0), performance approaches optimality, wherein all tasks are assigned in one step. Thus, although it is not guaranteed to produce optimal solutions, this algorithm should work well in practice, especially for ST-SR-TA problems with short time horizons.

Another way to approach this problem is to employ an iterative task allocation system, such as Dias & Stentz's (2001) price-based market. The robots would opportunistically exchange tasks over time, thereby modifying their schedules. This idea is demonstrated by the multi-robot exploration system described by Zlot, Stentz, Dias & Thayer (2002). However, without knowledge of the exact criteria used to decide when and with whom each robot will trade, it is impossible to determine the algorithmic characteristics (including solution quality) of this method.

5.2.1 Variant: ALLIANCE Efficiency Problem

Parker (1995) formulated a related MRTA problem called the ALLIANCE Efficiency Problem (AEP). The input is a set of tasks making up a mission, and the objective is to allocate a subset of these tasks to each robot so as to minimize the maximum time taken by a robot to serially execute its allocated tasks. Thus in order to solve the AEP, one must construct a time-extended schedule of tasks for each robot. This problem is an instance of the class of scheduling problems:

R || C_max.

Problems in this class are known to be strongly NP-hard (Garey & Johnson 1978). Parker (1995) arrived at the same conclusion regarding the AEP, by reduction from the NP-complete problem PARTITION.

To attack the AEP, Parker (1998) used a learning approach, in which the robots learn both their utility estimates and their scheduling algorithms from experience. When trained for a particular task domain, this system has the potential to outperform the approximation algorithm described above (but it is not guaranteed to do so).

5.3 ST-MR-IA: Single-task robots, multi-robot tasks, instantaneous assignment

Many MRTA problems involve tasks that require the combined effort of multiple robots. In such cases, we must consider combined utilities of groups of robots, which are in general not sums over individual utilities; utility may be defined arbitrarily for each potential group. For example, if a task requires a particular skill or device, then any group of robots without that skill or device has zero utility with respect to that task, regardless of the capabilities of the other robots in the group. This kind of problem is significantly more difficult than the previously discussed MRTA problems, which were restricted to single-robot tasks. In the multi-agent community, the ST-MR-IA problem is referred to as coalition formation, and has been extensively studied (Sandholm & Lesser 1997, Shehory & Kraus 1998).

It is natural to think of the ST-MR-IA problem as splitting the set of robots into task-specific coalitions. A relevant concept from set theory is that of a set partition. A family X is a partition of a set E if and only if the elements of X are mutually disjoint and their union is E:

x_i ∩ x_j = ∅ for all distinct x_i, x_j ∈ X, and ⋃_{x∈X} x = E.  (10)

With the idea of partitions in mind, a well-known problem in combinatorial optimization called the (maximum utility) Set Partitioning Problem, or SPP (Balas & Padberg 1976), is relevant:

Definition (Set Partitioning Problem (SPP)). Given a finite set E, a family F of acceptable subsets of E, and a utility function u : F → R+, find a maximum-utility family X of elements in F such that X is a partition of E.

The ST-MR-IA problem can be cast as an instance of SPP, with E as the set of robots, F as the set of all feasible coalition-task pairs, and u as the utility estimate for each such pair.

Unfortunately, the SPP is strongly NP-hard (Garey & Johnson 1978). Fortunately, the problem has been studied in depth (Atamturk, Nemhauser & Savelsbergh 1995), especially in the context of solving crew scheduling problems for airlines (Marsten & Shepardson 1981, Hoffman & Padberg 1993). As a result, many heuristic SPP algorithms have been developed.

It remains to be seen whether such heuristic algorithms are applicable to MRTA problems. Some approximation algorithms, including those of Hoffman & Padberg (1993) and Atamturk et al. (1995), have been shown to produce high-quality solutions to many instances of SPP. Even with hundreds of rows/columns and using mid-1990s workstation-class machines, these algorithms require at most tens of seconds to arrive at a solution. With ever-increasing computational power available on robots, it seems plausible that SPP approximation algorithms could be used to solve small- and medium-scale instances of the ST-MR-IA problem. To this end, a potentially important question is whether and how these algorithms can be parallelized. Shehory & Kraus (1998) showed how to implement a parallel SPP algorithm for coalition formation in a multi-agent context. Another important point is that, in order to apply certain SPP algorithms to ST-MR-IA problems, it may be necessary to enumerate a set of feasible coalition-task combinations. In the case that the space of such combinations is very large, there is a need to prune the feasible set; pruning can take advantage of sensor-based metrics such as physical distance (e.g., if two robots are more than 50 meters apart, then disallow any coalitions that contain them both).
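To make the SPP casting concrete, here is a brute-force solver for a tiny hypothetical instance (three robots, four candidate coalition-task pairs). Realistic instances require the heuristic algorithms cited above:

```python
# Brute-force maximum-utility set partitioning for a tiny ST-MR-IA instance:
# choose disjoint coalition-task pairs from F whose union covers all robots.
# Coalitions and utilities are hypothetical; real instances need heuristics.
from itertools import combinations

def solve_spp(E, F, u):
    """E: frozenset of robots; F: list of (coalition, task); u: dict pair -> utility."""
    best, best_u = None, float("-inf")
    for k in range(1, len(F) + 1):
        for family in combinations(F, k):
            sets = [c for c, _ in family]
            union = frozenset().union(*sets)
            disjoint = sum(len(c) for c in sets) == len(union)
            if disjoint and union == E:            # family is a partition of E
                total = sum(u[p] for p in family)
                if total > best_u:
                    best, best_u = family, total
    return best, best_u

E = frozenset({"r1", "r2", "r3"})
F = [(frozenset({"r1", "r2"}), "push_box"),
     (frozenset({"r3"}), "watch_door"),
     (frozenset({"r1"}), "watch_door"),
     (frozenset({"r2", "r3"}), "push_box")]
u = {F[0]: 5.0, F[1]: 2.0, F[2]: 1.0, F[3]: 4.0}
best, best_u = solve_spp(E, F, u)
print(best, best_u)
```

The enumeration is exponential in |F|, which is exactly why the pruning of feasible coalitions discussed above matters.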

5.4 ST-MR-TA: Single-task robots, multi-robot tasks, time-extended assignment

The ST-MR-TA class of problems includes both coalition formation and scheduling. For example, consider the problem of delivering a number of packages of various sizes from a single distribution center to different destinations. The number of packages and their destinations are known in advance, as is the size of each package, which determines the number of robots required to carry it. Given a pool of robots, the problem is to build a delivery schedule for the packages, while guaranteeing that a team of the appropriate size is assembled for each package.

To produce an optimal solution, all possible schedules for all possible coalitions must be considered. This problem is NP-hard. If the coalitions are given, with no more than one coalition allowed for each task, the result is an instance of a multiprocessor scheduling problem:

MPT_m || Σ w_j C_j.

Even with two processors (MPT_2 || Σ w_j C_j), this problem is strongly NP-hard (Hoogeveen, van de Velde & Veltman 1994), as is the unweighted version (MPT_2 || Σ C_j) (Cai, Lee & Li 1998). With three processors, the

maximum finishing time version (MPT_3 || C_max) is also strongly NP-hard (Hoogeveen et al. 1994).

A means of treating ST-MR-TA is to ignore the time-extended component and approximate the problem as an instance of iterated ST-MR-IA. For this purpose, a greedy approximation algorithm akin to the one given above for the ST-SR-TA problem can be employed. Unfortunately, the quality of such an approximation is difficult to determine. Another approach is to employ a leader-based mechanism to dynamically form coalitions and build task schedules for them, as described by Dias & Stentz (2002). However, the performance and overhead of this method will also be difficult, if not impossible, to predict without detailed information about the implementation (how many and which robots will be leaders, how does a leader select among candidate coalitions, how long do coalitions persist, etc.).

5.5 MT-SR-IA & MT-SR-TA: Multi-task robots, single-robot tasks

The MT-SR-IA and MT-SR-TA problems are currently uncommon, as they assume robots that can each concurrently execute multiple tasks. Today's mobile robots are generally actuator-poor. Their ability to affect the environment is typically limited to changing position, so they can rarely execute more than one task at a time. However, there are sensory and computational tasks that fit the MT-SR-IA or MT-SR-TA models quite well.

Solving the MT-SR-IA problem is equivalent to solving the ST-MR-IA problem (see Section 5.3), with the robots and tasks interchanged in the SPP formulation. Likewise, the MT-SR-TA problem is equivalent to the ST-MR-TA problem (see Section 5.4). Thus the analysis and algorithms provided for the multi-robot task problems also directly apply here to the multi-task robot problems.

5.6 MT-MR-IA: Multi-task robots, multi-robot tasks, instantaneous assignment

When a system consists of both multi-task robots and multi-robot tasks, the result is an instance of the MT-MR-IA problem. For example, consider the allocation of surveillance tasks to a team of robots in an office building. Each robot continuously patrols a fixed portion of the building. Due to computational and/or sensory limitations, each robot can simultaneously detect only a limited number of environmental events (e.g., suspicious person, smoke, open door). Given a set of events to look for, and knowledge about where in the building each event is likely to occur, which robots should be tasked to look for each event?

A relevant concept from set theory is the set cover. A family X is a cover of a set E if and only if the union of the elements of X is E:

⋃_{x∈X} x = E.  (11)
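The difference between the two set-theoretic definitions can be stated directly in code; this small illustration (not from the paper) uses hypothetical robot names:

```python
# Small illustration of the definitions: a partition's subsets are pairwise
# disjoint and jointly cover E, while a cover only needs the union to equal E,
# so its subsets may overlap.
def is_cover(X, E):
    return set().union(*X) == set(E)

def is_partition(X, E):
    # pairwise disjoint iff total size equals the size of the union
    pairwise_disjoint = sum(len(x) for x in X) == len(set().union(*X))
    return pairwise_disjoint and is_cover(X, E)

E = {"r1", "r2", "r3"}
print(is_partition([{"r1"}, {"r2", "r3"}], E))        # True: disjoint and covers E
print(is_partition([{"r1", "r2"}, {"r2", "r3"}], E))  # False: subsets overlap
print(is_cover([{"r1", "r2"}, {"r2", "r3"}], E))      # True: overlap is allowed
```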

As compared with a partition (see Section 5.3), the subsets in a cover need not be disjoint. A well-known problem in combinatorial optimization called the (minimum cost) Set Covering Problem, or SCP (Balas & Padberg 1972), is relevant:

Definition (Set Covering Problem (SCP)). Given a finite set E, a family F of acceptable subsets of E, and a cost function c : F → R+, find a minimum-cost family X of elements in F such that X is a cover of E.

The MT-MR-IA problem can be cast as an instance of the SCP, with E as the set of robots, F as the set of all feasible (and possibly overlapping) coalition-task pairs, and c as the cost estimate for each such pair.

Though superficially similar to the SPP, the SCP is in fact a "distant relative," with the solution space of the SCP being far less constrained (Balas & Padberg 1976). The two problems are similar in that the SCP is also strongly NP-hard (Korte & Vygen 2000).

Chvatal (1979) developed a greedy approximation algorithm for the SCP. The competitive factor for this algorithm is logarithmic in the size of the largest feasible subset (i.e., max_{f∈F} |f|), and the running time is polynomial in the number of feasible subsets (i.e., |F|). Bar-Yehuda & Even (1981) present another heuristic set covering algorithm, whose competitive factor is the maximum number of subsets to which any element belongs (i.e., max_{e∈E} |{f ∈ F : e ∈ f}|), and whose running time is the sum of the sizes of the feasible subsets (i.e., Σ_{f∈F} |f|) (Korte & Vygen 2000).

The important trend to note is that these heuristic algorithms perform well when the space of feasible subsets is limited, and that they perform poorly in the most general case of the SCP, with all subsets allowed. For MRTA, this result suggests that such algorithms would best be applied in environments in which the space of possible coalitions is naturally limited, as is the case with heterogeneous and/or physically distantly separated robots. In the case of equally-skilled collocated robots, these algorithms would tend to run slowly and produce poor-quality solutions.

To the authors' knowledge, set covering algorithms have not been applied to MRTA problems, and it is an open question as to whether such an application would be beneficial. However, Shehory & Kraus (1996) successfully adapted and distributed Chvatal's (1979) approximation algorithm for use in multi-agent systems, which suggests that SCP algorithms may indeed be viable for MRTA problems.
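Chvatal's greedy rule (repeatedly take the feasible subset with the lowest cost per newly covered element) is simple to sketch; the coalitions and costs below are hypothetical:

```python
# Sketch of Chvatal-style greedy set covering: repeatedly pick the feasible
# subset minimizing cost per newly covered element. Coalitions and costs are
# hypothetical, and E is assumed to be coverable by F.
def greedy_set_cover(E, F, cost):
    """E: set to cover; F: list of frozensets; cost: dict subset -> cost."""
    uncovered, chosen = set(E), []
    while uncovered:
        # cheapest cost per element that is still uncovered
        f = min((s for s in F if s & uncovered),
                key=lambda s: cost[s] / len(s & uncovered))
        chosen.append(f)
        uncovered -= f
    return chosen

E = {"r1", "r2", "r3"}
F = [frozenset({"r1"}), frozenset({"r2", "r3"}), frozenset({"r1", "r2"})]
cost = {F[0]: 1.0, F[1]: 1.5, F[2]: 2.5}
print(greedy_set_cover(E, F, cost))
```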

5.7 MT-MR-TA: Multi-task robots, multi-robot tasks, time-extended assignment

We can extend the surveillance domain described in the previous section by specifying that certain events need not be monitored immediately or continuously, but according to some predefined schedule. For example, "the left wing of the building should be checked every hour for open doors." The result is an MT-MR-TA problem, which is an instance of a scheduling problem with multiprocessor tasks and multipurpose machines:

MPT_m MPM_n || Σ w_j C_j.

This problem is strongly NP-hard, because it includes as a special case the strongly NP-hard scheduling problem MPT_2 || Σ w_j C_j. We are not aware of any heuristic or

approximation algorithms for this difficult problem.

6 Analysis of existing approaches

Presumably because it is the simplest case of MRTA, the ST-SR-IA problem has received the most attention from the research community. Having developed a formal framework in which to study this MRTA problem, we can now apply it to an analysis of some of the key task allocation architectures from the literature. In this section six approaches to the ST-SR-IA problem are analyzed, focusing on the following three characteristics²:

1. computation requirements

2. communication requirements

3. solution quality

These theoretical aspects of multi-robot coordination mechanisms are vitally important to their study, comparison, and objective evaluation, as the large-scale and long-term system behavior is strongly determined by the fundamental characteristics of the underlying algorithm(s). We can derive these characteristics for existing architectures by seeing them as solutions to the underlying utility optimization problems that we identified in our taxonomy. First, we explain the methodology used in the analysis.

Methodology. Computational requirements, or running time, are determined in the usual way, as the number of times that some dominant operation is repeated. For the MRTA domain that operation is usually either a calculation or comparison of utility, and running time is stated as a function of m and n, the number of robots and tasks, respectively. Since modern robots have significant processing capabilities on board and can easily work in parallel, in this analysis we assume that the computational load is evenly distributed over the robots, and state the running time as it is for each robot. For example, if each robot must select the task with the highest utility, then the running time is O(n), because each robot performs n comparisons, in parallel. Note that this analysis does not measure or consider the actual running time of the utility calculation, in large part because that information is not generally reported. Rather it is assumed that the utility calculations are computationally similar enough to be meaningfully compared.

Communication requirements are determined as the total number of inter-robot messages sent over the network. In the analysis we do not consider message sizes, on the assumption that they are generally small (e.g., single scalar utility values) and approximately the same for different algorithms. Further, we assume that a perfect shared broadcast communication medium is used and that messages are always broadcast, rather than unicast. So if, for example, each robot must tell every other robot its own highest utility value, then the overhead is O(m), because each robot makes a single broadcast.

Solution quality is reported as a competitive factor, which bounds an algorithm's performance as a function of the optimal solution (Section 4). The competitive factor for an architecture is determined by mapping its task allocation algorithm onto the underlying assignment problem. For any given task allocation architecture, this mapping could be arbitrarily complex and not necessarily informative. Fortunately, existing MRTA architectures tend to implement either the Greedy algorithm or a close variant. By identifying the allocation algorithm as such, we can put a lower bound on its performance, and thus gain some insight into how the architecture can be expected to perform, independent of the particular application domain.

²This analysis was originally presented in Gerkey & Mataric (2003).

Results & discussion. Next, six MRTA architectures that have been validated on either physical or simulated robots are analyzed. Three of the architectures solve the iterated assignment problem and the other three solve the online assignment problem. While there are a great many architectures in the literature, we have attempted to gather a set of approaches that is representative of the work to date.

Name                          Computation    Communication    Solution
                              / iteration    / iteration      quality

ALLIANCE³ (Parker 1998)       O(mn)          O(m)             at least 2-competitive
BLE (Werger & Mataric 2001)   O(mn)          O(mn)            2-competitive
M+ (Botelho & Alami 1999)     O(mn)          O(mn)            2-competitive

Table 1: Summary of selected iterated assignment architectures for MRTA. Shown here for each architecture are the computational and communication requirements, as well as solution quality.

Name                                        Computation         Communication    Solution
                                            / task              / task           quality

MURDOCH (Gerkey & Mataric 2002b)            O(1) / bidder,      O(n)             3-competitive
                                            O(n) / auctioneer
First-price auctions (Dias & Stentz 2001)   O(1) / bidder,      O(n)             at least 3-competitive
                                            O(n) / auctioneer
Dynamic role assignment                     O(1) / bidder,      O(n)             at least 3-competitive
(Chaimowicz et al. 2002)                    O(n) / auctioneer

Table 2: Summary of selected online assignment architectures for MRTA. Shown here for each architecture are the computational and communication requirements, as well as solution quality.

Of the iterated assignment architectures, the first is ALLIANCE (Parker 1998), one of the earliest demonstrated approaches to MRTA. This behavior-based architecture allocates tasks by maintaining, for each robot, levels of impatience and acquiescence concerning the available tasks. These motivation factors are combined to form, in effect, a utility estimate for each (robot, task) pair. Another behavior-based architecture is Werger & Mataric's (2001) Broadcast of Local Eligibility (BLE), which is a distributed version of the well-known Subsumption Architecture (Brooks 1986). As described in Section 5.1.1, BLE works as follows: at a fixed rate (1 Hz), the robots compute and broadcast to each other their utility estimates for all tasks; allocation is performed after each broadcast with the Greedy algorithm. Another architecture that employs the Greedy algorithm is M+ (Botelho & Alami 1999), whose use of auctions represents the first market-based approach to MRTA (or at least the first that was motivated with economic ideas). Reassignment of tasks is allowed in all three architectures, although the frequency of reassignment may vary. For example, in BLE reassignment occurs almost continuously, but in M+ reassignment occurs only when a new task becomes available.

Of the online assignment architectures, the first is

MURDOCH (Gerkey & Mataric 2002b), which uses a first-price auction to assign each task, and does not allow reassignment. As stated in Section 5.1.2, this approach is

³In addition to solving the ST-SR-IA problem, the ALLIANCE architecture is also capable of building time-extended task schedules in order to solve a form of the ST-SR-TA problem (see Section 5.2.1).

also an implementation of the Greedy algorithm. The two remaining architectures, from Dias & Stentz (2001) and Chaimowicz et al. (2002), also assign tasks with first-price auctions, but allow (in some circumstances) later reassignment.

Tables 1 & 2 summarize the results for the iterated assignment architectures and online assignment architectures, respectively. Perhaps the most significant trend in these results is how similar the architectures look when examined in this manner. For example, the iterated architectures listed in Table 1, which assign all available tasks simultaneously, exhibit almost identical algorithmic characteristics. Only the ALLIANCE architecture (Parker 1998) shows any difference; in this case the decrease in communication overhead is achieved by having each robot internally model the fitness of the others, thereby effectively distributing the utility calculations. More striking are the results in Table 2, which lists architectures that assign tasks in a sequential manner: with respect to computational and communication requirements, these architectures are identical. In terms of solution quality, Dias & Stentz's (2001) and Chaimowicz et al.'s (2002) approaches, which allow reassignment of tasks, can potentially perform better than MURDOCH.

These results are particularly interesting because they suggest that there is some common methodology underlying many existing approaches to MRTA. This trend is


difficult or impossible to discern from reading the technical papers describing the work, as each architecture is described in different terms, and validated in a different task domain. However, with the analysis described here, fundamental similarities of the various architectures become obvious. These similarities are encouraging because they suggest that, regardless of the details of the robots or tasks in use, the various authors are all studying a common, fundamental problem in autonomous coordination. As a corollary, there is now a formal grounding for the belief that these ad hoc architectures may have properties that allow them to be generalized and applied widely.

Of course, the described analysis does not capture all

relevant aspects of the systems under study. For example, in the ALLIANCE architecture, the robots' computational load is increased to handle modeling of other robots, but this analysis does not consider that extra load. Such details, which are currently not widely discussed in the literature, will likely become more important as the field moves toward improved cross-evaluation of solutions.

In addition to enabling evaluation, this kind of analysis can be used to explain why certain solutions work in practice. For example, the online assignment architectures listed in Table 2 are all economically-inspired, built around task auctions. The designers of such architectures generally justify their approach with a loose analogy to the efficiency of the free market as it is used by humans. With a formal analysis, it is possible to gain a clearer understanding of why auction-based allocation methods work in practice. Specifically, it is well known that synthetic economic systems can be used to solve a variety of optimization problems. As explained in Section 5.1, an appropriately constructed price-based market, at equilibrium (i.e., when the prices are such that no two utility-maximizing robots would select the same task), produces optimal assignments. The previously described economically-inspired architectures approximate such a market to varying degrees.

7 Other problems

Although the taxonomy given in the previous sections covers many MRTA domains, several potentially important problems are excluded. Next we describe some problem domains that are not captured by the taxonomy.

7.1 Interrelated utilities

Consider the problem of assigning target points to a team of robots that are cooperatively exploring an unknown environment. Many targets (e.g., the frontiers of Yamauchi (1998)) may be known at one time, and so it is possible

to build a schedule of targets for each robot. Unfortunately, this problem is not an instance of ST-SR-TA, because the cost for a robot to visit target C depends on whether that robot first visits target A or target B. Instead, this problem is an instance of the multiple traveling salesperson problem (MTSP); even in the restricted case of one salesperson, MTSP is strongly NP-hard (Korte & Vygen 2000). If, as is often the case with exploration, it is possible to discover new targets over time, then the problem is an instance of the dynamic MTSP, which is clearly at least as difficult as the classical MTSP.

Given the difficulty of the multi-robot exploration problem, it is not surprising that researchers have not attempted to solve it directly or exactly. A heuristic approximation is offered by Zlot et al. (2002), who use TSP heuristics to build target schedules and derive costs that are used in Dias & Stentz's (2001) market-based task allocation architecture. When a robot discovers a new target, it inserts the new target into its schedule, but retains the option of later auctioning the target off to another, closer robot.
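The insertion step can be illustrated with a standard cheapest-insertion heuristic: place the new target at the position in the robot's route that adds the least travel distance. The coordinates are hypothetical, and this is not Zlot et al.'s implementation:

```python
# Sketch of inserting a newly discovered target into a robot's existing schedule
# at the position that minimizes added travel cost (a common TSP heuristic).
# Coordinates are hypothetical; this is not Zlot et al.'s implementation.
from math import dist

def insert_cheapest(route, target):
    """route: list of (x, y) waypoints, with route[0] the robot's current position."""
    best_i, best_cost = len(route), float("inf")
    for i in range(1, len(route) + 1):
        prev = route[i - 1]
        if i < len(route):
            # cost of detouring through `target` between route[i-1] and route[i]
            cost = dist(prev, target) + dist(target, route[i]) - dist(prev, route[i])
        else:
            cost = dist(prev, target)          # or append at the end of the path
        if cost < best_cost:
            best_i, best_cost = i, cost
    return route[:best_i] + [target] + route[best_i:]

route = [(0, 0), (10, 0)]                      # robot at origin, one target queued
print(insert_cheapest(route, (5, 1)))          # small detour beats a long backtrack
```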

The multi-robot exploration problem is an example of a larger class of problems, in which a robot's utility for a task may depend on which other tasks that robot executes. These problems in turn form part of another, more general class of problems in which a robot's utility for a task may depend on which other tasks any robot executes. That is, each robot-task utility can depend on the overall allocation of tasks to robots. Such interrelated utilities can sometimes be tractably captured with factored Partially Observable Markov Decision Processes (POMDPs), assuming that a world model is available (Guestrin, Koller & Parr 2001).

For mobile robots, this situation can arise any time that physical interference contributes significantly to task performance. For example, consider a multi-robot resource transportation problem in which each robot must choose which of a predetermined number of source-sink roads to travel. The decision of which road to travel should take into account the congestion caused by other robots. Taking the position that interference effects are difficult or impossible to adequately model a priori, Dahl, Mataric & Sukhatme (2002) developed a reinforcement learning approach to the multi-robot resource transportation problem. The robots do not communicate with each other directly, but rather through physical interactions, with each robot maintaining and updating an estimate of the utility of each available road. This approach was shown to produce higher-quality solutions than those produced without learning, and added no communication overhead.
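The flavour of such per-road utility bookkeeping can be sketched as follows. This is an illustrative reconstruction, not Dahl et al.'s actual algorithm; the moving-average update rule and the parameter names (`alpha`, `epsilon`) are assumptions:

```python
import random

class RoadSelector:
    """Each robot keeps a utility estimate per road and updates it from its
    own observed transit rewards; congestion lowers the observed reward, so
    crowded roads lose utility without any explicit communication."""

    def __init__(self, n_roads, alpha=0.2, epsilon=0.1):
        self.utilities = [0.0] * n_roads
        self.alpha = alpha      # learning rate for the moving average
        self.epsilon = epsilon  # exploration probability

    def choose(self):
        """Pick the highest-utility road, exploring with probability epsilon."""
        if random.random() < self.epsilon:
            return random.randrange(len(self.utilities))
        return max(range(len(self.utilities)), key=self.utilities.__getitem__)

    def update(self, road, reward):
        """Exponential moving average toward the latest observed reward."""
        self.utilities[road] += self.alpha * (reward - self.utilities[road])
```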

7.2 Task constraints

In addition to an assumption of independent utilities, our taxonomy also assumes independent tasks. There may instead be constraints among the tasks, such as sequential or parallel execution. In principle, each set of tasks with such constraints could be phrased as a single monolithic task that requires multiple robots. The allocation of these larger tasks could then be described by the presented taxonomy (e.g., ST-MR-IA). Unfortunately, the difficult problem of reasoning about task constraints is not removed, but simply shifted into the utility estimation for each potential multi-robot team. In general, our analysis will not suffice in the presence of constraints among tasks.

Although the topic of job constraints is addressed by the scheduling literature (Brucker 1998), the addition of such constraints generally increases problem difficulty, and tractable algorithms exist for only the simplest kinds of constraints. A possible way to approach this problem is with techniques for dynamic constraint satisfaction (Modi, Jung, Tambe, Shen & Kulkarni 2001).

8 Summary

In the field of mobile robotics, the study of multi-robot systems has grown significantly in size and importance. Having solved some of the basic problems concerning single-robot control, many researchers have shifted their focus to the study of multi-robot coordination. There are by now a plethora of examples of demonstrated coordinated behavior in multi-robot systems, and almost as many proposed coordination architectures. However, despite more than a decade of research, the field so far lacks a theoretical foundation that can explain or predict the behavior of a multi-robot system. Our goal in this paper has been to provide a candidate framework for studying such systems.

The word “coordination” is somewhat imprecise, and has been used inconsistently in the literature. In order to be precise about the problem with which we are concerned, we defined a smaller problem: multi-robot task allocation (MRTA). That is, given some robots and some tasks, which robot(s) should execute which task(s)? This restricted problem is both theoretically and practically important, and is supported by the significant body of existing work that focuses on MRTA, in one form or another.

To date, the majority of research in MRTA has been experimental in nature. The standard procedure, followed by a large number of researchers, has been to construct an MRTA architecture and then validate it in one or more application domains. This proof-of-concept method has led to the proposal of many MRTA architectures, each of which has been experimentally validated to a greater or lesser extent, sometimes in simulation and sometimes with physical robots. These research efforts are undeniably useful, as they demonstrate that successful multi-robot coordination is possible, even in relatively complex environments. However, to date it has not been possible to draw general conclusions regarding the underlying MRTA problems, or to establish a prescriptive strategy that would dictate how to achieve task allocation in a given multi-robot system.

We view MRTA problems as fundamentally organizational in nature, in that the goal is to allocate limited resources in such a way as to efficiently achieve some task(s). In this paper we have shown how MRTA problems can be studied in a formal manner by adapting to robotics some of the theory developed in relevant disciplines that study organizational and optimization problems. These disciplines include operations research, economics, scheduling, network flows, and combinatorial optimization.

Using such connections to relevant optimization theory, we have presented in this paper a formal analysis of MRTA problems. We have provided characterizations of a wide range of such problems, in the larger context of a taxonomy. For the easier problems, we have provided provably optimal algorithms that can be used in place of commonly-employed ad hoc or greedy solutions. For the more difficult problems, we have, wherever possible, provided suggestions toward their heuristic solution. Thus, this work can be used to aid further research into multi-robot coordination by allowing for the formal classification of MRTA problems, and by sometimes prescribing candidate solutions.

The presented MRTA formalism is very general, in that it relies only on domain-independent theory and techniques. Thus, for example, the taxonomy given in Section 5 should apply equally well in multi-agent and multi-robot systems. However, in exchange for such generality, this formalism is only capable of providing coarse characterizations of MRTA problems and their proposed solutions. Consider the analysis showing that MURDOCH, as an implementation of the canonical Greedy algorithm, is 3-competitive for the online assignment problem. This kind of competitive factor gives an algorithm's worst-case behavior, which may be quite different from its average-case behavior. In this respect, the bounds established for existing MRTA architectures, in terms of computational overhead, communication overhead, and solution quality, are relatively loose.

One way to tighten these bounds is to add domain-specific information to the formalism. By capturing and embedding models of how real MRTA domains behave and evolve over time, it should be possible to make more accurate predictions about algorithmic performance. For

example, while the classical theory of the OAP makes no assumptions about the nature of the utility matrices that form the input, MRTA problems are likely to exhibit significant structure in their utility values. Far from randomly generated, utility values generally follow one of a few common models, determined primarily by the kind of sensor data that are used in estimating utility. If only “local” sensor information is used (e.g., can the robot currently see a particular target, and if so, how close is it?), then utility estimates tend to be strongly bifurcated (e.g., a robot will have very high utility for those targets that it can see, and zero utility for all others). On the other hand, if “global” sensor information is available (e.g., how close is the robot to a goal location?), then utility estimates tend to be smoother (e.g., utility will fall off smoothly in space away from the goal). A promising avenue for future research would be to characterize this “utility landscape” as it is encountered in MRTA domains, then classify different MRTA problems according to the shapes of their landscapes, and make predictions about, for example, how well a greedy assignment algorithm should be expected to work, as opposed to a more costly optimal assignment algorithm.
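The two utility models described above can be sketched as follows; the particular decay functions and parameters are illustrative assumptions, not taken from any cited work:

```python
import math

def local_utility(visible, distance, u_max=1.0):
    """'Local' sensing: nonzero utility only for currently visible targets,
    producing a strongly bifurcated utility landscape."""
    return u_max / (1.0 + distance) if visible else 0.0

def global_utility(distance, u_max=1.0, scale=5.0):
    """'Global' sensing: utility decays smoothly with distance to the goal,
    producing a smooth utility landscape."""
    return u_max * math.exp(-distance / scale)
```

On a bifurcated landscape, a greedy assignment over the few visible (nonzero) targets loses little relative to an optimal one; on a smooth landscape, many near-tied utilities make the optimal assignment comparatively more valuable.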

Acknowledgments

The research reported here was conducted at the Interaction Lab, part of the Center for Robotics and Embedded Systems (CRES) at the University of Southern California. This work was supported in part by the Intel Foundation, DARPA Grant DABT63-99-1-0015 (MARS), and Office of Naval Research Grants N00014-00-1-0638 (DURIP) and N00014-01-1-0354. We thank Herbert Dawid, Andrew Howard, Richard Vaughan, and Michael Wellman for their insightful comments.

References

Agassounon, W. & Martinoli, A. (2002), A Macroscopic Model of an Aggregation Experiment using Embodied Agents in Groups of Time-Varying Sizes, in ‘Proc. of the IEEE Conf. on Systems, Man and Cybernetics (SMC)’, Hammamet, Tunisia, pp. 250–255.

Ahuja, R. K., Magnanti, T. L. & Orlin, J. B. (1993), Network Flows: Theory, Algorithms, and Applications, Prentice Hall, Upper Saddle River, New Jersey.

Alur, R., Courcoubetis, C., Halbwachs, N., Henzinger, T. A., Ho, P.-H., Nicollin, X., Olivero, A., Sifakis, J. & Yovine, S. (1995), ‘The algorithmic analysis of hybrid systems’, Theoretical Computer Science 138(1), 3–34.

Atamturk, A., Nemhauser, G. & Savelsbergh, M. (1995), ‘A Combined Lagrangian, Linear Programming and Implication Heuristic for Large-Scale Set Partitioning Problems’, J. of Heuristics 1, 247–259.

Avis, D. (1983), ‘A Survey of Heuristics for the Weighted Matching Problem’, Networks 13, 475–493.

Balas, E. & Padberg, M. W. (1972), ‘On the Set-Covering Problem’, Operations Research 20(6), 1152–1161.

Balas, E. & Padberg, M. W. (1976), ‘Set Partitioning: A Survey’, SIAM Review 18(4), 710–760.

Bar-Yehuda, R. & Even, S. (1981), ‘A linear-time approximation algorithm for the weighted vertex cover problem’, J. of Algorithms 2, 198–203.

Bertsekas, D. P. (1990), ‘The Auction Algorithm for Assignment and Other Network Flow Problems: A Tutorial’, Interfaces 20(4), 133–149.

Botelho, S. C. & Alami, R. (1999), M+: a scheme for multi-robot cooperation through negotiated task allocation and achievement, in ‘Proc. of the IEEE Intl. Conf. on Robotics and Automation (ICRA)’, Detroit, Michigan, pp. 1234–1239.

Brooks, R. A. (1986), ‘A robust layered control system for a mobile robot’, IEEE J. of Robotics and Automation 2(1), 14–23.

Brucker, P. (1998), Scheduling Algorithms, 2nd edn, Springer-Verlag, Berlin.

Bruno, J. L., Coffman, E. G. & Sethi, R. (1974), ‘Scheduling Independent Tasks To Reduce Mean Finishing Time’, Communications of the ACM 17(7), 382–387.

Cai, X., Lee, C.-Y. & Li, C.-L. (1998), ‘Minimizing Total Completion Time in Two-Processor Task Systems with Prespecified Processor Allocations’, Naval Research Logistics 45(2), 231–242.

Cao, Y. U., Fukunaga, A. S. & Kahng, A. (1997), ‘Cooperative Mobile Robotics: Antecedents and Directions’, Autonomous Robots 4(1), 7–27.

Castelpietra, C., Iocchi, L., Nardi, D., Piaggio, M., Scalzo, A. & Sgorbissa, A. (2001), Communication and Coordination among heterogeneous Mid-size players: ART99, in P. Stone, T. Balch & G. Kraetzschmar, eds, ‘RoboCup 2000, LNAI 2019’, Springer-Verlag, Berlin, pp. 86–95.

Chaimowicz, L., Campos, M. F. M. & Kumar, V. (2002), Dynamic Role Assignment for Cooperative Robots, in ‘Proc. of the IEEE Intl. Conf. on Robotics and Automation (ICRA)’, Washington, DC, pp. 293–298.

Chien, S., Barrett, A., Estlin, T. & Rabideau, G. (2000), A comparison of coordinated planning methods for cooperating rovers, in ‘Proc. of Autonomous Agents’, Barcelona, Spain, pp. 100–101.

Chvatal, V. (1979), ‘A greedy heuristic for the set cover problem’, Mathematics of Operations Research 4, 233–235.

Cormen, T. H., Leiserson, C. E. & Rivest, R. L. (1997), Introduction to Algorithms, MIT Press, Cambridge, Massachusetts.

Dahl, T. S., Mataric, M. J. & Sukhatme, G. S. (2002), Adaptive spatio-temporal organization in groups of robots, in ‘Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)’, Lausanne, Switzerland, pp. 1044–1049.

Deneubourg, J.-L., Theraulaz, G. & Beckers, R. (1991), Swarm-made architectures, in ‘Proc. of the European Conf. on Artificial Life (ECAL)’, Paris, France, pp. 123–133.

Dertouzos, M. L. & Mok, A. K. (1983), ‘Multiprocessor On-Line Scheduling of Hard-Real-Time Tasks’, IEEE Transactions on Software Engineering 15(12), 1497–1506.

Dias, M. B. & Stentz, A. (2001), A Market Approach to Multirobot Coordination, Technical Report CMU-RI-TR-01-26, The Robotics Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania.

Dias, M. B. & Stentz, A. (2002), Opportunistic Optimization for Market-Based Multirobot Control, in ‘Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)’, Lausanne, Switzerland, pp. 2714–2720.

Donald, B., Jennings, J. & Rus, D. (1997), ‘Information invariants for distributed manipulation’, The Intl. J. of Robotics Research 16(5), 673–702.

Dudek, G., Jenkin, M. & Milios, E. (2002), A Taxonomy of Multirobot Systems, in T. Balch & L. Parker, eds, ‘Robot Teams: From Diversity to Polymorphism’, A.K. Peters, Natick, Massachusetts, pp. 3–22.

Emery, R., Sikorski, K. & Balch, T. (2002), Protocols for Collaboration, Coordination, and Dynamic Role Assignment in a Robot Team, in ‘Proc. of the IEEE Intl. Conf. on Robotics and Automation (ICRA)’, Washington, DC, pp. 3008–3015.

Fukuda, T., Nakagawa, S., Kawauchi, Y. & Buss, M. (1988), Self organizing robots based on cell structures – CEBOT, in ‘Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)’, Victoria, British Columbia, pp. 145–150.

Gale, D. (1960), The Theory of Linear Economic Models, McGraw-Hill Book Company, Inc., New York.

Garey, M. R. & Johnson, D. S. (1978), ‘“Strong” NP-Completeness Results: Motivation, Examples, and Implications’, J. of the ACM 25(3), 499–508.

Gat, E. (1998), Three-Layer Architectures, in D. Kortenkamp, R. P. Bonasso & R. Murphy, eds, ‘Artificial Intelligence and Mobile Robots: Case Studies of Successful Robot Systems’, AAAI Press, Menlo Park, California, pp. 195–210.

Gerkey, B. P. & Mataric, M. J. (2002a), A market-based formulation of sensor-actuator network coordination, in ‘Proc. of the AAAI Spring Symp. on Intelligent Embedded and Distributed Systems’, Palo Alto, California, pp. 21–26.

Gerkey, B. P. & Mataric, M. J. (2002b), ‘Sold!: Auction methods for multi-robot coordination’, IEEE Transactions on Robotics and Automation 18(5), 758–768.

Gerkey, B. P. & Mataric, M. J. (2003), Multi-Robot Task Allocation: Analyzing the Complexity and Optimality of Key Architectures, in ‘Proc. of the IEEE Intl. Conf. on Robotics and Automation (ICRA)’, Taipei, Taiwan, pp. 3862–3868.

Guestrin, C., Koller, D. & Parr, R. (2001), Multiagent Planning with Factored MDPs, in ‘Proc. of Advances in Neural Information Processing Systems (NIPS)’, Vancouver, Canada, pp. 1523–1530.

Hoffman, K. L. & Padberg, M. W. (1993), ‘Solving Airline Crew Scheduling Problems by Branch-and-Cut’, Management Science 39(6), 657–682.

Hoogeveen, J., van de Velde, S. & Veltman, B. (1994), ‘Complexity of scheduling multiprocessor tasks with prespecified processor allocations’, Discrete Applied Mathematics 55, 259–272.

Jennings, J. S. & Kirkwood-Watts, C. (1998), Distributed Mobile Robotics by the Method of Dynamic Teams, in T. Luth, P. Dario & H. Worn, eds, ‘Distributed Autonomous Robotic Systems 3’, Springer-Verlag, New York, pp. 47–56.

Kaelbling, L. P., Littman, M. L. & Cassandra, A. R. (1998), ‘Planning and Acting in Partially Observable Stochastic Domains’, Artificial Intelligence 101(1–2), 99–134.

Kalyanasundaram, B. & Pruhs, K. (1993), ‘Online Weighted Matching’, J. of Algorithms 14, 478–488.

Klavins, E. (2003), Communication Complexity of Multi-Robot Systems, in J.-D. Boissonnat, J. Burdick, S. Hutchinson & K. Goldberg, eds, ‘Algorithmic Foundations of Robotics V’, Springer-Verlag, New York, pp. 275–292.

Korte, B. & Vygen, J. (2000), Combinatorial Optimization: Theory and Algorithms, Springer-Verlag, Berlin.

Kube, C. R. & Zhang, H. (1993), ‘Collective robotics: From social insects to robots’, Adaptive Behavior 2(2), 189–219.

Kuhn, H. W. (1955), ‘The Hungarian Method for the Assignment Problem’, Naval Research Logistics Quarterly 2(1), 83–97.

Lerman, K. & Galstyan, A. (2002), ‘Mathematical Model of Foraging in a Group of Robots: Effect of Interference’, Autonomous Robots 13(2), 127–141.

Marsten, R. E. & Shepardson, F. (1981), ‘Exact Solution of Crew Scheduling Problems Using the Set Partitioning Model: Recent Successful Applications’, Networks 11, 165–177.

Mason, M. T. (1986), ‘Mechanics and planning of manipulator pushing operations’, The Intl. J. of Robotics Research 5(3), 53–71.

Mataric, M. J. (1992), Designing Emergent Behaviors: From Local Interactions to Collective Intelligence, in J.-A. Meyer, H. Roitblat & S. Wilson, eds, ‘From Animals to Animats 2, Second International Conference on Simulation of Adaptive Behavior (SAB-92)’, MIT Press, pp. 432–441.

Modi, J., Jung, H., Tambe, M., Shen, W.-M. & Kulkarni, S. (2001), A Dynamic Distributed Constraint Satisfaction Approach to Resource Allocation, in T. Walsh, ed., ‘Principles and Practice of Constraint Programming – CP 2001’, Springer-Verlag, New York, pp. 685–700.

Murata, T. (1989), ‘Petri Nets: Properties, Analysis, and Applications’, Proc. of the IEEE 77(4), 541–580.

Østergaard, E. H., Mataric, M. J. & Sukhatme, G. S. (2001), Distributed multi-robot task allocation for emergency handling, in ‘Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)’, Wailea, Hawaii, pp. 821–826.

Parker, L. E. (1994), Heterogeneous Multi-Robot Cooperation, PhD thesis, MIT EECS Department.

Parker, L. E. (1995), L-ALLIANCE: A Mechanism for Adaptive Action Selection in Heterogeneous Multi-Robot Teams, Technical Report ORNL/TM-13000, Oak Ridge National Laboratory, Knoxville, Tennessee.

Parker, L. E. (1998), ‘ALLIANCE: An architecture for fault-tolerant multi-robot cooperation’, IEEE Transactions on Robotics and Automation 14(2), 220–240.

Parker, L. E. (1999), ‘Cooperative Robotics for Multi-Target Observation’, Intelligent Automation and Soft Computing 5(1), 5–19.

Pearce, D. W., ed. (1999), The MIT Dictionary of Modern Economics, 4th edn, The MIT Press, Cambridge, Massachusetts.

Sandholm, T. W. & Lesser, V. R. (1997), ‘Coalitions among computationally bounded agents’, Artificial Intelligence 94(1), 99–137.

Shehory, O. & Kraus, S. (1996), Formation of overlapping coalitions for precedence-ordered task-execution among autonomous agents, in ‘Proc. of the Intl. Conf. on Multi Agent Systems (ICMAS)’, Kyoto, Japan, pp. 330–337.

Shehory, O. & Kraus, S. (1998), ‘Methods for task allocation via agent coalition formation’, Artificial Intelligence 101(1–2), 165–200.

Simon, H. A. (2001), The Sciences of the Artificial, 3rd edn, MIT Press, Cambridge, Massachusetts.

Spletzer, J. R. & Taylor, C. J. (2001), A Framework for Sensor Planning and Control with Applications to Vision Guided Multi-robot Systems, in ‘Proc. of Computer Vision and Pattern Recognition Conf. (CVPR)’, Kauai, Hawaii, pp. 378–383.

Stone, P. & Veloso, M. (1999), ‘Task Decomposition, Dynamic Role Assignment, and Low-Bandwidth Communication for Real-Time Strategic Teamwork’, Artificial Intelligence 110(2), 241–273.

Vail, D. & Veloso, M. (2003), Dynamic Multi-Robot Coordination, in A. Schultz et al., eds, ‘Multi-Robot Systems: From Swarms to Intelligent Automata, Volume II’, Kluwer Academic Publishers, the Netherlands, pp. 87–98.

Weigel, T., Auerback, W., Dietl, M., Dumler, B., Gutmann, J.-S., Marko, K., Muller, K., Nebel, B., Szerbakowski, B. & Thiel, M. (2001), CS Freiburg: Doing the Right Thing in a Group, in P. Stone, T. Balch & G. Kraetzschmar, eds, ‘RoboCup 2000, LNAI 2019’, Springer-Verlag, Berlin, pp. 52–63.

Werger, B. B. & Mataric, M. J. (2001), Broadcast of Local Eligibility for Multi-Target Observation, in L. E. Parker, G. Bekey & J. Barhen, eds, ‘Distributed Autonomous Robotic Systems 4’, Springer-Verlag, New York, pp. 347–356.

Yamauchi, B. (1998), Frontier-Based Exploration Using Multiple Robots, in ‘Proc. of Autonomous Agents’, Minneapolis, Minnesota, pp. 47–53.

Zlot, R., Stentz, A., Dias, M. B. & Thayer, S. (2002), Multi-Robot Exploration Controlled by a Market Economy, in ‘Proc. of the IEEE Intl. Conf. on Robotics and Automation (ICRA)’, Washington, DC, pp. 3016–3023.

Université Libre de Bruxelles
Institut de Recherches Interdisciplinaires et de Développements en Intelligence Artificielle

Efficient Multi-Foraging in Swarm Robotics

Alexandre Campo

IRIDIA – Technical Report Series

Technical Report No.

TR/IRIDIA/2006-027

October 2006

IRIDIA – Technical Report Series
ISSN 1781-3794

Published by: IRIDIA, Institut de Recherches Interdisciplinaires et de Développements en Intelligence Artificielle, Université Libre de Bruxelles, Av F. D. Roosevelt 50, CP 194/6, 1050 Bruxelles, Belgium

Technical report number TR/IRIDIA/2006-027

Revision history: TR/IRIDIA/2006-027.001, September 2006

The information provided is the sole responsibility of the authors and does not necessarily reflect the opinion of the members of IRIDIA. The authors take full responsibility for any copyright breaches that may result from publication of this paper in the IRIDIA – Technical Report Series. IRIDIA is not responsible for any use that might be made of data appearing in this publication.

Efficient multi-foraging in swarm robotics

Alexandre Campo*

October 2006

Abstract

In the multi-foraging task under study, a group of robots has to efficiently retrieve two different types of prey to a nest. We define efficiency using a concept of energy that is spent by the robots when they leave the nest to explore the environment and gained when a prey is successfully retrieved to the nest.

The goal of this study is to identify the characteristics of an efficient multi-foraging behaviour. We design and validate a mathematical model that is used to predict the optimal behaviour. This model allows us to evaluate the performance of a group of robots on an absolute scale. We design a decision algorithm and study its performance and adaptivity in a wide range of experimental situations. This algorithm takes into account information about the number of prey encountered but also about the number of robots encountered, so that a suitable number of robots are allocated to foraging while the remaining robots rest at the nest and spare energy.

Keywords: swarm robotics, multi-foraging, mathematical modelling.

1 Introduction

Within collective robotics, swarm robotics is a relatively new approach to the coordination of a system composed of a large number of autonomous robots. The coordination among the robots is achieved in a self-organised manner: the collective behaviour of the robots is the result of local interactions among robots, and between the robots and the environment. The concept of locality refers to a situation in which a robot alone cannot perceive the whole system. Each single robot typically has limited sensing, acting and computing abilities. The strength of swarm robotics lies in the properties of robustness, adaptivity and scalability of the group [DSahin04].

Foraging is a classical metaphor used in swarm robotics. In foraging, a group of robots has to pick up objects that are scattered in the environment. The foraging task can be decomposed into an exploration sub-task followed by a transport sub-task. Foraging can be applied to a wide range of useful tasks. Examples of applications are toxic waste cleanup, search and rescue, demining and collection of terrain samples. Central place foraging is a particular type of foraging task in which robots must gather objects in a central place. Borrowing the terminology from biology, the central place is also called the nest and the objects are called prey.

Multi-foraging is a variation of the foraging task in which different types of objects to collect are considered [Bal99]. These different types of objects can be concurrently and independently collected by the individuals and can have different properties. Multi-foraging adds a level of complexity with respect to the traditional foraging task, as it may be necessary for the individuals to choose which prey to take, and when.

The study of the efficiency of foragers has first been the concern of biologists. In his seminal article [Cha76], Charnov exposes the fundamental hypothesis that gave birth to the field of optimal foraging. The hypothesis is that evolution has shaped individual behaviours of foraging

*Corresponding author: [email protected]

animals so as to maximize the net energy intake. Three decades later, roboticists try to identify how robots should cooperate in order to forage efficiently. Efficiency has been defined in several ways: in biology, researchers use the term energy and measure the weights of food and animals to quantify energy spent and gained. In robotics, the vocabulary is richer, owing to connections with the fields of game theory and scheduling: terms such as reward, income and benefit have been used [UB04, LDD04, LJGM06, LWS+06]. For the sake of simplicity, we will use the term energy, as in biology. Foraging efficiently is thus a quest to optimize the energy of a group of foraging robots. Robotics researchers often consider that energy is spent when robots move during exploration and that energy is gained when a prey is successfully retrieved to the nest [LWS+06].

We focus on a specific case of multi-foraging in which there are only two types of prey that have to be retrieved to the nest. Robots can rest at the nest and in this way spare energy. Moreover, the spatial aspect of the task is neglected: the prey do not have specific locations in the environment but rather densities. The exploration mechanism used by the robots to find prey is a random walk. Therefore robots discover prey in the environment at a given rate.

Our objective is to identify the characteristics of an individual behaviour that leads the group of foraging robots to have an efficient collective behaviour. To achieve this objective, we first design and validate a mathematical model of multi-foraging. Mathematical modelling of robotic experiments is an established methodology [SS97, SSYA98, KAKG02]. Mathematical models are opposed to individual based models (IBMs) [ME03]. In IBMs, each robot is represented, as well as the environment. Mathematical models are analytic descriptions of the evolution of a system, in which the individuals of the system are not represented separately. Macroscopic models are faster than their counterparts because their computation time does not depend on the number of individuals. They can be used as optimization tools: Ijspeert et al. [IMBG01] have used a stick pulling experiment to demonstrate how the behaviour of the robots could be made efficient. Within the limits of the mathematical tools available, it is also possible to draw conclusions on the dynamics and intrinsic properties of the system.
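As a toy illustration of the macroscopic idea (not the model developed in Section 3; the two states and their rates are invented for this sketch), one can track only the mean number of robots in each state, so the computation cost is independent of group size:

```python
def simulate_macroscopic(n_robots, find_rate, retrieve_time, dt=0.1, steps=1000):
    """Mean-field sketch: integrate the average number of robots searching
    and retrieving, rather than simulating individual robots.
    `find_rate` is the per-robot prey discovery rate and `retrieve_time`
    the mean duration of a retrieval."""
    searching, retrieving, delivered = float(n_robots), 0.0, 0.0
    for _ in range(steps):
        starts = find_rate * searching * dt            # search -> retrieve flux
        finishes = (retrieving / retrieve_time) * dt   # retrieve -> search flux
        searching += finishes - starts
        retrieving += starts - finishes
        delivered += finishes                          # one prey per finished trip
    return searching, retrieving, delivered
```

At steady state the two fluxes balance, so the fraction of robots searching is determined by the rates alone, regardless of whether the group has ten robots or ten thousand.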

The model we devise predicts with very good confidence the optimal behaviour of the robots, and hence the maximum amount of energy that a group of robots can accumulate during an experiment. We use the model as a yardstick to evaluate the performance of the group of robots and test different behavioural rules.

Based on simplified equations, we introduce a decision algorithm to control the behaviour of the robots. To evaluate the performance of that algorithm, we run simulations using a set of 2160 different experimental configurations. These configurations are obtained by varying the parameters of the experiment. The efficient behaviour of robots is not necessarily the same in one configuration or the other. We also test the adaptivity of the algorithm by changing drastically the configuration of experiments.

In Section 2 we detail the experimental setup and the controller of the robots. Section 3 is devoted to the description, analysis and validation of the mathematical model. Section 4 presents the decision algorithm and the evaluation of its performance and adaptivity using the predictions of the mathematical model. Section 5 concludes the paper with a discussion of the results and some ideas for future work.

2 Methods

2.1 Experimental setup

Environment - The environment is a circular arena with a radius of 1.20 meters (see Figure 1(a)). Robots are randomly scattered in it at the beginning.

Nest - A circular nest is located in the center of the arena. Robots can locate the nest from anywhere in the arena thanks to a lamp suspended above it, signalling its position. In order to avoid overcrowding, the nest has a structure of three imbricated rings with different gray-levels, as presented in Figure 1(b). The limit of the innermost ring defines where robots can rest. By disposing robots around the center, an aggregation of resting robots is avoided; such an aggregation could prevent robots surrounded by others from leaving the nest freely. The second ring defines where a robot can safely release a prey, with good confidence that it is dropped inside the nest. Indeed, as robots have limited perception, they cannot distinguish whether the retrieved prey is inside the nest or only half inside. Finally, the limit of the outermost ring defines the boundary of the nest.

Figure 1: (a) Setup of the multi-foraging experiment. The environment is a circular arena. The nest is circular and centered. Robots and prey are initially scattered randomly. (b) Close-up of the nest showing its structure of three imbricated rings. The structure is designed to minimize interferences (overcrowding phenomena) among robots in the nest. In addition, the structure permits robots to have good confidence that prey are dropped inside the nest.

Prey - Prey are introduced in the environment at random locations around the nest, at a fixed distance from it. New prey appear at a constant rate per time unit. Prey have a lifetime, i.e. they disappear at a constant rate. They are also removed when they are inside the nest. They have a weight and friction that define the time required for retrieval. An amount of energy is associated with each prey and is attributed to the group of robots once it is delivered in the nest. Prey of the same type share all their characteristics (time to retrieve, colour, energy, lifetime and incoming rate). There are only two different types of prey in the experiments.

Robots - The simulated robots have the same characteristics as the s-bots from the swarm-bots project [DTG+05]. The sensors we rely on are the ground sensors, used to perceive whether robots are outside or inside the nest, and in the latter case on which ring. Infrared sensors are used for collision avoidance with walls and other robots. The camera is employed to determine the location of the nest. The robots are also able to discriminate the type of prey upon encounter thanks to their colour. Lastly, the robots use the camera to perceive whether a nearby prey is already being retrieved by another robot.

2.2 Robot controller

The controller used is the same for all the robots. The architecture of the program is a finite state machine (FSM). The scheme in Figure 2 represents the possible states, with arcs denoting the possible transitions between states. At the beginning of an experiment, all robots are initialized in the explore state.

Figure 2: Finite state machine representing the robot's controller. Transitions between states are triggered according to the probabilities described in Table 2. The control parameters (probabilities α, β1 and β2) can be modified by a decision mechanism which is not displayed here. The subscript letter i denotes different probabilities depending on the type of prey that was first encountered.

• Explore [Description] The robot wanders in the environment, performing a random walk. An obstacle-avoidance subroutine is triggered if a wall or another robot is encountered. [Transitions] Return to nest with a probability constant over time. Grasp a prey if it is close enough and no green colour is perceived. Grasping is conditioned by the probabilities β1 and β2.

• Grasp [Description] The robot has detected a prey and tries to establish a physical connection. [Transitions] If the grasping is successful the robot tries to retrieve the prey, otherwise it enters the ignore state.

• Retrieve [Description] The robot turns green. This colour is used to prevent any other robot from trying to take the prey already grasped. The robot heads toward the nest. [Transitions] If the robot reaches the nest, it releases the prey and enters the explore state. During the retrieval process, the robot has a probability constant over time to give up the retrieval and enter the ignore state.

• Ignore [Description] The robot is blind to any prey and simply performs a random walk with collision avoidance. [Transitions] After a delay of five seconds (enough to get away from a prey) the robot enters the explore state.

• Rest [Description] The robot heads back to the nest. When it is in the nest it does nothing. [Transitions] With a constant rate per time unit, the robot can decide to leave the nest by entering the explore state.
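The five states above can be sketched as a small finite state machine. The Python below is an illustrative reconstruction, not the original controller: the state names, the probability values and the stubbed sensing flags (`prey_type`, `grasp_ok`, `at_nest`, `ignore_done`) are our own assumptions standing in for the robot's sensors.

```python
import random

# Hypothetical sketch of the controller's FSM (Section 2.2). Sensing is
# stubbed out via keyword flags; probabilities are illustrative defaults.

EXPLORE, GRASP, RETRIEVE, IGNORE, REST = "explore", "grasp", "retrieve", "ignore", "rest"

def step(state, rng, rho=0.01, pi=(0.5, 0.5), lam=0.0111, alpha=1 / 400,
         prey_type=None, grasp_ok=False, at_nest=False, ignore_done=False):
    """One control step; returns the next state."""
    if state == EXPLORE:
        if prey_type is not None and rng.random() < pi[prey_type]:
            return GRASP                       # try to grasp an unclaimed prey
        if rng.random() < rho:
            return REST                        # head back to the nest and rest
        return EXPLORE
    if state == GRASP:
        return RETRIEVE if grasp_ok else IGNORE
    if state == RETRIEVE:
        if at_nest:
            return EXPLORE                     # prey released in the nest
        if rng.random() < lam:
            return IGNORE                      # give up the retrieval
        return RETRIEVE
    if state == IGNORE:
        return EXPLORE if ignore_done else IGNORE  # 5 s blind random walk
    if state == REST:
        return EXPLORE if rng.random() < alpha else REST
    raise ValueError(state)

rng = random.Random(0)
s = EXPLORE
for _ in range(1000):                          # robot alternates explore/rest
    s = step(s, rng)
```

All robots are initialized in the explore state, as in the experiments.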

3 Mathematical model

3.1 Modelling assumptions

The environment is circular and the robots perform a random walk when they move. We assume that we face a memory-less process: robots find prey at a constant rate in time and this rate is


independent of the time already spent exploring. To test this assumption, we ran 1000 simulations with only one robot and one prey. A specific

robot controller was designed to perform a simple random walk. The experiment was halted as soon as the robot established visual contact with the prey. The survival curve of the finding times matches a straight line with a 95% confidence level (see Figure 3(a)).

In the same manner, we tested the time required for a robot to go back to the nest. We ran 1000 other simulations with a single robot and no prey in the environment. The experiment was halted as soon as the robot was in the nest. The survival curve of the times to go back to the nest also matches a straight line with a 95% confidence level (see Figure 3(b)).

As the focus of this paper is not the study of interference among robots, we assume that collision avoidance events are negligible in the mathematical model. Finally, we assume that the probability to complete a retrieval can be modelled using a constant rate of prey retrieved per unit of time.


Figure 3: (a) Survival curve of the finding times and (b) of the times to go back to the nest. The survival curve analysis consists in plotting, in log-linear scale, the proportion of individuals that remain in a given state (looking for a prey or going back to the nest) as a function of the time elapsed since the beginning of this state. If the process is memory-less, then the decay of the proportion follows a straight line according to the equation f(t) = log(e^{-kt}) = -kt. The slope -k of the straight line gives the rate k of changing state, and 1/k is the mean time spent in this state. The regression line is plotted in black and dashed lines represent a confidence interval of 95%.
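The survival-curve check can be reproduced on synthetic data. The sketch below is ours: it assumes exponential (memory-less) finding times with an arbitrary rate k = 1/30, computes the surviving proportion on a time grid, and recovers k as minus the slope of the fitted log-survival line.

```python
import math
import random

def survival_slope(times, n_points=20):
    """Least-squares slope of log(surviving fraction) versus time."""
    times = sorted(times)
    t_hi = times[int(0.9 * len(times))]        # stop before the noisy tail
    xs, ys = [], []
    for j in range(1, n_points + 1):
        t = t_hi * j / n_points
        frac = sum(1 for x in times if x > t) / len(times)
        xs.append(t)
        ys.append(math.log(frac))
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

rng = random.Random(0)
k = 1 / 30.0                                   # illustrative true rate
samples = [rng.expovariate(k) for _ in range(1000)]
k_est = -survival_slope(samples)               # log-survival slope is -k
```

With 1000 runs, as in the experiments above, the estimated rate lands close to the true one.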

3.2 Mathematical description

Partial differential equations are devised to model the flow of robots among five main states. We neglect the modelling of the grasp and ignore states because they occur rarely and their duration is relatively short. In order to describe how energy is gained, we model the retrieval process in two distinct parts, one for each type of prey. In addition, we noticed that the time required to go back to the nest before resting is not negligible and has to be modelled. To this end, we introduce the back state. We end up with five main states among which flows of robots are exchanged. Figure 4 shows a scheme of these flows.

We enumerate the variables of the macroscopic model in Table 1. We also clarify the meaning of all the parameters in Table 2. Notice that none of those parameters are free. They can all be measured in the experimental setup or decided by the experimenter, except the triplet (π1, π2, ρ)


Figure 4: The diagram of the flows that are modelled using the set of partial differential equations. Flows of robots are exchanged between five main states. Each arc describes a possible transition between the states.

which are control parameters of the robots and are set by the experimenter or by the controller of the robots.

Variable  Description
E         the number of robots in explore state
B         the number of robots currently going back to the nest to rest
I         the number of robots in rest state (or inactive robots)
R1        the number of robots in retrieval state (prey of type 1)
R2        the number of robots in retrieval state (prey of type 2)
N1        the number of prey of type 1 in the environment
N2        the number of prey of type 2 in the environment

Table 1: The meaning of each variable of the mathematical model.

The set of differential equations (1) is used to model the flows of robots exchanged between the descriptive variables. We provide a detailed explanation of the first equation and let the reader use Tables 1 and 2 to figure out the details of the remaining equations. As explained in Section 2.2 and depicted in Figure 4, several transitions lead robots to enter or leave the explore state. Each right-hand term of the differential equations is the average number of robots making a specific transition.

\frac{\partial E}{\partial t} = -\rho E + \alpha I + \sum_{i=1}^{2} \left( -\pi_i E N_i \phi + \mu_i R_i + \lambda R_i \right)

\frac{\partial B}{\partial t} = +\rho E - \psi B

\frac{\partial I}{\partial t} = +\psi B - \alpha I

\frac{\partial R_i}{\partial t} = \pi_i E N_i \phi - \mu_i R_i - \lambda R_i \quad \forall i \in \{1, 2\}

\frac{\partial N_i}{\partial t} = \gamma_i - \pi_i E N_i \phi - \delta_i N_i + \lambda R_i \quad \forall i \in \{1, 2\}

(1)

1. First, robots can decide to rest at the nest with a probability ρ. On average, ρE robots leave the explore state and enter the back state.


Parameter  Description
T          the total number of robots in the experiment
φ          rate of objects per second found in the environment by a single robot
ψ          probability for a single robot to find the nest
En1        energy associated to a prey of type 1
En2        energy associated to a prey of type 2
Enp        energy (negative) associated to 1 second spent outside the nest for one robot
γ1         incoming rate per second of prey of type 1
γ2         incoming rate per second of prey of type 2
δ1         probability constant over time for a prey of type 1 to disappear
δ2         probability constant over time for a prey of type 2 to disappear
µ1         inverse of the average time required to retrieve a prey of type 1
µ2         inverse of the average time required to retrieve a prey of type 2
λ          probability to give up an ongoing retrieval
ρ          probability for a robot to return to the nest (i.e. switch to rest state)
α          probability for a robot to leave the nest and look for prey (i.e. switch to explore state)
π1         probability to take a prey of type 1 upon encounter (i.e. switch to grasp state)
π2         probability to take a prey of type 2 upon encounter (i.e. switch to grasp state)

Table 2: The meaning of each parameter of the experiment.

2. Conversely, robots in rest state have a probability α to come back to the explore state. Thus there are on average αI robots entering the explore state.

3. Robots can also find a prey and decide to retrieve it. The probability for a single robot to find a single object being φ, the average number of exploring robots that find a prey of type i is ENiφ. As robots decide to retrieve the prey with probability πi, the average number of robots that leave the explore state to retrieve a prey of type i is πiENiφ.

4. We consider that robots have a probability µi to complete the retrieval of a prey of type i. Hence, on average µiRi robots complete a retrieval and come back to the explore state.

5. Lastly, during the retrieval of a prey of type i, robots have a probability λ to give up and come back to the explore state. On average, λRi robots give up the retrieval of a prey of type i.
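As a sanity check, the flow model of Equation 1 can be integrated numerically. The minimal Euler sketch below is ours; the parameter values are illustrative picks (partly inspired by Table 3), not the exact setups of the report.

```python
# Euler integration of the macroscopic flow model (Equation 1).
# Conservation of robots (Equation 2) must hold at every step.

def derivatives(state, p):
    E, B, I, R, N = state["E"], state["B"], state["I"], state["R"], state["N"]
    dE = -p["rho"] * E + p["alpha"] * I
    dR, dN = [0.0, 0.0], [0.0, 0.0]
    for i in range(2):
        grasp = p["pi"][i] * E * N[i] * p["phi"]      # robots starting a retrieval
        dE += -grasp + p["mu"][i] * R[i] + p["lam"] * R[i]
        dR[i] = grasp - p["mu"][i] * R[i] - p["lam"] * R[i]
        dN[i] = p["gamma"][i] - grasp - p["delta"][i] * N[i] + p["lam"] * R[i]
    dB = p["rho"] * E - p["psi"] * B
    dI = p["psi"] * B - p["alpha"] * I
    return dE, dB, dI, dR, dN

params = dict(rho=0.005, alpha=1 / 400, psi=1 / 19.51, phi=1 / 159.4,
              lam=0.0111, pi=[1.0, 1.0], mu=[1 / 60, 1 / 60],
              gamma=[1 / 60, 1 / 60], delta=[0.002, 0.002])
state = dict(E=10.0, B=0.0, I=0.0, R=[0.0, 0.0], N=[5.0, 5.0])

dt = 0.1
for _ in range(int(3600 / dt)):                       # one simulated hour
    dE, dB, dI, dR, dN = derivatives(state, params)
    state["E"] += dE * dt
    state["B"] += dB * dt
    state["I"] += dI * dt
    for i in range(2):
        state["R"][i] += dR[i] * dt
        state["N"][i] += dN[i] * dt

total = state["E"] + state["B"] + state["I"] + sum(state["R"])
```

Because the transition terms cancel pairwise, the total number of robots stays constant throughout the integration.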

3.3 Stable states

We shall now study the properties of the system once it has stabilized. We assume that the agents no longer change their control parameters (π1, π2, ρ), and that the system thus converges to a steady state in finite time. The number of robots in the environment is constant. Robots change their state during the experiment: the macroscopic model describes the flows of robots exchanged between those states. Hence, the sum of the robots over all states is always T, as described by the conservation Equation 2.

T = B + I + E + \sum_{i=1}^{2} R_i \quad (2)

At steady state, the flows between states are stabilized and the derivatives of the state variables are zero. In the following, we prove that there is only one stable state.


\frac{\partial B}{\partial t} = 0 \iff B = \frac{\rho E}{\psi}

\frac{\partial I}{\partial t} = 0 \iff I = \frac{\rho E}{\alpha}

\frac{\partial R_i}{\partial t} = 0 \iff R_i = \frac{\pi_i E N_i \phi}{\mu_i + \lambda} \quad \forall i \in \{1, 2\}

\frac{\partial N_i}{\partial t} = 0 \iff N_i = \frac{\gamma_i + \lambda R_i}{\pi_i E \phi + \delta_i} \quad \forall i \in \{1, 2\}

(3)

By substituting the expressions of the variables at steady state into the conservation Equation 2, we get:

T = B + I + E + \sum_{i=1}^{2} R_i

T = \frac{\rho E}{\psi} + \frac{\rho E}{\alpha} + E + \sum_{i=1}^{2} \frac{\pi_i E N_i \phi}{\mu_i + \lambda}

T = \left( 1 + \frac{\rho}{\psi} + \frac{\rho}{\alpha} \right) E + \sum_{i=1}^{2} \frac{\pi_i E \phi \gamma_i}{\pi_i E \phi \mu_i + \delta_i \mu_i + \delta_i \lambda} \quad (4)

According to Equation 4, and under the fair assumption that the parameters are strictly positive, there is a strictly monotonic relationship between E and T. As T is a constant, it implies that for a given set of parameters defining an experiment, there is only one possible value of E at stable state, and this value can be calculated using Equation 4. The remaining variables describing the state of the system can be expressed, at stable state, as functions of E, as shown by Equations 3. Therefore, for a given set of parameters, there is only one possible stable state of the system. Moreover, the values of the variables describing the system at stable state can be calculated.
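Since the right-hand side of Equation 4 grows strictly monotonically with E, the unique steady state can be recovered by a simple bisection on E in [0, T]. The sketch below is ours, with illustrative parameter values.

```python
# Solve Equation 4 for E by bisection, then read off B and I from Equations 3.

def total_robots(E, p):
    """Right-hand side of Equation 4 as a function of E (strictly increasing)."""
    t = (1 + p["rho"] / p["psi"] + p["rho"] / p["alpha"]) * E
    for i in range(2):
        num = p["pi"][i] * E * p["phi"] * p["gamma"][i]
        den = (p["pi"][i] * E * p["phi"] * p["mu"][i]
               + p["delta"][i] * p["mu"][i] + p["delta"][i] * p["lam"])
        t += num / den
    return t

def steady_E(T, p, tol=1e-9):
    lo, hi = 0.0, float(T)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total_robots(mid, p) < T:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

params = dict(rho=0.005, alpha=1 / 400, psi=1 / 19.51, phi=1 / 159.4,
              lam=0.0111, pi=[1.0, 1.0], mu=[1 / 60, 1 / 60],
              gamma=[1 / 60, 1 / 60], delta=[0.002, 0.002])
E = steady_E(10, params)
B = params["rho"] * E / params["psi"]          # from Equations 3
I = params["rho"] * E / params["alpha"]
```

Monotonicity guarantees the bisection converges to the single stable state.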

3.4 Validation

To evaluate the quality of the model and determine to which extent we can rely on it to make good predictions, it is mandatory to go through a validation process. This phase involves comparing the results obtained in simulation against those of the model for a collection of typical experimental situations. It is a critical stage, as we will rely on the model to draw conclusions on the characteristics of the foraging algorithm introduced later.

We define a range of reasonable values for each parameter of the experiment (except the control parameters of the robots π1, π2 and ρ). Table 3 provides the ranges examined for each parameter of the experiment. A configuration of the experimental setup is defined by selecting one value for each parameter out of the range associated to that parameter. According to Table 3, there are 2160 possible configurations, which define a large collection of possible experimental setups. We denote this collection P, and Ci, i ∈ [1, 2160], one particular configuration of the experimental setup.

3.4.1 Prediction quality of the energy

The first validation test bears directly on the energy accumulated by a group of robots during an experiment. The aim is to have a quantitative indication of the ability of the mathematical model to predict the energy accumulated using a given behaviour.

We use each configuration Ci ∈ P to parameterize an experiment of one hour. The mathematical model is used to explore the space of the control parameters (π1, π2, ρ) ∈ [0, 1]^3.


Parameter  Range of values tested           Unit
T          1, 2, 3, 5, 10, 15               robot
N1(0)      5                                prey of type 1
N2(0)      5                                prey of type 2
φ          1/159.4                          probability
ψ          1/19.51                          probability
En1        -100, -10, -1, 1, 10, 100        energy
En2        1                                energy
Enp        -0.001, -0.01, -0.1              energy
γ1         1/15, 1/30, 1/60, 1/120, 1/180   prey / second
γ2         1/60                             prey / second
δ1         0.002                            probability
δ2         0.002                            probability
µ1         1/90, 1/40, 1/30, 1/60           second^-1
µ2         1/60                             second^-1
λ          0.0111                           probability
ρ          control parameter                probability
α          1/400                            probability
π1         control parameter                probability
π2         control parameter                probability

Table 3: Each parameter of the experiment is given a range of reasonable values (i.e. compatible with the dimensions of the experimental setup, and not fostering interference among robots). By assigning one value to each parameter, we define an experimental configuration. In total, there are 2160 possible configurations.

The control space is discretized using a step of 0.05 units. The mathematical model is solved for each triplet and we identify the control parameters that yield the highest amount of energy according to the predictions of the model. Hence, to each configuration Ci we associate the predicted optimal behaviour OBi, found by an exhaustive exploration of the control space.

We now define Rsim(C, B) and Rmac(C, B) as the energy accumulated, respectively, in simulation and as predicted by the mathematical model, by the collective behaviour B in an experiment parameterized with the configuration C.

We plot the energy obtained in simulation as a function of the prediction of the mathematical model. If the two values are identical, all the points lie on a straight line of slope 1. More generally, if the model manages to predict the energy with a constant proportional error a and a constant bias b, we have a linear relationship: Rsim = a · Rmac + b. We test the hypothesis of the linear relationship using a linear regression.

Figure 5 presents the plot of Rsim(Ci, OBi) as a function of Rmac(Ci, OBi) for every (Ci, OBi) with i ∈ [1, 2160]. The correlation coefficient of the linear regression (r² = 0.98) is highly statistically significant (p-value < 0.001). The slope of the regression line is a = 0.86 and the bias is b = -11.59, which indicates that the mathematical model was able to predict the energy obtained in simulation with a constant error of 14%.

The difference between the predicted energy and the one actually accumulated in simulation can be explained partly by the non-modelled collisions, and also by the fact that the error of the mathematical model is integrated over time.
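This validation boils down to an ordinary least-squares fit of the simulated energy against the predicted energy. A minimal sketch, with synthetic data standing in for the 2160 configurations (the noise level is an arbitrary choice of ours):

```python
import random

def linear_fit(xs, ys):
    """Return slope a, bias b and correlation coefficient r^2 of y ~ a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    a = sxy / sxx
    b = my - a * mx
    r2 = sxy * sxy / (sxx * syy)
    return a, b, r2

# Synthetic stand-ins: predictions over the observed reward range, and
# simulated outcomes following the reported relationship Rsim = 0.86*Rmac - 11.59
# plus simulation noise.
rng = random.Random(0)
predicted = [rng.uniform(0, 20000) for _ in range(2160)]
simulated = [0.86 * x - 11.59 + rng.gauss(0, 300) for x in predicted]
a, b, r2 = linear_fit(predicted, simulated)
```

With a tight linear relationship, the fitted slope recovers the constant proportional error of the model.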

3.4.2 Prediction quality of a comparison

The second test is meant to assess the ability of the model to compare the outcomes of two different behaviours. The test consists of randomly selecting two behaviours A and B from the control space (π1, π2, ρ) ∈ [0, 1]^3. We compare the accumulated energy predicted for A and B. The same


[Figure 5 plot: reward realized in simulation versus predicted optimal reward.]

Figure 5: The energy accumulated in simulation is plotted as a function of the energy predicted by the mathematical model. The values were obtained using 2160 differently parameterized experimental setups. The dashed line (with a slope of 1) indicates a perfect match between mathematical predictions and simulations. A regression is performed on the data and displayed as the straight line. The correlation coefficient is r² = 0.98, the slope of the straight line is a = 0.86 and the bias is b = -11.59, indicating that the model overestimates the outcome of the simulations with a constant error of 14%.

comparison is carried out using one single run of simulation for each behaviour. The question we are asking is: does the mathematical model correctly predict which collective behaviour, A or B, performs better?

Again, we use each configuration Ci ∈ P to parameterize an experiment of one hour. For each configuration Ci we generate 5 pairs of random behaviours (Aij, Bij), j ∈ [1, 5]. Table 4 summarizes the frequencies of all possible comparison results for the 10800 tests performed. The table indicates that in 85.35% of the tests, the mathematical model and the simulations agreed on the ranking of the pair of behaviours. The table is almost symmetric and shows no better performance of the model whether A outperforms B or the opposite.

                     Simulation
Model                R(A) < R(B)    R(A) > R(B)
R(A) < R(B)          43.22%         7.64%
R(A) > R(B)          6.97%          42.13%

Table 4: Comparison of the orders predicted by the mathematical model with the simulation results. The notation R(A) < R(B) (respectively R(A) > R(B)) signifies that the energy accumulated in condition A is lower (respectively higher) than in condition B.

Moreover, we have studied the conditions in which disagreement between the mathematical model and the simulations occurs. We did so by plotting the energy predicted for behaviour B as a function of the energy predicted for behaviour A. Figure 6 shows as black circles the pairs of behaviours that lead to disagreement between the mathematical model and the simulations. The regression performed on the disagreeing data returns a correlation coefficient r² = 0.98 and a regression slope a = 1.00. The agreeing pairs of behaviours are plotted in gray. It is clear that


the wrong predictions of the model occur mainly when the two behaviours are supposed to yield very similar energies. Given that we use only one run of simulation without averaging, an error caused by the noise in simulation is much more likely to appear for those pairs of behaviours.

[Figure 6 plot: reward predicted for behaviour B versus reward predicted for behaviour A.]

Figure 6: The energies predicted for behaviours B are plotted as a function of the energies predicted for behaviours A. Two cases are superimposed: the gray circles are situations in which both the mathematical model and the simulation agree on which behaviour performs better; the black circles show where disagreement was found between the mathematical model and the simulations. The dashed line delimits the possible orders, R(A) > R(B) below and conversely R(A) < R(B) above. The disagreeing cases in black are clearly aligned on the dashed line (r² = 0.98, slope a = 1.00, bias b = 7.84), indicating that the disagreements arise mainly when the expected energies of the two behaviours are very much alike.

4 Efficient multi-foraging

4.1 Decision algorithm

The decision algorithm is a piece of code plugged into the controller of the robots. It modifies individual behaviour through the three control parameters π1, π2 and ρ.

To make decisions and modify their behaviours, robots rely on their partial perception. They are "aware" of encounters with prey and with other robots, and of the time spent exploring the environment. By dividing the number of perceived objects (prey or robots) by the time of exploration, robots can estimate a density of objects per second. As the environment is finite and bounded, this density is a function of the spatial density. Robots are also able to individually perceive the energy of a prey and the retrieval time of each type of prey.

Depending on the environment, it is possible that the best behaviour of a robot is to stay in the nest and spare energy. Imagine that new prey appear in the environment: if the robots have decided to stay forever in the nest, they will never find out that there is something better to do than resting at the nest. This situation shows that the classical tradeoff between exploration and exploitation holds. We implement a solution to this problem by setting a minimal value for the time spent outside the nest, and for the probabilities to take prey of either type. On the one hand, this threshold will diminish the efficiency of the robots, but on the other hand it will guarantee that robots keep updating their perception and sense the environment adequately.


In the present algorithm we use a discount factor to limit the robots' perception to a time window. This technique is used to let the robots forget past observations and behave efficiently if the environment changes. Instead of memorizing a sequence of observations, the robot counts the encountered objects and the time spent exploring outside the nest. The discount factor weights the influence of observations, giving more importance to the most recent events. Equations 5 are used to estimate the density of a type of objects in the environment (prey or robots).

Object_Count(t) = Object_Count(t - Δt) · Discount_Factor + Objects_Observed

Explore_Time(t) = Explore_Time(t - Δt) · Discount_Factor + Δt

Object_Density(t) = Object_Count(t) / Explore_Time(t) \quad (5)
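The discounted counters of Equation 5 can be sketched as follows. The `DensityEstimator` class name and the discount value 0.99 are illustrative choices of ours, not taken from the report.

```python
# Exponentially discounted object count and exploration time (Equation 5):
# old observations fade, so the density estimate tracks a changing environment.

class DensityEstimator:
    def __init__(self, discount=0.999):
        self.discount = discount
        self.object_count = 0.0
        self.explore_time = 0.0

    def update(self, objects_observed, dt=1.0):
        self.object_count = self.object_count * self.discount + objects_observed
        self.explore_time = self.explore_time * self.discount + dt
        return self.density()

    def density(self):
        """Estimated objects per second of exploration."""
        return self.object_count / self.explore_time if self.explore_time else 0.0

# If a robot sees one prey every 10 seconds, the estimate converges
# towards 0.1 objects per second.
est = DensityEstimator(discount=0.99)
for t in range(5000):
    est.update(1.0 if t % 10 == 0 else 0.0)
```

The effective time window is roughly 1/(1 - Discount_Factor) steps, which is what limits the robot's memory of the past.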

The decision algorithm relies on an equation that permits robots to individually estimate the energy per second, or instantaneous amount of energy EI, that can be gained by the group. In the following we detail the steps that lead to this equation.

We start by calculating the rate of prey grasped per second by the group of robots:

preyRate = E \phi \sum_{i=1}^{2} N_i \pi_i

The proportion of prey of type i grasped is:

prop_i = \frac{E \phi N_i \pi_i}{\sum_{j=1}^{2} E \phi N_j \pi_j} = \frac{N_i \pi_i}{\sum_{j=1}^{2} N_j \pi_j}

We also rely on the average time of retrieval of a prey:

retTime = \sum_{i=1}^{2} \frac{1}{\mu_i} \, prop_i = \frac{\sum_{i=1}^{2} (1/\mu_i) N_i \pi_i}{\sum_{i=1}^{2} N_i \pi_i}

The previous expressions are used to calculate the average time to grasp a prey and then retrieve it to the nest:

preyToNest = \frac{1}{preyRate} + retTime = \frac{1}{E \phi \sum_{i=1}^{2} N_i \pi_i} + \frac{\sum_{i=1}^{2} (1/\mu_i) N_i \pi_i}{\sum_{i=1}^{2} N_i \pi_i} = \frac{1 + E \phi \sum_{i=1}^{2} (1/\mu_i) N_i \pi_i}{E \phi \sum_{i=1}^{2} N_i \pi_i}

Lastly, we find the expression of EI, the instantaneous amount of energy acquired by the group of robots:


EI = E \cdot Enp + \sum_{i=1}^{2} En_i \cdot prop_i \cdot \frac{1}{preyToNest}

   = E \cdot Enp + \sum_{i=1}^{2} En_i \cdot \frac{N_i \pi_i}{\sum_{j=1}^{2} N_j \pi_j} \cdot \frac{E \phi \sum_{j=1}^{2} N_j \pi_j}{1 + E \phi \sum_{j=1}^{2} (1/\mu_j) N_j \pi_j}

   = E \cdot Enp + E \phi \, \frac{\sum_{i=1}^{2} En_i N_i \pi_i}{1 + E \phi \sum_{j=1}^{2} (1/\mu_j) N_j \pi_j} \quad (6)

Equation 6 can be used by each robot to estimate the rate of energy currently gained by the group. All the parameters of this equation, except φ, are either control parameters or can be estimated by the robots during the exploration of the environment. Indeed, using Equations 5, each robot can estimate the density of robots or of prey of either type in the environment, which is respectively φE, φN1 and φN2. To compute EI, robots need to know φ. This parameter might be estimated, for instance, by measuring the time to go back to the nest, but collisions with other robots may diminish the quality of such an estimation. In the following, the robots are given the parameter φ, which characterizes the size of the environment.

Based on the previously described sensing capabilities and on Equation 6 of the instantaneous energy, we introduce the decision algorithm (Algorithm 1). In a nutshell, this algorithm estimates parameters of the experiment using the observations of the robot. It then estimates the impact of several possible triplets (E, π1, π2) on the rate of energy EI. The control parameters of the robot are then updated to converge towards a behaviour that maximizes EI.
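The loop just described (Algorithm 1, listed further below) can be sketched in Python. The helper names, the parameter values and the learning step are illustrative assumptions; Equation 6 supplies the expected EI of each candidate triplet.

```python
# Sketch of the decision step: evaluate EI (Equation 6) for four candidate
# collective behaviours and nudge (rho, pi1, pi2) towards the best one.

def energy_rate(E, pi, N, p):
    """Instantaneous energy EI of Equation 6."""
    num = sum(p["En"][i] * N[i] * pi[i] for i in range(2))
    den = 1 + E * p["phi"] * sum(N[j] * pi[j] / p["mu"][j] for j in range(2))
    return E * p["Enp"] + E * p["phi"] * num / den

def decide(E, rho, pi1, pi2, N, p, learn_step=0.05):
    # Candidate triplets (E, pi1, pi2), as in Algorithm 1.
    candidates = [(E + 1, 1, 1), (E + 1, 0, 1), (E + 1, 1, 0), (0, 0, 0)]
    best = max(candidates, key=lambda b: energy_rate(b[0], (b[1], b[2]), N, p))
    clip = lambda x: min(1.0, max(0.0, x))     # keep probabilities in [0, 1]
    rho = clip(rho - learn_step if best != (0, 0, 0) else rho + learn_step)
    pi1 = clip(pi1 + learn_step if best[1] == 1 else pi1 - learn_step)
    pi2 = clip(pi2 + learn_step if best[2] == 1 else pi2 - learn_step)
    return rho, pi1, pi2

# Illustrative situation: prey of type 1 is ten times more valuable.
p = dict(En=[10.0, 1.0], Enp=-0.01, phi=1 / 159.4, mu=[1 / 60, 1 / 60])
rho, pi1, pi2 = decide(5, 0.5, 0.5, 0.5, N=[5.0, 5.0], p=p)
```

In this situation the best candidate is to forage for type 1 only, so the sketch lowers ρ and π2 while raising π1.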

4.2 Performance

We assess the performance of the decision algorithm by carrying out a systematic comparison of the energy accumulated in simulation with the energy obtained by the predicted optimal behaviour. As in the validation process (see Section 3.4), we use the collection P of 2160 different configurations of the experimental setup. For each configuration Ci ∈ P, we use the mathematical model to find the predicted optimal behaviour OBi.

In order not to include the error of the model in the estimation of the energy accumulated by this behaviour, we use a single run of simulation to determine the energy gain Rsim(Ci, OBi) associated to the predicted optimal behaviour. We also run a single simulated experiment with the decision algorithm plugged into the behaviour of each individual. The control parameters are initially set to (1, 1, 1) so that robots start by exploring the environment. The energy accumulated in a simulated experiment with a configuration Ci and the decision algorithm plugged into the robots' controller is denoted Rdec(Ci).

The energy Rdec(Ci) is compared to the predicted optimal energy Rsim(Ci, OBi). As in the validation process, we rely on a linear regression applied to Rdec(Ci) and Rsim(Ci, OBi). The correlation coefficient (r² = 0.98) indicates that the linear relationship hypothesis holds (p-value < 0.001). The slope of the regression line is a = 0.99 and the bias is b = -23.73, which means that the decision algorithm performs on average at 99% of the predicted optimal behaviour.

4.3 Adaptivity

We now focus on the adaptivity of the decision algorithm. The goal is to study how close robots get to the optimal behaviour in changing environments. We have used experiments split into three periods of 1 hour each. Two configurations Ci and Cj are randomly selected from the pool P of 2160 possible experimental setups. Each period is assigned a configuration, and the first and third periods share the same one. Thus, during the whole experiment the setup is parameterized by the sequence (Ci, Cj, Ci) of configurations. The redundancy of the configuration Ci permits to check whether robots can revert harmlessly to a previous behaviour, that is, whether there is no memory


Algorithm 1: Decision algorithm

Prepare four possible collective behaviours:
    Set b1 := (E + 1, 1, 1)
    Set b2 := (E + 1, 0, 1)
    Set b3 := (E + 1, 1, 0)
    Set b4 := (0, 0, 0)
for each bi do
    use bi to compute an expected EI
end
Find the triplet bi that yields the highest expected EI
if bi == b1 then
    decrease ρ by learnStep
    increase π1 by learnStep
    increase π2 by learnStep
end
if bi == b2 then
    decrease ρ by learnStep
    decrease π1 by learnStep
    increase π2 by learnStep
end
if bi == b3 then
    decrease ρ by learnStep
    increase π1 by learnStep
    decrease π2 by learnStep
end
if bi == b4 then
    increase ρ by learnStep
    decrease π1 by learnStep
    decrease π2 by learnStep
end

effect elicited by the decision algorithm. If the number of robots differs from one period to another, then newly introduced robots have their control parameters initially set to (1, 1, 1). Conversely, robots to be removed are randomly chosen from the set of present robots. In total, we compute a set of 1000 random combinations of configurations.

To describe the adaptive performance of the algorithm, we still use linear regressions and their associated correlation coefficients. Linear regressions are carried out on data extracted every 5 seconds during the experiments. Those data are the energies Rdec obtained in simulation and the energies Rsim obtained by the predicted optimal behaviour, for each of the 1000 simulated experiments.

Figure 8(b) presents the evolution of the correlation coefficients obtained using linear regressions. It shows that we can assume a linear relationship between the predictions of the model and the realizations of the simulation after approximately 10 minutes of experiment. Figure 8(a) shows the evolution of the slope obtained using linear regressions.

At the very beginning of the experiment, the performance of the decision algorithm is at its best but drops very quickly, because the robots do not yet have enough observations to make sensible decisions. After 10 minutes, the performance of the decision algorithm reaches on average 90% of the energy accumulated by the predicted optimal behaviour. The performance keeps growing slowly until the first change of configuration at 60 minutes; the changes are symbolized on the figures by vertical lines at t = 3600 and t = 7200 seconds. The sudden change of the parameters of the experiment decreases the


[Figure 7 plot: energy accumulated in simulation versus predicted optimal energy.]

Figure 7: The energy obtained using the decision algorithm is plotted as a function of the energy of the predicted optimal behaviour. The values were obtained using 2160 differently parameterized experimental setups. The dashed line (with a slope of 1) indicates the predicted optimal performance. A regression is performed on the data, and its result is represented by the straight line. The correlation coefficient is r² = 0.98, the slope of the straight line is a = 0.99, and the bias is b = -23.73. It means that the decision algorithm performs on average at 99% of the predicted optimal behaviour.

performance. Although the group of robots adapts its behaviour, it shows a noticeable decline in performance, down to 80% on average.

The second sudden change occurs 120 minutes after the beginning of the experiment. As the first and last periods have identical configurations, a memory effect would be noticeable. As a matter of fact, such a phenomenon was reported by [LDD04], but for a different setup and decision algorithm. The similarities between that work and the present one are strong enough to prompt us to verify the presence of a memory effect that could possibly prevent the group from adapting towards the optimal behaviour. We see in Figure 8(a) that the performance during the third period (85% on average) is lower than in the first period. This is because energy was lost during the second period. The performance increases slightly at the beginning of the third period and stabilizes, but the loss of energy incurred during the second phase cannot be fully recovered. Consequently, there is remarkable evidence of a memory effect: once robots adapt to a given environment, they have difficulties achieving the same level of performance if the environment changes suddenly.

5 Discussion

5.1 Achievements and contributions

We have designed a mathematical model of the experiment. The validation of this model has shown that the energy accumulated by a group of robots during an experiment can be predicted with an average error of 14%. More importantly, we have shown that the model can be used to successfully rank two different behaviours in 85% of the cases tested. We showed that the


[Figure 8 plots: (a) regression slope and (b) r squared, as functions of time (s).]

Figure 8: (a) The evolution of the slope of the linear regressions performed every 5 seconds during each experiment. Each linear regression is carried out using the energies obtained by the decision algorithm and the ones obtained by the predicted optimal behaviour at a given time. Linear regressions are made using the data of 1000 tests. (b) The correlation coefficient of the linear regressions indicates to which extent a linear relationship holds between the energies obtained by the decision algorithm and by the predicted optimal behaviour. The dashed horizontal line indicates the performance of the predicted optimal behaviour. The two dotted vertical lines indicate the times when the configurations of the experiments were randomly changed (after 3600 and 7200 seconds). The decision algorithm adapts towards the optimal behaviour (at least 90% after 10 minutes), and is markedly impacted by the sudden changes of experimental parameters during the second period.

errors in the remaining 15% arose only in ambiguous cases in which the energies yielded by the two compared behaviours are very similar. This new tool, previously unavailable in the literature for the multi-foraging task, makes possible the evaluation of the robots' performance on an absolute scale.

An equation to calculate the average instantaneous reward gained by the group of robots has been devised. It has been used to implement a decision algorithm for the robots. The tests have shown that robots using that decision algorithm manage to accumulate 99% of the energy that can possibly be gained.

Related works in the literature can be categorised into multi-foraging experiments and foraging experiments.

The study of Labella et al. [LDD04] focuses on division of labour among robots. Although the authors define a concept of efficiency, they do not have a mathematical model to quantify the performance of the robots on an absolute scale. Our model could be tested in this framework to understand the similarity between their performance metric and the concept of energy we introduced.

Ulam & Balch [UB04] have proposed a Q-learning algorithm for a multi-foraging task that differs from the one of the present document. Q-learning is used offline and the behaviour learned is evaluated against predictions made using a mathematical model of optimal foraging. The advantage of our method is that the decision algorithm works online and finds a very good solution in approximately 10 minutes. Moreover, our task differs slightly, as the mathematical model is non-linear. It is known that a classical Q-learning algorithm does not perform well if the system is non-Markovian [WD92]. Therefore, a simple Q-learning algorithm may not be suited to our task.

IRIDIA – Technical Report Series: TR/IRIDIA/2006-027

Lerman et al. [LJGM06] have proposed a mathematical study of a multi-foraging task. The authors mention the possible use of a concept of energy (they use the term reward) but do not propose a solution. As a consequence, the robots' behaviour proposed in their article is probably not efficient in our task, although several results may be transposed.

The last related work we would like to mention is that of Liu et al. [LWS+06]. In their article, the authors introduce the concept of energy in a simple foraging task and propose several behavioural mechanisms that lead the group of robots to forage more efficiently. The authors lack a validated mathematical model, but it is clear that their conclusions apply to the present work; in particular, the implementation of a recruitment of robots could speed up the work of the robots.

5.2 Perspectives and future work

In our work, we deliberately neglected collisions among robots. Lerman et al. [LG02] emphasized the impact of interference on the efficiency of the group of robots. It is likely that robots may perceive or estimate that decrease of performance and cope with the phenomenon automatically. In the future, we would like to design a multi-foraging experiment in which collisions may happen at a high rate and strongly impact the performance of the robots, in order to study how well the group could adapt its behaviour.

It may be possible to have robots forage efficiently without knowledge of the environment parameter, although that would probably degrade the performance of the robots. We will work in this direction to make the behaviour of the robots totally free of any a priori knowledge of the environment.

Our decision algorithm may be subject to a memory effect, similar to the one reported by Labella et al. [LDD04]. We plan to investigate a new method of updating perception that may more strongly restrict the long-term memory of the robots, and therefore possibly reduce memory effects that decrease their adaptivity. Another possibility is that robots need a long time to adapt to a sudden change. We will check this hypothesis and try to accelerate adaptation if the problem is indeed a matter of speed.

In the same way as Labella et al. [LDD04] and Liu et al. [LWS+06], we observed a division of labour among robots. We did not report these observations in the present document, but further investigations may provide interesting information on the characteristics of the behaviour of the group of robots.

The recruitment of foraging robots could improve the performance of the robots. We did not implement this feature in the behaviour of the robots for the present study, but it seems to be a promising direction to enhance the foraging capabilities.

Lastly, to validate our approach and assess the realism of our simulations, we plan to carry out a number of robotic experiments.

Acknowledgments. This research has been supported by COMP2SYS, a Marie Curie Early Stage Research Training Site funded by the European Community's Sixth Framework Programme under contract number MEST-CT-2004-505079. The information provided is the sole responsibility of the authors and does not reflect the opinion of the sponsors. The European Community is not responsible for any use that might be made of data appearing in this publication.

The authors would like to thank Thomas H. Labella and Mauro Birattari for fruitful discussions, and Shervin Nouyan for careful readings.

References

[Bal99] Tucker Balch. Reward and diversity in multirobot foraging. Workshop on Agents Learning About and with Other Agents (IJCAI-99), Stockholm, Sweden, 1999.

[Cha76] Eric L. Charnov. Optimal foraging, the marginal value theorem. Theoretical Population Biology, 9(2):129–136, 1976.

[DSahin04] M. Dorigo and E. Şahin. Swarm robotics – special issue editorial. Autonomous Robots, 17(2–3):111–113, 2004.


[DTG+05] Marco Dorigo, Elio Tuci, Roderich Groß, Vito Trianni, Thomas Halva Labella, Shervin Nouyan, Christos Ampatzis, Jean-Louis Deneubourg, Gianluca Baldassarre, Stefano Nolfi, Francesco Mondada, Dario Floreano, and Luca Maria Gambardella. The swarm-bots project. Lecture Notes in Computer Science, 3342:31–44, 2005.

[IMBG01] Auke Jan Ijspeert, Alcherio Martinoli, Aude Billard, and Luca Maria Gambardella. Collaboration through the exploitation of local interactions in autonomous collective robotics: The stick pulling experiment. Autonomous Robots, 11(2):149–171, 2001.

[KAKG02] S. Kazadi, A. Abdul-Khaliq, and R. Goodman. On the convergence of puck clusteringsystems. Robotics and Autonomous Systems, 38(2):93–117, 2002.

[LDD04] Thomas H. Labella, Marco Dorigo, and Jean-Louis Deneubourg. Efficiency and task allocation in prey retrieval. Lecture Notes in Computer Science, 3141:274–289, 2004.

[LG02] Kristina Lerman and Aram Galstyan. Mathematical model of foraging in a group of robots: Effect of interference. Autonomous Robots, 13(2):127–141, 2002.

[LJGM06] Kristina Lerman, Chris Jones, Aram Galstyan, and Maja J. Matarić. Analysis of dynamic task allocation in multi-robot systems. The International Journal of Robotics Research, 25(3):225–241, 2006.

[LWS+06] W. Liu, A. Winfield, J. Sa, J. Chen, and L. Dou. Strategies for energy optimisation in a swarm of foraging robots. In Lecture Notes in Computer Science: Swarm Robotics. Springer, Berlin/Heidelberg, 2006.

[ME03] Alcherio Martinoli and Kjerstin Easton. Modeling swarm robotic systems. Springer Tracts in Advanced Robotics, 5:297–306, 2003.

[SS97] Ken Sugawara and Masaki Sano. Cooperative acceleration of task performance: Foraging behavior of interacting multi-robots system. Physica D: Nonlinear Phenomena, 100(3–4):343–354, 1997.

[SSYA98] K. Sugawara, M. Sano, I. Yoshihara, and K. Abe. Cooperative behavior of interactingrobots. Artificial Life and Robotics, 2:62–67, 1998.

[UB04] Patrick Ulam and Tucker Balch. Using optimal foraging models to evaluate learned robotic foraging behavior. Adaptive Behavior, 12(3–4):213–222, 2004.

[WD92] Christopher J. C. H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8:279–292, 1992.



Towards Energy Optimization: Emergent Task Allocation in a Swarm of Foraging Robots

Wenguo Liu1, Alan F. T. Winfield1, Jin Sa1, Jie Chen2, Lihua Dou2

1 Bristol Robotics Lab, University of the West of England, Bristol, UK
2 Intellectual Information Technology Lab, Beijing Institute of Technology, China

This article presents a simple adaptation mechanism to automatically adjust the ratio of foragers to resters (division of labor) in a swarm of foraging robots and hence maximize the net energy income to the swarm. Three adaptation rules are introduced based on local sensing and communications. Individual robots use internal cues (successful food retrieval), environmental cues (collisions with team-mates while searching for food) and social cues (team-mate success in food retrieval) to dynamically vary the time spent foraging or resting. Simulation results show that the swarm demonstrates successful adaptive emergent division of labor and robustness to environmental change (in food source density), and we observe that robots need to cooperate more when food is scarce. Furthermore, the adaptation mechanism is able to guide the swarm towards energy optimization despite the limited sensing and communication abilities of the individual robots and the simple social interaction rules. The swarm also exhibits the capacity to collectively perceive environmental changes; a capacity that can only be observed at a group level and cannot be deduced from individual robots.

Keywords swarm foraging · swarm robotics · task allocation · emergent division of labor

1 Introduction

Inspired by the swarm intelligence observed in social insects, robotic swarms are fully distributed systems in which overall system tasks are typically achieved through self-organization or emergence rather than direct control (Bonabeau, Dorigo, & Théraulaz, 1999). In swarm robotics a number of relatively simple robots, each with limited sensing, actuation and cognition, work together to collectively accomplish a task. Such tasks may be biologically plausible, such as cluster sorting (Holland & Melhuish, 1999) or cooperative stick-pulling (Martinoli, 1999), or they may have no parallel in nature, such as coherent wireless networking (Nembrini, Winfield, & Melhuish, 2002). Foraging, however, is a compelling example of a swarm behavior that can be transferred from natural to artificial systems because of the one-to-one correspondence between ant and robot and between food-item and energy units. Foraging can, in principle, be undertaken by a single robot given enough time, but a swarm of robots working together should be able to complete the task more quickly and effectively. Foraging is, therefore, an example of a swarm behavior in which it is not the task itself, but the way the task is (self-)organized, that is the desired emergent property of the swarm.

It is a characteristic of foraging that the efficiency of the swarm will not increase monotonically with

Copyright © 2007 International Society for Adaptive Behavior, Vol 15(3): 289–305. DOI: 10.1177/1059712307082088. Figures 2–8 appear in color online: http://adb.sagepub.com

Correspondence to: Wenguo Liu, Bristol Robotics Lab, Faculty of Computing, Engineering and Mathematical Sciences, University of the West of England, Bristol, BS16 1QY, UK. E-mail: [email protected]




swarm size, because of the negative impact of interference, for instance due to robots competing for space (overcrowding) either while searching or when converging on the nest. Research has focused on how to carefully design the basic behavior or communication protocols for the individual robots in order to minimize the interference between robots. Østergaard, Sukhatme, and Matarić (2001) applied a bucket brigade approach to minimize the interference for robots near the home region, where competition for space occurs much more frequently. In their experiment each robot is allocated a specific working area and delivery region so that they are less likely to collide with each other in a relatively crowded area. Lerman (2002) developed a macroscopic probabilistic model to quantitatively analyze the effect of swarm size and interference among robots on the overall performance, based on Østergaard's experiment, and some researchers point out that there is a critical swarm size in order to maximize the efficiency of the group for a given task (Rosenfeld, Kaminka, & Kraus, 2005).

Other researchers have applied a threshold-based approach inspired by biological systems, as first described by Théraulaz, Bonabeau, and Deneubourg (1998) in investigating the division of labor in social insects, to allocate robots to each task (resting or foraging) in order to achieve the task efficiently. Krieger and Billeter (2000) implement a swarm of up to 12 real robots engaged in a foraging task. In their experiments each robot is characterized by a different randomly chosen activation-threshold, that is, the energy level of the nest below which a given robot is triggered to go and collect food-items, in order to regulate the activity of the team. Labella, Dorigo, and Deneubourg (2006) introduce a simple adaptive mechanism to change the ratio of foragers to resters to improve the performance of the system, where the probability of a robot leaving home is adjusted dynamically based on successful retrieval of food. They reward successful food retrieval and punish failure in order to vary the probability of leaving home. Self-organized division of labor between resting and foraging is observed with this simple adaptation mechanism. However, a disadvantage of this approach is the absence of knowledge about the other robots in the swarm.

In contrast with the purely threshold-based approaches, Jones and Matarić (2003) describe an adaptive method for division of labor between a collection of red or green pucks, in which robots do not communicate but observe each other and maintain a limited memory of the observed activities of other robots and of the tasks which need to be performed. Guerrero and Oliver (2003) present an auction-like task allocation model, partially inspired by auction and threshold-based methods, to try to determine the optimal number of robots needed for foraging; however, the demands of communication between robots during the auction process constrain the scalability of their method.

This article builds upon previous work in task allocation or division of labor in a number of ways. Firstly, our overall goal is that the swarm maximizes its net energy income. Secondly, we investigate a richer set of adaptation rules, or cues, for individual robots: internal cues (successful food retrieval), environmental cues (collisions with team-mates while searching for food) and social cues (team-mate success in food retrieval). Social cues are triggered by pheromone-like local communication between robots. Thirdly, we evaluate a number of control strategies based upon different combinations of these internal, environmental and social cues, in order to discover the relative merit of the cues in optimizing the net energy income to the swarm. Our foraging swarm makes use of local sensing and communication only, but the overall swarm exhibits properties of emergence and adaptive self-organization; that is, adaptation to environmental changes (in food density). The approach presented in this article thus meets the criteria for swarm robotics articulated by Şahin (2005) and Beni (2005).

This article proceeds as follows. In Section 2 we introduce the basic adaptation mechanism for the individual robots in the swarm. In Section 3, descriptions of the foraging task and the experimental environment are given. Section 4 presents the experimental results, in which we compare the performance of the system with different adaptation rules. We conclude the article in Section 5.

2 Adaptation Mechanism

2.1 Problem Description

Consider the swarm foraging scenario: there are a number of food-items randomly scattered in the arena and, as food is collected, more will, over time, "grow" to replenish the supply. A swarm of robots are searching and retrieving food-items back to the "nest". Each food-item collected will deliver an amount of energy to the swarm, but the activity of foraging will consume a certain amount of energy at the same time. The goal of the swarm is to forage as much food as possible over time in order to survive. Due to the limited availability of food, however, robots have to switch between foraging and resting in order to maximize the net energy income to the swarm. The challenge is to achieve this with no centralized control and robots with only minimal local sensing and communication. Assume that:

• each robot will consume energy at A units per second while searching or retrieving and at B units per second while resting, where A > B and energy consumption depends on the actuators and sensors used in the different states;

• each food-item collected by a robot will provide C units of energy to the swarm;

• the average retrieval time (the time spent on one successful search and retrieval cycle), denoted by t, is a function of the number of foraging robots, denoted by x, and of the density of food in the environment, denoted by ρ:

t = f(x, ρ)   (1)

Let N be the size of the swarm, Econsumed the energy consumed per second, and Eretrieval the energy collected per second by the swarm; then we have

Econsumed = Ax + B(N − x)   (units/s)
Eretrieval = Cx/t = Cx/f(x, ρ)   (units/s)   (2)

The average energy income per second for the swarm is

Eaverage = Eretrieval − Econsumed = (C/f(x, ρ) − (A − B)) x − BN   (3)

Equation 3 shows that in order to maximize energy income for the swarm we need to either increase the

number of foragers x or decrease the average retrieval time f(x, ρ). The more robots engaged in foraging, however, the more likely the robots are to compete for the limited resources of food and physical space. Thus, increasing the number of foragers results in more interference between robots, and robots take longer to find and retrieve a food-item; that is, the average retrieval time for the swarm f(x, ρ) will increase as the number of foragers x increases when the food density ρ remains constant. Therefore, for a given ρ there should be an optimal value, say X*, for x so that Eaverage in Equation 3 is maximized. It is clearly the case that X* will change if ρ changes. However, the function f(x, ρ) will be quite complex and hard to model because of the complexity of the interactions between robots, although it may ultimately be possible to develop a detailed mathematical model in order to find an optimal value X*, using for example the probabilistic approach developed by Martinoli and Easton (2004). It seems more practical to adopt a bottom-up design process (a typical characteristic of the swarm robotics methodology), resulting in a swarm that is able to dynamically adapt the ratio of foragers to resters through the interaction between robots and between robots and the environment.
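The trade-off above can be made concrete with a brute-force search for X*. The retrieval-time function below is a made-up stand-in for f(x, ρ) (the article gives no closed form for f), and the values of A, B, C and N are illustrative placeholders, not the article's parameters:

```python
# Sketch: locating the optimal forager count X* that maximizes Equation 3,
# E_average = (C / f(x, rho) - (A - B)) * x - B * N.
# f_retrieval is a hypothetical interference model; all constants are illustrative.

def f_retrieval(x, rho):
    """Hypothetical average retrieval time: shorter when food is dense (large rho),
    longer as foragers interfere with each other (term growing with x)."""
    return 20.0 / rho + float(x * x)

def e_average(x, rho, A=8.0, B=1.0, C=2000.0, N=10):
    """Net energy income per second for the swarm (Equation 3)."""
    return (C / f_retrieval(x, rho) - (A - B)) * x - B * N

def optimal_foragers(rho, N=10):
    """Brute-force X*: the forager count maximizing E_average for a given density."""
    return max(range(N + 1), key=lambda x: e_average(x, rho, N=N))

if __name__ == "__main__":
    for rho in (0.5, 1.0, 2.0):
        x_star = optimal_foragers(rho)
        print(f"rho={rho}: X*={x_star}, E_average={e_average(x_star, rho):.1f}")
```

Because the assumed retrieval time grows with x, X* settles at an interior value rather than at the swarm size N; with a different interference model the optimum shifts, which is exactly why the article argues for adapting the forager ratio online instead of computing X* analytically.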

2.2 Robot Controller

We first need a controller design for the robot. Figure 1 represents the control program for each robot in the swarm. Note that, in order to keep the diagram clear, with the exception of state resting, each state will move to state avoidance (not shown in Figure 1) whenever the proximity sensors are triggered. The behaviors for the foraging task are:

leavinghome: robot exits the nest region and resumes its search;

randomwalk: robot moves forward and at random intervals turns left or right through a random arc, using the camera to search for food;

movetofood: if food is sensed by the camera, robot adjusts its heading and moves towards the food;

grabfood: if the food is close enough, triggered by sensors, robot closes its gripper and grabs the food;

scanarena: because of interference between robots, sometimes a robot will lose sight of the target food-item while moving towards it; if this happens the robot will scan the immediate area by turning a random angle to find the lost food. If successful, it will move to the food (movetofood) again; if not, then randomwalk;

movetohome: robot moves towards the home location with the food;

deposit: robot unloads the food-item in the nest;

resting: robot rests at home;

homing: if searching time is up and no food has been grabbed, then return home;

avoidance: robot avoids obstacles, walls and other robots whenever its proximity sensors are triggered; after completing a successful avoidance behavior the robot returns to its previous state.

We now introduce two internal clock counters for the robots to regulate their activities (foraging or resting). Ts is used to count the time the robot spends searching, and Tr to count the time spent resting in the nest. As shown in Figure 1, the transition from states randomwalk, scanarena and movetofood to state homing is triggered when the searching time Ts reaches its threshold Ths, that is, Ts ≥ Ths; such a transition will reduce the number of foragers, which in turn minimizes the interference due to overcrowding, thus reducing the average retrieval time. The transition from state resting to state leavinghome, which is triggered when the robot has rested for long enough, that is, Tr ≥ Thr, will drive the robot back to work to collect more food for the colony, which means increasing the number of foragers in the swarm. The efficiency of the swarm might be improved if robots are able to autonomously adjust their searching time threshold Ths and resting time threshold Thr.
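The two timer-driven transitions can be sketched as follows. The state names follow Figure 1, but the stepping logic is a minimal illustration, not the authors' implementation:

```python
# Sketch of the two clock counters Ts and Tr regulating foraging vs. resting.
# Only the timer-driven transitions from the text are modelled:
# searching states -> homing when Ts >= Ths; resting -> leavinghome when Tr >= Thr.

SEARCHING_STATES = {"randomwalk", "scanarena", "movetofood"}

class ForagerTimers:
    def __init__(self, Ths, Thr):
        self.Ths, self.Thr = Ths, Thr  # searching / resting time thresholds
        self.Ts = 0                    # time spent searching
        self.Tr = 0                    # time spent resting

    def step(self, state):
        """Advance one second in `state`; return the next state if a timer fires."""
        if state in SEARCHING_STATES:
            self.Ts += 1
            if self.Ts >= self.Ths:
                self.Ts = 0
                return "homing"        # give up searching, reducing interference
        elif state == "resting":
            self.Tr += 1
            if self.Tr >= self.Thr:
                self.Tr = 0
                return "leavinghome"   # rested long enough, rejoin the foragers
        return None                    # stay in the current state
```

For example, with Ths = 3 a robot that has searched for three consecutive seconds is sent homing; the adaptation rules of Section 2.3 then act by moving these two thresholds up or down.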

2.3 Adaptation Rules

In swarm foraging each individual robot has only limited local sensing and communications. A robot cannot get global information, for instance how much food is available or how many other robots are engaged in foraging. How, then, can robots adjust the two time threshold parameters (Ths and Thr) in order to improve group efficiency, based on local information? Our inspiration comes from widely observed phenomena in nature, such as schools of fish, flocks of birds, and so forth. The group behavior emerges from the interactions of individuals essentially following the rules "I do what you do" and "I go where you go" (Camazine et al., 2001). If a robot follows the rules "I forage when you forage" and "I rest when you rest", it may be possible to change the ratio of foragers to resters in the swarm and improve the efficiency of the system. In order to achieve this, we introduce three rules to change the two parameters, Ths and Thr, based on (i) environmental cues, (ii) internal cues, and (iii) social cues, explained as follows:

• Environmental cues. For a robot that is searching for food, if it collides with other robots it will reduce Ths and increase Thr, because "there are already enough foragers in the arena, I'd better go back to the nest sooner so I don't get in the others' way".

• Internal cues. After a successful food retrieval cycle, a robot will reduce Thr because "there may be more food, I'd better go back to work as soon as possible". Alternatively, if a robot fails to find food, indicated by its searching time running out, the robot will increase Thr since "there seems to be little food available, I'd better rest for longer".

Figure 1: State transition diagram of the foraging task.

For social cues we introduce a pheromone-like mechanism into the swarm. Two types of virtual pheromone, indicating either success or failure of food retrieval, are deposited by the robots returning to the nest (implemented here by a shared whiteboard that each robot can both write to and read from). These pheromone levels gradually decay with time. All robots resting at home will be able to sense the change in pheromone levels and adjust their Ths and Thr based on the following rules:

• Social cues. If a robot returning home deposits a successful retrieval pheromone, thus increasing the level of that pheromone, then the robots resting at home will reduce Thr and increase Ths, because "somebody else has found food, there may be more so I'd better get back to work sooner". Alternatively, if a resting robot perceives an increase in the failure retrieval pheromone, it will increase Thr and reduce Ths, because "somebody else failed to find food, there may be a food shortage so I'd better stay in the nest for longer".

Table 1 summarizes the cues and conditions for increasing or decreasing the time thresholds. With the three adaptation rules described above, each robot in the swarm will, from one time step to the next, adapt its own time thresholds Ths^i and Thr^i as shown in the following equations, where i indicates the ID of each robot:

Ths^i(t + 1) = Ths^i(t) − α1 Ci(t) + β1 Ps^i(t) − γ1 Pf^i(t)   (4)

Thr^i(t + 1) = Thr^i(t) + α2 Ci(t) − β2 Ps^i(t) + γ2 Pf^i(t) − δ Ri(t)   (5)

The second term on the right-hand side of the equations above represents the contribution provided by the environmental cues, where Ci(t) counts the collisions while searching, and α1 and α2 are adjustment factors that moderate the contribution of the environmental cues. The third and fourth terms on the right-hand side take the social cues into account, in which Ps^i(t) and Pf^i(t) represent the retrieval information (success or failure) from team-mates through the pheromone-like mechanism. The contribution from social cues is moderated by altering the adjustment factors β1, β2, γ1, γ2. The final term in Equation 5 denotes the contribution from internal cues, where Ri(t) stores up-to-date information on the robot's own food retrieval success or failure, and δ is the adjustment factor for internal cues.

Table 1: Mapping from cues to time threshold increase/decrease.

       increase                                      decrease
Ths    team-mate success (social)                    collision while searching (environmental);
                                                     team-mate failure (social)
Thr    own failed retrieval (internal);              own successful retrieval (internal);
       collision while searching (environmental);    team-mate success (social)
       team-mate failure (social)

Note that the internal cues do not contribute to the change of Ths^i in Equation 4, because it is difficult to determine whether the robot should increase or decrease Ths^i solely on the basis of its own recent retrieval success or failure. Consider that the robot has retrieved a food-item to the nest: it is not necessary to increase its own Ths^i, since it has been successful with this searching time threshold; conversely, if it reduces Ths^i it may have less chance of finding food during the decreased searching time, yet the successful retrieval suggests there may be more food available to collect. Now consider the situation in which the robot has failed to find food: increasing or decreasing Ths^i will possibly lead the robot either to waste more energy searching in a poor food source environment or to have less chance of retrieving a food-item in a rich food source environment, because there is not enough time to find and collect a food-item. To avoid this potentially ambiguous impact on performance, the robot therefore takes no reward or punishment from internal cues to adjust its searching time threshold Ths^i.
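One robot's per-step threshold update, following the cue structure of Equations 4 and 5 (collisions shorten searching and lengthen resting; team-mate success does the opposite; the robot's own retrieval outcome affects only the resting threshold), can be sketched as below. The adjustment-factor names, the numeric defaults and the clamp limits are illustrative placeholders, not the article's values:

```python
# Sketch of one robot's threshold update (structure of Equations 4 and 5).
# All factor values and clamp limits are illustrative, not the article's.

def update_thresholds(Ths, Thr, collisions, mate_success, mate_failure, own_retrieval,
                      a1=2.0, a2=2.0, b1=1.0, b2=1.0, g1=1.0, g2=1.0, d=3.0,
                      Ths_lim=(10.0, 300.0), Thr_max=300.0):
    """Return updated (Ths, Thr) for one time step.

    collisions     -- Ci(t): collisions while searching (environmental cue)
    mate_success   -- Ps_i(t): team-mate success pheromone level (social cue)
    mate_failure   -- Pf_i(t): team-mate failure pheromone level (social cue)
    own_retrieval  -- Ri(t): +1 own success, -1 own failure, 0 otherwise (internal cue)
    """
    Ths = Ths - a1 * collisions + b1 * mate_success - g1 * mate_failure
    Thr = (Thr + a2 * collisions - b2 * mate_success
           + g2 * mate_failure - d * own_retrieval)
    Ths = min(max(Ths, Ths_lim[0]), Ths_lim[1])  # clamp searching threshold
    Thr = min(max(Thr, 0.0), Thr_max)            # clamp resting threshold
    return Ths, Thr
```

Note that `own_retrieval` appears only in the Thr update, mirroring the argument above that internal cues give no unambiguous signal for the searching time threshold.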

Ci(t), Ri(t), Ps^i(t) and Pf^i(t) in Equations 4 and 5 are defined as follows:

Ci(t) = 1 if the robot moves from state randomwalk to state avoidance, and 0 otherwise   (6)

Ri(t) = 1 if the robot moves from state deposit to state resting, −1 if it moves from state homing to state resting, and 0 otherwise   (7)

Ps^i(t) = 0 if the robot is not in state resting; SPs(t) if it moves from state deposit to state resting; |{i | Ri(t) > 0, i = 1, …, N}| if it is in state resting   (8)

Pf^i(t) = 0 if the robot is not in state resting; SPf(t) if it moves from state homing to state resting; |{i | Ri(t) < 0, i = 1, …, N}| if it is in state resting   (9)

where SPs and SPf in Equations 8 and 9 are used to simulate two kinds of virtual pheromone, related to retrieval success and failure information respectively. This information can only be accessed by robots resting in the nest. As shown in Equation 10, attenuation factors λs and λf are introduced here to simulate gradually decaying rather than instantly disappearing social cues, somewhat akin to ants leaving a decaying pheromone trail while foraging. Such a mechanism could be readily implemented in a real robot implementation:

SPs(t + 1) = SPs(t) − λs + |{i | Ri(t) > 0, i = 1, …, N}|
SPf(t + 1) = SPf(t) − λf + |{i | Ri(t) < 0, i = 1, …, N}|   (10)

As the virtual pheromones are only accessible to the robots in the nest, two categories of robots will be affected by the social cues. One is those already resting in the nest; the other is those ready to move to state resting from state homing or deposit. The former should be able to "monitor" the change of these two pheromones while resting and then adjust their time threshold parameters, while the latter will benefit from the gradually decaying pheromone deposited by team-mates. These two situations for updating Ps^i(t) and Pf^i(t) are shown in Equations 8 and 9.

Clearly the values of Ths^i and Thr^i for each robot will change over time. However, if Ths^i becomes too small the robot may not be able to complete a food retrieval (because of the time taken to move towards the food-item and grab it after the food-item has been seen by the robot's camera), resulting in unexpected behavior of the robot. On the other hand, if Ths^i becomes too large the robot will consume much more energy than it can collect when the food source is relatively poor. Therefore, Ths^i and Thr^i are clamped to keep them within the limits (Ths-min, Ths-max) and (0, Thr-max) respectively.

The selection of values for the adjustment and attenuation factors will depend on the environment and on the parameters of the robot behaviors. With careful selection of those factors each robot can adapt its time threshold parameters through the interactions between robots and between robots and the environment, resulting in task switching between foraging and resting. At the swarm level, the ratio of foragers to resters will dynamically adjust according to the given food source environment; thus, we contend, the efficiency of the swarm can be optimized on the basis of the analysis in Section 2.1.

3 Experimental Setup

3.1 Simulation Environment

To validate the hypothesis presented in the previous section, we tested our swarm foraging adaptation scheme using the sensor-based simulation tools Player/Stage (Gerkey, Vaughan, & Howard, 2003). Player is a server that connects robots, sensors, and control programs over a network. Stage simulates a population of mobile robots, sensors and objects in a two-dimensional bitmapped environment. The robots used in the simulation are realistic models of the wireless networked Linuxbots in the Bristol Robotics Laboratory (Winfield & Holland, 2000). An advantage of this approach is that the Player control programs can be transferred directly from simulation to the real robots.

Figure 2 is a snapshot of the simulation. Between two and ten robots work in an 8 m × 8 m octagonally shaped arena. The nest area is indicated with a green (gray) color, with one homing light source located at point A to indicate the direction of the nest. Each robot is 0.26 m × 0.26 m, the same size as the real robots (Linuxbots) in our laboratory, and is equipped with three light intensity sensors, three infra-red proximity sensors, one camera, one color sensor and one gripper. Thus the robot can sense food at a distance using the camera and then grab the food using its gripper; the robot can also find its way back to the nest using the three front-mounted light intensity sensors, and can tell whether it is at home using the bottom-mounted color sensor. The control programs for each robot are identical, as shown in Figure 1.

3.2 Selection of Parameters

At the start of the simulation, all robots are in state resting with the same time thresholds Ths (= Ths-max) and Thr (= 0). In order to maintain the food density at a reasonably constant level over time, a new food-item is placed randomly in the searching arena each second with probability Pnew, the growth rate. By changing the growth rate we obtain the desired food source density. Collected food deposited in the home area is removed, to prevent robots from retrieving food that has already been collected. In each time step of the simulation, a robot consumes an amount of energy that varies with its state, since the robot uses different sensors and actuators in different states. For example, a robot consumes more energy when carrying food back to the nest than when wandering in the search area, because the gripper is used in the former state.

Figure 2 Screen shot of the simulation; robots move at speed 0.15 m/s. (A) Eight robots are engaged in foraging. (B) Close-up view of a robot with a food-item in its gripper; the IR sensors have a range of 0.2 m, the field of view of the camera is 60° and its view distance is 2 m.

Table 2 Energy consumed in each state by the robot.

State         Energy consumed (units/s)
leavinghome   6
randomwalk    8
movetofood    8
grabfood      12
scanarena     8
movetohome    12
deposit       12
resting       1
homing        6
avoidance     6 or 9


296 Adaptive Behavior 15(3)

We estimate these values by allocating each sensor and actuator a relatively arbitrary value which represents the energy consumed by the component; these values are then summed in accordance with the specific sensors and actuators used in each state. Table 2 shows the resulting energy consumed per second for each state. Note that the energy consumed in state avoidance varies depending on whether the robot is carrying a food-item. Moreover, the robot consumes a small amount of energy even when resting at home, currently 1 unit/s. A successful food-item retrieval delivers 2000 units of energy to the swarm.
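The per-state energy accounting can be made concrete with a small sketch that uses the Table 2 costs; the state-duration bookkeeping and the helper function are our own illustration (state avoidance, whose cost depends on whether food is carried, is omitted here).

```python
# Energy bookkeeping from the per-state costs of Table 2 (units/s); the
# state-duration dictionary and the helper below are our own illustration.
ENERGY_COST = {
    "leavinghome": 6, "randomwalk": 8, "movetofood": 8, "grabfood": 12,
    "scanarena": 8, "movetohome": 12, "deposit": 12, "resting": 1, "homing": 6,
}
REWARD_PER_ITEM = 2000  # energy delivered to the swarm per retrieved item

def net_energy(state_seconds, items_retrieved):
    """Retrieval rewards minus the energy spent in each state."""
    spent = sum(ENERGY_COST[state] * t for state, t in state_seconds.items())
    return items_retrieved * REWARD_PER_ITEM - spent

# A robot that searches 100 s, carries food home for 27 s, deposits for 2 s:
print(net_energy({"randomwalk": 100, "movetohome": 27, "deposit": 2}, 1))
# 2000 - (800 + 324 + 24) = 852
```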

Before implementing the algorithm we first need to choose the parameters in Equations 4 and 5. Clearly, the average time for a robot to move to a food-item and grab it depends on the forward speed of the robot v and the detection range of the camera Dr; a rough estimate is Dr/v ≈ 14 s. Based on this estimate we set Ths-min to 20 s. Ths-max is set to prevent the robot from wandering outside the nest for too long, which could result in a larger negative contribution to the energy of the swarm even if the robot does eventually collect a food-item. For robots returning a food-item, the average time spent moving back to the nest can be estimated by simply considering the approximate radius of the arena R and the robot forward speed v, that is, R/2v ≈ 27 s. Taking into account the relative energy consumption in each state (see Table 2), the robot will consume 27 × 12 = 324 units of energy during this period; thus, in order to make a positive contribution to the energy of the swarm, the maximum time it can spend searching is approximately (2000 – 324)/8 ≈ 210 s. Hence we choose Ths-max to be 200 s. A similar set of considerations led us to set Thr-max to 2000 s.
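The back-of-envelope arithmetic above can be reproduced directly. This is our own sketch; the speed, camera range and arena dimension are taken from the simulation setup described earlier.

```python
# Reproducing the threshold-selection arithmetic (our sketch).
v = 0.15        # robot forward speed (m/s)
Dr = 2.0        # camera view distance (m)
R = 8.0         # approximate arena radius used in the text (m)
reward = 2000   # energy units delivered per retrieved food-item

t_grab = Dr / v                          # ~13.3 s, i.e. roughly 14 s
t_home = R / (2 * v)                     # ~26.7 s, i.e. roughly 27 s
cost_home = 27 * 12                      # 324 units: movetohome costs 12 units/s
t_search_max = (reward - cost_home) / 8  # 209.5 s at the 8 units/s search cost

# Hence the paper sets Ths-min = 20 s and Ths-max = 200 s.
```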

However, there are no similarly obvious guidelines for the selection of the adjustment factors. We therefore chose these values on a trial and error basis. Generally, a large change in the two time threshold parameters will potentially cause oscillation or stabilization problems for the swarm, while a small change could lead to a slow adaptation process. We first chose ε as 20; then γ2 was set to 40 and γ1 to 20 to balance the relative contributions of the social and internal cues. β1 and β2 were then set at 10 to take successful retrievals by team-mates into account. Consequently, we set α1 and α2 to 5, based on the same considerations. Table 3 summarizes all of these parameters. Note that δs and δf are arbitrarily chosen here.

4 Experimental Results

The performance metric of the swarm is the energy efficiency given by Equation 11:

efficiency = net energy income to swarm / energy available from environment    (11)
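Equation 11 in code, cross-checked against Table 5. Reading the denominator as the number of food-items produced times the 2000 units each item is worth is our interpretation, but it reproduces the tabulated percentages.

```python
# Energy efficiency (Equation 11). Denominator = energy the environment offered:
# food-items produced x 2000 units per item (our interpretation of the metric).
REWARD_PER_ITEM = 2000

def efficiency(net_energy, items_produced):
    return net_energy / (items_produced * REWARD_PER_ITEM)

# Cross-check against Table 5, swarm size 2, strategy S1:
print(round(100 * efficiency(443634, 295.8), 1))  # -> 75.0
```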

In order to investigate whether and how our foraging adaptation mechanism can improve the energy efficiency of the swarm, three types of experiments were designed by changing the size of the swarm and the food source density. Additionally, to determine how each cue affects the energy efficiency, four strategies, each with a different combination of cues, were designed for each type of experiment. Table 4 shows the cue configuration of each strategy. Strategy S1, in which no cues are taken into account, provided us with a homogeneous (fixed time threshold parameters) swarm as a benchmark. We then added cues one by one from strategy S2 to S4, the rationale being that any strategy must be able to adjust the two time threshold parameters in a bi-directional manner,

Table 3 Parameter selection for the adaptation algorithm.

Ths-min  Ths-max  Thr-max  α1  α2  β1  β2  γ1  γ2  ε   δs   δf
20       200      2000     5   5   10  10  20  40  20  0.1  0.1

Table 4 The four strategies: cue combinations.

Strategy  Internal cues  Social cues  Environmental cues
S1        no             no           no
S2        yes            no           no
S3        yes            yes          no
S4        yes            yes          yes



namely both increasing and decreasing. Not all combinations of cues meet this requirement; for instance, a strategy combining internal and environmental cues does not. For each strategy, the values of the adjustment and attenuation factors in Equations 4, 5 and 10 were either the values shown in Table 3 or set to zero, depending on the combination of cues. For example, for the swarm using strategy S1, all of these values were zero; for the swarm using S2, only ε was set to the value in Table 3 and the others remained zero.

4.1 Fixed Food Density, Variable Swarm Size

To explore the optimal number of foragers, we ran experiments for swarms with 2, 4, 6, 8 and 10 robots respectively. The food source density was the same for these experiments, with Pnew set to 0.03. For each swarm we applied the four strategies and each strategy was run 10 times in simulation. Each simulation lasted for 10,000 s. We monitored the net energy and the number of foragers throughout the simulation, and then averaged the data from the 10 runs for each experiment. The results are given in Table 5. With equal Pnew, the total amount of food "growing" in the search arena over time was almost the same (close to 300 items) in all experiments. We compare the food collection rate (collected/produced) in Figure 3. It shows that the collection rate fell slightly from S1 to S4 for swarms with the same number of robots. For swarms with 4, 6 and 8 robots, more than 98% of the food was collected no matter which strategy was used. However, the swarm with 2 robots collected only 92% of the food; the same situation occurred for swarm size 10 with strategies S3 and S4, where 6–7% of the food was left in the searching arena. Checking the number of foragers for these experiments, we found they were all fewer than 3 robots on average. All robots were foraging all of the time in the swarm with 2 robots, since there was enough food available for retrieval; for the swarm of size 10 with strategies S3 and S4, the average numbers of foragers were 2.95 and 2.8, respectively. In other words, more than 2 robots needed to be engaged in foraging in order to collect all the food; meanwhile, the more robots resting in the nest, the more energy the swarm could save. Therefore, for the given food source density Pnew = 0.03, the optimal average number of foragers over time, in order to maximize the energy income of the swarm, must be between 2 and 4.
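The food "growth" process is a per-second Bernoulli trial, so the expected total over a 10,000 s run is Pnew × 10,000 = 300 items, matching the "close to 300 items" reported above. A quick sketch (our own; item positions are ignored):

```python
import random

# Food "growth" rule: each second a new food-item appears in the search arena
# with probability p_new (positions are ignored in this sketch of ours).
def grown_items(p_new=0.03, seconds=10_000, seed=1):
    rng = random.Random(seed)
    return sum(rng.random() < p_new for _ in range(seconds))

# Expectation is p_new * seconds = 300 items over a 10,000 s run.
print(grown_items())
```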

We then compared the energy efficiency of the swarm under the different strategies. As shown in Figure 4, the efficiency of the four strategies in the swarm with 2 robots was nearly the same, since all robots were engaged in foraging. However, for swarms with more than 2 robots, the swarm with strategy S4 always obtained the highest energy efficiency, while the swarm without any adaptation rules (S1) had the lowest energy efficiency.

The energy efficiency decreased with increasing swarm size for all strategies, but the efficiency gap between strategies S3 and S2 became bigger with

Figure 3 Comparison of the food collection rate (collected/produced) for the different strategies in swarms of 2, 4, 6, 8, and 10 robots.


Figure 4 The relationship between swarm size and efficiency for the different strategies.


increasing swarm size. Since strategies S3 and S4 both used social cues but S2 did not, the more robots in the swarm, the more often the robots interact with each other, and hence the more information about the environment (food source density) the robots can obtain. This helps a robot switch tasks between foraging and resting more effectively, and thus the efficiency of the swarm can be improved. We argue that the efficiency gap between S2 and S3 will increase with increasing swarm size. Considering the average number of foragers in all experiments (swarm size greater than 2), we can see that the swarm with strategy S4 had the lowest average number of foragers, resulting in more energy saved, since most of the food produced was collected. The values varied from 2.80 to 3.95, close to the optimal number of foragers for the given food source density as previously indicated. Thus we can conclude that the proposed adaptation mechanisms

Table 5 Average results of 10 runs for swarms of different sizes with different strategies. Each simulation lasts for 10,000 seconds. The food density remains the same during each simulation (Pnew = 0.03). Efficiency is calculated according to Equation 11.

Size  Strategy  Food produced  Food collected  Average foragers  Net energy  Efficiency

2     S1        295.8          278.4           2                 443634      75.0%
2     S2        306.5          283.6           1.95              452045      73.7%
2     S3        298.8          276.1           1.97              438392      73.4%
2     S4        300.7          277.7           1.97              443186      73.7%
4     S1        298.4          294.7           4                 267806      44.9%
4     S2        301.3          297.9           3.57              300569      49.9%
4     S3        295.0          290.6           3.52              290855      49.3%
4     S4        295.0          290.1           3.25              306355      51.9%
6     S1        299.9          298.0           6                 124218      20.8%
6     S2        307.0          304.2           4.78              215932      35.2%
6     S3        299.4          293.4           3.11              300533      50.2%
6     S4        295.7          286.3           2.87              305278      51.6%
8     S1        297.8          295.8           8                 –23400      –3.9%
8     S2        302.1          299.1           5.65              141720      23.5%
8     S3        296.7          292.5           4.32              202260      34.1%
8     S4        296.3          290.4           3.95              227020      38.3%
10    S1        300.7          298.7           10                –167322     –27.8%
10    S2        298.5          293.7           6.12              74761       12.5%
10    S3        291.1          273.7           2.95              236173      40.6%
10    S4        291.1          270.5           2.80              240932      41.3%


can not only improve the performance of the system significantly but also guide the swarm towards energy optimization for a given food source density environment.

A more interesting result is seen in Figure 5, in which the instantaneous number of foragers over time is plotted. The number of foragers in all experiments with strategies S3 and S4 kept oscillating with time while staying near an average value. This means a dynamic equilibrium between the number of foragers and the number of resters was reached when we introduced the adaptation mechanisms into the swarm.

Figure 5 (continues next page) The instantaneous foragers (left) and net energy (right) for swarm sizes 2, 4, 6, 8 and 10 (from top to bottom; for swarm sizes 8 and 10 see next page) with different strategies, Pnew = 0.03; each plot is the average of 10 runs. Left column: for each strategy both the varying number of foragers and the average number of foragers over the whole time period are plotted.



Thus, the overall swarm task allocation (division of labor) emerges from the low-level interactions between robots and the environment. Figure 5 also plots the instantaneous energies of the swarm with different strategies, and we were surprised to see that the rates of increase of swarm energy for S2, S3, and S4 in all experiments were almost linear, and that the swarm using all cues (S4) had the fastest rate of energy increase of all.

4.2 Variable Food Density, Fixed Swarm Size

We designed a second set of experiments to investigate the effect of the different cues on the efficiency of the swarm under different environmental conditions; here we fixed the swarm size to 8 robots but ran the experiment with three different food source densities, from poor (Pnew = 0.015) to relatively rich (Pnew = 0.045). Different strategies were used for the swarm; again, each experiment was run 10 times and each simulation lasted for 10,000 s. Table 6 provides the results, averaging the data of the 10 runs for each experiment. Most food produced in the environment was collected, since there were enough robots engaged in the task. The efficiency of the swarm is plotted in Figure 6A. As we expected, the swarm with strategy S4 has the highest efficiency while the swarm without cues (S1) has the lowest efficiency in all experiments. Although the average number of foragers in the swarm with S3 was smaller when the food source became poorer, the efficiency gap between S3 and S4 became larger. This is because the robots not carrying food collided with each other more often in a low-density food source environment. So the environmental cues had a bigger impact on the performance of the swarm when the food source was poor than when it was rich. Comparing S3 and S2, Figure 6A shows that the difference in efficiency between these two strategies became smaller as the food source density increased. The reason is that with increasing food source density, a robot is more likely to find food and therefore reward itself (by resting less). Thus we can deduce that the social cues have less impact on the

Figure 5 (continued).



performance of the swarm when the food source is rich but more impact when the food source is poor; that is, robots need to cooperate more when food is scarce. Figure 6B plots the average retrieval time for each strategy. It shows that, despite the food source density changing, the average retrieval time for the swarm with strategy S4 was quite stable in comparison with strategies S1 and S2, which implies that the swarm with the adaptation mechanism is quite robust to environmental change. Instantaneous energy and foragers in the swarm are plotted in Figure 7 and, not surprisingly, a new dynamic equilibrium for the number of foragers

Table 6 Average results of 10 runs for the swarm of size 8 with different strategies. Each simulation lasts for 10,000 s. The value of Pnew is set to 0.015, 0.03 and 0.045; within each simulation this value remains constant. Efficiency is calculated according to Equation 11.

Pnew   Strategy  Food produced  Food collected  Average foragers  Net energy  Efficiency

0.015  S1        147.3          146.5           8                 –304996     –103.5%
0.015  S2        158.0          155.6           4.05              –34466      –10.9%
0.015  S3        147.1          143.1           2.38              45617       15.5%
0.015  S4        149.7          142.0           2.08              64736       21.6%
0.030  S1        297.8          295.8           8                 –23400      –3.9%
0.030  S2        302.1          299.1           5.65              141720      23.5%
0.030  S3        296.7          292.5           4.32              202260      34.1%
0.030  S4        296.3          290.4           3.95              227020      38.3%
0.045  S1        451.6          448.3           8                 258939      28.7%
0.045  S2        447.2          442.3           6.57              343000      38.3%
0.045  S3        441.6          436.9           6.09              361789      41.0%
0.045  S4        439.8          431.5           5.65              381125      43.3%

Figure 6 (A) Efficiency of the swarm for the different strategies and three different food source densities. (B) Average retrieval time for the swarm with the different strategies and three different food source densities.


in the swarm is observed each time the food source density is changed, and the gradient of energy increase differs for the same strategies in different food source environments.

4.3 Dynamic Food Density

To test if our adaptation mechanism has the ability to adapt to a dynamic environment, we now disturbed the environment by introducing a step change in the probability Pnew from 0.03 to 0.015 at t = 5,000 s and then from 0.015 to 0.045 at t = 10,000 s. A swarm with 8 robots, using the different strategies, was engaged in the foraging task. Each experiment was repeated 10 times and each simulation lasted for 15,000 s, with the other parameters remaining as above. We plotted the instantaneous number of foragers and the net energy of the swarm over time in Figure 8. As expected, a new dynamic

Figure 7 Instantaneous number of foragers (left) and net energy (right) for swarm size 8 and different food source densities. From top to bottom, Pnew is 0.015, 0.03 and 0.045, respectively.



equilibrium for the number of foragers in the swarm was observed, after some delay, each time the food source density was changed. Figure 8 also shows that the swarms using social cues, S3 and S4, adapted more rapidly to the change of environment than the swarm without social cues, S2. The reason for this is that social cues provide more information about the environment (food density) for the individuals. The gradient of net energy increase for the different strategies shows that the swarm using all cues (S4) could still achieve a more rapid energy increase than the other strategies in the dynamic food density environment. Therefore, the swarm with the adaptation mechanism was more robust to dynamic environmental change.

5 Conclusions and Future Studies

In this article, we have proposed a simple adaptation mechanism for a swarm foraging task which is able to dynamically change the number of foragers and thus make the swarm more energy efficient. The individuals in the swarm use only internal cues (successful or unsuccessful food retrieval), environmental cues (collision with other robots while searching) and social cues (team-mate food retrieval success or failure) to determine whether they will rest in the nest for longer, to either save energy or minimize interference, or be actively engaged in foraging, which costs more energy for the individual but potentially gains more energy for the swarm. With the adaptation mechanism, the swarm demonstrates:

• significantly improved performance compared with a swarm with no adaptation mechanism;

• emergent dynamic task allocation (division of labor) between foraging and resting; and

• robustness to environmental change (in food density).

Furthermore, the swarm with the adaptation mechanism seems to be able to guide the system towards energy optimization despite the limited sensing abilities of the individual robots and the simple social interaction rules. The swarm also exhibits the capacity to perceive the environment collectively if we take into account the average number of active foragers in the swarm over time. That is, more active robots indicate a richer food environment and more inactive robots indicate a poorer food environment. This correlation can only be observed at the overall swarm level and cannot be deduced from individual robots.

Other interesting conclusions are, firstly, that the swarms utilizing social cues achieve the highest net energy income to the swarm and the fastest adaptation in the ratio of foragers to resters when the food density changes; and, secondly, that the same social cues have the greatest impact when the food density is low.

This study is ongoing, and thus far we have tested our approach in swarms with small numbers of robots

Figure 8 The instantaneous foragers (A) and net energy (B) for swarm size 8 with different strategies when the food source density changes during the simulation. Pnew changes from 0.03 to 0.015 at t = 5,000 s and from 0.015 to 0.045 at t = 10,000 s. (A): for each strategy both the varying number of foragers and the average number of foragers within each time segment are plotted.


(no more than 10), but we have not tested the scalability of the approach to swarms with hundreds or thousands of robots; however, given the minimal local communication between robots, we have good reason to suppose the approach is scalable. We also have confidence that the approach will exhibit a high level of robustness to failure of individual robots, in keeping with the levels of robustness commonly seen in swarm robotic systems.

Currently, all values of the adjustment factors in Equations 4 and 5 are chosen on a trial and error basis, and all experiments described in this article used the same time adjustment values (see Table 3). Thus we cannot be sure how closely the swarm with the adaptation mechanism approaches optimal energy efficiency, since such optimal values, to the best of our knowledge, cannot be determined ideally through any modeling approach. Potentially, in order to achieve maximum energy efficiency, the swarm should also be able to automatically adapt or evolve these values. Therefore future studies will include: (i) introducing a learning mechanism so that the swarm can find its own time adjustment values, and (ii) analysis to determine how these values affect the performance of the swarm and to what extent the system will remain robust over a range of time adjustment values. Future studies will also (iii) seek to develop a probabilistic or statistical model for our foraging swarm, and compare the optimal performance from the mathematical model and robot experiments.

Acknowledgments

The authors would like to thank Jan Dyre Bjerknes and Christopher Harper for their helpful discussions during preparation of the paper. The authors also wish to thank the anonymous referees for their insightful comments, which have guided improvements to the clarity and content of the paper. This work is partially supported by the EU Asia-link Selection Training and Assessment of Future Faculty (STAFF) project.

References

Beni, G. (2005). From swarm intelligence to swarm robotics. In E. Şahin & W. Spears (Eds.), Swarm robotics workshop: State-of-the-art survey (pp. 1–9). New York: Springer.

Bonabeau, E., Dorigo, M., & Théraulaz, G. (1999). Swarm intelligence: From natural to artificial systems. New York: Oxford University Press.

Camazine, S., Franks, N. R., Sneyd, J., Bonabeau, E., Deneubourg, J.-L., & Théraulaz, G. (2001). Self-organization in biological systems. Princeton, NJ: Princeton University Press.

Gerkey, B. P., Vaughan, R. T., & Howard, A. (2003). The Player/Stage project: Tools for multi-robot and distributed sensor systems. In Proceedings of the International Conference on Advanced Robotics (pp. 317–323). Coimbra, Portugal.

Guerrero, J., & Oliver, G. (2003). Multi-robot task allocation strategies using auction-like mechanisms. Artificial Research and Development in Frontiers in Artificial Intelligence and Applications, 100, 111–122.

Holland, O., & Melhuish, C. (1999). Stigmergy, self-organization, and sorting in collective robotics. Artificial Life, 5, 173–202.

Jones, C., & Matarić, M. J. (2003). Adaptive division of labor in large-scale multi-robot systems. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 1969–1974). Las Vegas, NV.

Krieger, M. J. B., & Billeter, J.-B. (2000). The call of duty: Self-organised task allocation in a population of up to twelve mobile robots. Robotics and Autonomous Systems, 30, 65–84.

Labella, T. H., Dorigo, M., & Deneubourg, J.-L. (2006). Division of labor in a group of robots inspired by ants' foraging behavior. ACM Transactions on Autonomous and Adaptive Systems, 1, 4–25.

Lerman, K. (2002). Mathematical model of foraging in a group of robots: Effect of interference. Autonomous Robots, 13, 127–141.

Martinoli, A. (1999). Swarm intelligence in autonomous collective robotics: From tools to the analysis and synthesis of distributed collective strategies. PhD thesis, École Polytechnique Fédérale de Lausanne, Switzerland.

Martinoli, A., & Easton, K. (2004). Modeling swarm robotic systems: A case study in collaborative distributed manipulation. International Journal of Robotics Research, 23, 415–436.

Nembrini, J., Winfield, A., & Melhuish, C. (2002). Minimalist coherent swarming of wireless networked autonomous mobile robots. In B. Hallam, D. Floreano, J. Hallam, G. Hayes, & J.-A. Meyer (Eds.), From Animals to Animats 7: Proceedings of the Seventh International Conference on Simulation of Adaptive Behavior (pp. 373–382). Cambridge, MA: MIT Press.

Østergaard, E. H., Sukhatme, G. S., & Matarić, M. J. (2001). Emergent bucket brigading: A simple mechanism for improving performance in multi-robot constrained-space foraging tasks. In Proceedings of the 5th International Conference on Autonomous Agents (pp. 29–30). New York.

Rosenfeld, A., Kaminka, G. A., & Kraus, S. (2005). A study of scalability properties in robotic teams. In P. Scerri, R. Vincent, & R. Mailler (Eds.), Coordination of large-scale multiagent systems (pp. 27–51). New York: Springer-Verlag.

Şahin, E. (2005). Swarm robotics: From sources of inspiration to domains of application. In E. Şahin & W. Spears (Eds.), Swarm robotics workshop: State-of-the-art survey (pp. 10–20). New York: Springer.


Théraulaz, G., Bonabeau, E., & Deneubourg, J.-L. (1998). Response threshold reinforcement and division of labour in insect societies. Proceedings of the Royal Society of London B: Biological Sciences, 265, 327–332.

Winfield, A. F., & Holland, O. E. (2000). The application of wireless local area network technology to the control of mobile robots. Microprocessors and Microsystems, 23, 597–607.

About the Authors

Wenguo Liu received his BSc in industrial automation (2001) and MSc in pattern recognition and intelligent systems (2004) from the Beijing Institute of Technology, China. He is currently a PhD student working in the Bristol Robotics Laboratory, University of the West of England, Bristol, UK. His main research interests are modeling and adaptation in swarm robotic systems.

Alan F. T. Winfield is Hewlett-Packard Professor of Electronic Engineering in the Faculty of Computing, Engineering and Mathematical Sciences at the University of the West of England (UWE), Bristol, UK. He received his PhD in digital communications from the University of Hull in 1984, then co-founded and led APD Communications Ltd until taking up his appointment at UWE, Bristol in 1991. He co-founded UWE's Intelligent Autonomous Systems Laboratory (now the Bristol Robotics Laboratory) in 1993, and the focus of his current research is on the engineering and scientific applications of swarm intelligence.

Jin Sa received her BSc in computing and information systems (1983) and her PhD in computer science (1987) from the University of Manchester, UK. Currently, she is a principal lecturer within the Faculty of Computing, Engineering and Mathematical Sciences at the University of the West of England. Her research interest has mainly been in applying formal approaches to various application domains such as operating systems and process modeling. More recently she has been working on applying formal methods to developing swarm robotic systems.

Jie Chen is a Full Professor and Assistant President, and Director of the Office of Research and Development Administration, at the Beijing Institute of Technology. He received his PhD degree at the Beijing Institute of Technology. His research interests are intelligent control and intelligent systems, multi-objective optimization and decision making.

Lihua Dou received her PhD degree in control theory and control engineering in 2000 from the Beijing Institute of Technology. She is currently a professor at the Beijing Institute of Technology. Her research interests include pattern recognition, image processing and robotics.


Université Libre de Bruxelles
Institut de Recherches Interdisciplinaires et de Développements en Intelligence Artificielle

Interference reduction through task partitioning in a robotic swarm

Giovanni Pini, Arne Brutschy, Mauro Birattari, and Marco Dorigo

IRIDIA – Technical Report Series

Technical Report No.

TR/IRIDIA/2009-006

April 2009


IRIDIA – Technical Report Series
ISSN 1781-3794

Published by:
IRIDIA, Institut de Recherches Interdisciplinaires
et de Développements en Intelligence Artificielle
Université Libre de Bruxelles
Av. F. D. Roosevelt 50, CP 194/6
1050 Bruxelles, Belgium

Technical report number TR/IRIDIA/2009-006

The information provided is the sole responsibility of the authors and does not necessarily reflect the opinion of the members of IRIDIA. The authors take full responsibility for any copyright breaches that may result from publication of this paper in the IRIDIA – Technical Report Series. IRIDIA is not responsible for any use that might be made of data appearing in this publication.


IRIDIA – Technical Report Series: TR/IRIDIA/2009-006 1

INTERFERENCE REDUCTION THROUGH TASK PARTITIONING IN A ROBOTIC SWARM

or: “Don’t you step on my blue suede shoes!”

Giovanni Pini, Arne Brutschy, Mauro Birattari, and Marco Dorigo
IRIDIA, CoDE, Université Libre de Bruxelles, Brussels, Belgium

{gpini,arne.brutschy,mbiro,mdorigo}@ulb.ac.be

Keywords: Swarm robotics, foraging, self-organized task allocation, task partitioning, swarm intelligence.

Abstract: This article studies the use of task partitioning as a way to reduce interference in a spatially constrained harvesting task. Interference is one of the key problems in large cooperating groups. We present a simple method to allocate individuals of a robotic swarm to a partitioned task, and show that task partitioning can increase system performance by reducing sources of interference. The method is experimentally studied, both in an environment with a narrow area and an environment without this constraint. The results are analyzed and compared to the case in which task partitioning is not employed.

1 INTRODUCTION

In collective robotics, interference is a critical problem limiting the growth of a group: the time each robot spends in non-task-relevant behaviors such as obstacle avoidance increases when the density of individuals rises (see, e.g., Lerman and Galstyan, 2002). The performance on tasks that suffer from physical interference can typically be improved by spatial partitioning; for example, by keeping each robot in its own "working area". A known approach that uses this rationale is the so-called bucket brigade (Fontan and Mataric, 1996; Shell and Mataric, 2006). In this approach, robots hand over objects to robots working in the following area, until the objects reach their destination. As tasks usually cannot be partitioned arbitrarily, this approach effectively limits the number of robots that can be employed in the task. A possible solution to this problem, treating working areas as non-exclusive, raises other problems: How should individuals be allocated to tasks? How can such an allocation help in limiting the amount of interference?

In this paper, we study how task partitioning can help in reducing sources of interference. Additionally, we show a simple way to achieve self-organized allocation to such a task partition when using a robotic swarm.

We use the foraging problem, one of the canonical testbeds for collective robotics, as a base for our studies. In our experiments, a swarm of homogeneous robots has to harvest prey objects from a source area and transport them to a home area. In this study, we limit ourselves to a harvesting task that is pre-partitioned by the designer into two subtasks with a sequential interdependency. We study a simple threshold-based model of self-organized allocation and focus on two questions: Under which environmental conditions is it advantageous to partition the task? Can this partition reduce interference between group members? These questions are studied in two experiments using a simulated robot swarm.

The paper is organized as follows. We first review related works in Section 2. In Section 3 we explain the task partitioning and the allocation method employed in this study. Section 4 gives the methods used in the experiments by describing the environments, the simulated robots, and the controller. In Section 5 the results of the experiments are given and discussed. Section 6 draws some conclusions and discusses future work.


2 RELATED WORK

Interference has long been acknowledged as being one of the key issues in multi-robot cooperation (Goldberg and Mataric, 2003). Lerman and Galstyan (2002) devised a mathematical model that allows a quantification of the interference and its effect on group performance. Probably the most thorough study was published by Goldberg (2001), who identified several types of multi-robot interactions. Goldberg notes that one of the most common types of interference is physical interference in a central area, for example the nest. This kind of interference results from resource conflicts, in this case over physical space, and can be arbitrated either by making sure that robots stay in different areas all the time or by employing a scheduling mechanism to ensure that robots use the same space only at different times.

A simple method for reducing interference by using the first arbitration method mentioned is the so-called bucket brigade: robots are forced to stay in exclusive working areas and to pass objects to the following robot as soon as they cross the boundaries of their area (Fontan and Mataric, 1996; Shell and Mataric, 2006). Recently, this has been extended to work with adaptive working areas by Lein and Vaughan (2008). To the best of our knowledge, current works concerned with bucket brigading only studied the influence of interference due to obstacle avoidance. Other sources of interference (e.g., object manipulation) were never studied, although they might have a critical impact on the performance of any task partitioning approach. To quote Shell and Mataric (2006): "If the cost of picking up or dropping pucks is significant [...], then bucket brigading may not be suitable."

Task allocation for multi-robot systems is a wide field, which can be divided into intentional and self-organized task allocation. Intentional task allocation relies on negotiation and explicit communication to create global allocations, whereas in self-organized task allocation global allocations result from local, stochastic decisions. A formal analysis and taxonomy that covers intentional task allocation has been proposed by Gerkey and Mataric (2004). Kalra and Martinoli (2006) recently compared the two best-known approaches of intentional and self-organized task allocation.

The field of self-organized task allocation is in its early stages, as most studies tackle simple problems without task interdependencies. Studies in self-organized task allocation are mostly based on threshold-based approaches, taking inspiration from division of labor in social insects. Krieger and Billeter (2000) were among the first to propose threshold-based approaches in multi-robot task allocation. Labella et al. (2006) used threshold-based task allocation in a multi-foraging task. Similarly, Campo and Dorigo (2007) used a notion of the group's internal energy to allocate individuals to a multi-foraging task. Finally, Liu et al. (2007) studied a multi-foraging task while focusing on the influence of the use of different social cues on the overall group performance.

3 TASK PARTITIONING AND ALLOCATION

In this work, we study a collective foraging task. By spatially partitioning the environment, the global foraging task is automatically partitioned into two subtasks: 1) harvesting prey objects from a harvesting area (source) and 2) transporting them to a home area (nest). Robots working on the first subtask harvest prey objects from the source and pass them to the robots working on the second subtask, which store the objects in the nest. These subtasks have a sequential interdependency in the sense that they have to be performed one after the other in order to complete the global task once: delivering a prey object to the home area.

Robots can decide to switch from one subtask to the other, thus creating a task allocation problem: individual robots have to be allocated to subtasks, and different allocations yield different performance. As a prey object has to be passed directly from one robot to the other, a robot usually has to wait some time before passing a prey object to or receiving a prey object from a robot working on the other subtask. This waiting time can therefore give an indication of the allocation quality for the respective subtask: if the waiting time is very long, there might not be enough robots allocated to the other subtask. Thus, the robots can use this waiting time to decide whether to switch subtask or not. Ideally, the waiting time should be the same for the two subtasks in order for the system to reach a stable state and deliver optimal performance.

Our robots exploit a simple threshold-based model to decide when to switch task: when the waiting time tw is higher than a threshold Θ, a robot switches its subtask. The robot's waiting time is a function of the average time the robots working in the other subtask need to complete their task. The task-completion time of a robot depends on two factors: 1) round-trip time (i.e., distance to travel) and 2) time lost due to interference. Thus, the robot's threshold Θ is a function of the round-trip time and the interference of the robots in the other subtask. Therefore, the optimal task switching threshold depends on the task (i.e., time to harvest a prey object) and the environment (i.e., distance between the source and the nest). As the parameters of the environment are not pre-programmed into the robots, determining the optimal threshold can be a complex problem. In this work, we limit ourselves to a simple method for setting this threshold: at the start of the experiment, each robot draws a random threshold that is used as its task switching threshold throughout the experiment.
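The switching rule described above can be sketched in a few lines. This is a hypothetical Python sketch, not the authors' controller code; the class and method names are our own:

```python
import random

THETA_MAX = 1000  # upper bound of the uniform threshold interval


class SwitchingRule:
    """Threshold-based task switching: a robot switches subtask when its
    waiting time t_w in the exchange zone exceeds its threshold theta."""

    def __init__(self, rng=random):
        # Drawn once at the start of the experiment; never changed afterwards.
        self.theta = rng.uniform(0, THETA_MAX)
        self.waiting_time = 0  # t_w, counted in control cycles

    def step_waiting(self):
        """Advance one control cycle spent waiting in the exchange zone.
        Returns True if the robot should switch to the other subtask."""
        self.waiting_time += 1
        return self.waiting_time > self.theta

    def reset(self):
        """Restart t_w after a successful prey transfer (or a task switch)."""
        self.waiting_time = 0
```

Setting theta = 0 recovers the non-partitioned behavior, in which a robot switches subtask as soon as it starts waiting in the exchange zone.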

In the following, we study the properties of this simple self-organized task allocation strategy, compare this strategy to a strategy without task partitioning, and analyze how it can help to reduce interference. We refer to the two strategies as partitioned and non-partitioned, respectively.

4 METHODS

This section describes the environments in which the experiments are carried out, the simulated robots, and the robot's controller. Additionally, we describe how we run the experiments and introduce some metrics that we use to evaluate the properties of the system.

4.1 Environments

We study task allocation in two different environments. In both environments, the nest is marked by a light source that can be perceived by all robots, thus providing directional information. The environment is spatially partitioned in two parts: the source is located on the left and the nest on the right side of the arena. We refer to the two sides of the arena as harvest area and store area, respectively. The exchange zone is located between these two areas. Robots working on the left side, called harvesters, gather prey objects in the source and move them to the exchange zone, where they pass them to the robots working on the other side. These are referred to as storers: their role is to transport prey objects to the nest and store them there. The nest, the source, and the exchange zone can be detected through environmental cues (ground color).

Figure 1: Depiction of (a) the narrow-nest environment used in the first experiment and (b) the wide-nest environment used in the second experiment. The gray stripes are the source (left) and the nest (right), each 0.25 m deep. The black stripe is the exchange zone, which is 0.5 m deep. The light source is marked with "L".

At time t = 0, the robots are randomly placed in the harvest area. The experiments run for tmax = 18,000 time steps (a simulated time of one hour, with a time step length of 200 ms). The experiments are run in two different arenas (see Figure 1). The first arena (Figure 1a) is 4.125 m long with a width of 1.6 m at the source and exchange zone, whereas the nest is 0.4 m wide. The exchange zone is located 3.125 m away from the source. This arena is characterized by the presence of an area, critical for the task, in which high interference between robots can be expected (the nest). Thus, this arena is referred to as the narrow-nest environment.

The second arena (Figure 1b) has a rectangular shape: it is 3.75 m long and 1.6 m wide. Here as well, the exchange zone is located 3.125 m from the source. The arena shape does not suggest the presence of any zone where interference can be higher than in other places. This arena is referred to as the wide-nest environment.

The area of both arenas is 6 m²: 5 m² for the harvest area and 1 m² for the store area. The overall area is the same in the two arenas, so that the same group size results in the same robot density. Thus, results are comparable across the two environments.


4.2 Simulation

The experiments are carried out in a custom simulation environment that models the geometries and functional properties of simple objects and robots. Our robot model is purely kinematic. Prey objects are simulated as an attribute a robot can possess and not as physical entities. Although the experiments are conducted in simulation only, the simulated robots have a real counterpart: the swarm-bot robotic platform (Mondada et al., 2004). The platform consists of a number of mobile autonomous robots called s-bots, which have been used for several studies, mainly in swarm intelligence and collective robotics; see for instance Groß et al. (2006) and Nouyan et al. (2008). The simulated s-bots are of round shape, with a diameter of 0.116 m. Each of them is equipped with 16 infrared proximity sensors, used to perceive obstacles up to a distance of 0.15 m. Eight ambient light sensors can be used to perceive light gradients up to a distance of 5.0 m. The robots are equipped with 4 ground sensors used to perceive the nest, source and exchange zone. A ring of 8 LEDs is used to signal when a prey object is carried. An omnidirectional camera allows the perception of LEDs in a circle of radius 0.6 m surrounding the robot. A uniform noise of 10% is added to all sensor readings at each simulation step. The robots can move at a maximum speed of 0.1 m/s by means of a differential drive system.

4.3 Controller

All the robots share the same, hand-coded, finite state machine controller depicted in Figure 2. The controller consists of two parts, each corresponding to a possible subtask a robot can perform. Gray states refer to the harvest subtask, white states to the store subtask. Since all the robots start in the harvest area, their controller is initially set to perform anti-phototaxis. In this way they reach the source, where they can start retrieving prey objects. The behavior of each robot is a function of the task it is performing. Harvesters not carrying a prey object move towards the source, where they can find prey. Harvesters carrying a prey object move to the exchange zone and wait for a free storer. Upon arrival of such a storer, the harvester passes the prey object to it. Storers carrying a prey object move towards the nest, where they can deposit the object. Storers not carrying a prey object head to the exchange zone and search for a harvester with a prey object. Robots can detect other robots carrying a prey object on the basis of the color of their LED ring. While moving, each robot avoids obstacles (walls and other robots).

Figure 2: Simplified state diagram of the controller of the robots. Gray states belong to the harvest task, white states to the store task. The obstacle avoidance state has been omitted for clarity, as it is applicable in all states of the robot. tw is the time spent in the exchange zone and Θ is the threshold.

Task switches can occur: a harvester carrying a prey object can decide to become a storer, and a storer not carrying a prey object can decide to become a harvester. As mentioned before, robots switch task depending on an internal threshold Θ, representing the maximum number of control cycles they can spend in the exchange zone waiting to pass (harvesters) or receive (storers) a prey object. If a robot remains in the exchange zone longer than its threshold without passing or receiving a prey object (tw > Θ), it switches its task. The optimal threshold value is not trivial to determine. In the work presented here, we use a simple method to set the threshold Θ: at the beginning of the experiment, each robot draws a random threshold, sampled uniformly in the interval [0, 1000]. We chose this method because it is independent of the environment and does not rely on complex approximation techniques. The threshold value does not change during the experiment. In the case of the non-partitioned strategy, the threshold is set to Θ = 0, causing the robots to switch subtask immediately as soon as they reach the exchange zone.
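As a complement to the state diagram, the controller logic described above can be sketched as a small finite state machine. This is a hedged Python sketch: the state names, the update() signature, and the boolean sensor abstractions are our own assumptions, not the actual s-bot controller:

```python
from enum import Enum, auto


class State(Enum):
    H_GO_SOURCE = auto()    # harvester, not carrying: move to the source
    H_GO_EXCHANGE = auto()  # harvester, carrying: wait in the exchange zone
    S_GO_EXCHANGE = auto()  # storer, not carrying: wait in the exchange zone
    S_GO_NEST = auto()      # storer, carrying: move to the nest


class Robot:
    def __init__(self, theta):
        self.theta = theta              # task-switching threshold
        self.tw = 0                     # waiting time in the exchange zone
        self.state = State.H_GO_SOURCE  # all robots start as harvesters

    def update(self, at_source, in_exchange_zone, at_nest, transfer_done):
        """One control cycle. The booleans abstract the ground sensors and
        the prey-transfer handshake; obstacle avoidance is omitted."""
        if self.state is State.H_GO_SOURCE and at_source:
            self.state = State.H_GO_EXCHANGE            # prey picked up
        elif self.state is State.H_GO_EXCHANGE and in_exchange_zone:
            if transfer_done:                           # prey passed to a storer
                self.state, self.tw = State.H_GO_SOURCE, 0
            else:
                self.tw += 1
                if self.tw > self.theta:                # give up: become a storer
                    self.state, self.tw = State.S_GO_NEST, 0
        elif self.state is State.S_GO_NEST and at_nest:
            self.state = State.S_GO_EXCHANGE            # prey stored in the nest
        elif self.state is State.S_GO_EXCHANGE and in_exchange_zone:
            if transfer_done:                           # prey received
                self.state, self.tw = State.S_GO_NEST, 0
            else:
                self.tw += 1
                if self.tw > self.theta:                # give up: become a harvester
                    self.state, self.tw = State.H_GO_SOURCE, 0
        return self.state
```

Note that a harvester that times out keeps the prey object it carries, so it resumes as a storer on its way to the nest, while a storer that times out resumes as an empty-handed harvester.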

4.4 Experiments

The goal of the experiments is to investigate whether task partitioning can reduce interference in task-critical zones, and how to allocate a robotic swarm to partitions. As pointed out by Lerman and Galstyan (2002), interference is related to the number of individuals in the system. Additionally, the physical interference between robots is also a function of the environment the robots act in. The higher the group size, the higher the density, resulting in a higher amount of physical interference. Thus, in order to study interference in our experiments, we increase the size of the group in each of the two environments shown in Figure 1, while using both strategies (non-partitioned and partitioned). We study the performance of the system when the group size N ranges in the interval [1, 40]. We run 50 repetitions for each value of N and each experimental setting.
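The experimental protocol amounts to a sweep over environments, strategies, and group sizes. A minimal driver sketch, where run_experiment is a placeholder standing in for one full simulation run (the function names are our own):

```python
T_MAX = 18_000  # time steps per run: one simulated hour at 200 ms per step


def run_experiment(env, strategy, n_robots, seed):
    """Placeholder for one simulation run; it would return the number of
    prey objects P collected by the swarm after T_MAX steps."""
    ...


def sweep(environments=("narrow-nest", "wide-nest"),
          strategies=("non-partitioned", "partitioned"),
          repetitions=50, max_n=40):
    """Run every (environment, strategy, group size) setting and collect
    the per-repetition results, keyed by setting."""
    results = {}
    for env in environments:
        for strategy in strategies:
            for n in range(1, max_n + 1):
                results[(env, strategy, n)] = [
                    run_experiment(env, strategy, n, seed=rep)
                    for rep in range(repetitions)
                ]
    return results
```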

4.5 Metrics

In order to quantify the influence of interference, we measure the group performance P by the number of prey objects collected by the swarm at the end of the experiment (tmax = 1 hour). From the group performance measure we can derive the individual efficiency as follows:

    Ieff = P/N,    (1)

where N is the size of the group. Individual efficiency can help to understand the effect of interference on the performance.

In order to measure the influence of environmental features on the interference, we define an interference measure taking inspiration from Rosenfeld et al. (2005). In their work, interference is measured as the time spent performing actions not strictly related to the task, but rather lost due to negative interactions with the environment (e.g., obstacle avoidance maneuvers). By registering the number of collisions for each area of the arena, we can draw conclusions about where physical interference happens most often. We measure interference through the state of the controller: in our case, a robot is experiencing interference each time its controller perceives an obstacle.

In the case of a partitioned task, there is another source of inefficiency that adds to interference: the time lost in the exchange zone. We define the strategy cost C as the sum of the time lost because of physical interference and the time lost in the exchange zone:

    C = Tint + Tpart,    (2)

where Tint is the number of time steps during which the controller perceives an obstacle, and Tpart is the total number of time steps spent in prey passing maneuvers. By using this metric, the cost of the non-partitioned strategy is purely due to interference (Tpart = 0), while in the case of the partitioned strategy, prey passing costs add to interference costs. In a way, passing a prey object produces another kind of interference in the system. The strategy cost captures this effect, thus allowing for a comparison of strategies.
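The two metrics of this section are direct transcriptions of Eqs. (1) and (2); the helper names below are our own:

```python
def individual_efficiency(prey_collected, n_robots):
    """Eq. (1): I_eff = P / N."""
    return prey_collected / n_robots


def strategy_cost(t_interference, t_partition=0):
    """Eq. (2): C = T_int + T_part. For the non-partitioned strategy,
    T_part = 0 and the cost is purely due to physical interference."""
    return t_interference + t_partition
```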

5 RESULTS AND DISCUSSION

The graphs in Figures 3a and 4 show the performance P for different group sizes in the narrow-nest and wide-nest environments, respectively. Figure 3b shows the individual efficiency Ieff of the robots in the narrow-nest environment. Black curves are the average computed over the 50 repetitions of each setting; gray curves indicate the 95% confidence interval on the expected value. The performance graph in Figure 3a shows that the partitioned strategy improves performance in the narrow-nest environment. The graph shows that the non-partitioned strategy performs better than the partitioned strategy for small group sizes (up to N = 13 robots). However, increasing the group size makes the non-partitioned strategy collapse: the number of gathered prey objects drops dramatically for groups larger than 13. The individual efficiency graph (Figure 3b) can explain the behavior of the system. For small group sizes, the robots employing the partitioned strategy are less efficient than those performing the non-partitioned strategy. However, the addition of more individuals affects the efficiency of the non-partitioned group in a more dramatic way. At a certain point, the drop in efficiency becomes very steep for the non-partitioned strategy. The partitioned strategy, on the other hand, scales better: individual efficiency drops smoothly. This explains why a group using the partitioned strategy performs better: it can benefit from the work of more individuals and therefore collects more prey objects. These considerations do not hold in the wide-nest environment. The performance graph in Figure 4 shows that the non-partitioned strategy performs better than the partitioned strategy for group sizes N < 33. In both environments, independently of the strategy used to accomplish the task, the system collapses when the area is saturated by the swarm.

Figure 3: (a) Performance P and (b) individual efficiency Ieff for an increasing number of robots in the narrow-nest environment. The continuous black line refers to the case of no task partitioning, the dashed black line to the case of partitioning. Gray lines indicate the 95% confidence interval on the expected value.

Figure 4: Performance P for an increasing number of robots in the wide-nest environment. The continuous black line refers to the case of no task partitioning, the dashed black line to the case of partitioning. Gray lines indicate the 95% confidence interval on the expected value.

Figure 5 shows the effect on the cost of increasing the number of robots in the narrow-nest environment. The graph compares the cost C of each of the two strategies for different group sizes. In the case of the partitioned strategy (Figure 5a), the graph shows each component of the cost (Tint and Tpart). Clearly, task partitioning has the effect of reducing the cost due to interference, but has the disadvantage of increasing the cost due to time lost. The probability of two or more robots encountering each other increases with the robot density. Although this entails a higher interference cost (Tint), in the case of the partitioned strategy it decreases the cost due to waiting (Tpart). Partitioning performs better when the gain from interference reduction is greater than the loss of performance due to partitioning inefficiencies. These considerations hold in the narrow-nest environment, where the likelihood of physical interference in a task-critical zone is very high. In the wide-nest environment, interference in the nest is as likely as interference in the exchange zone. Thus, it is not beneficial to pay the cost of waiting, and the non-partitioned strategy performs better for any group size.

The mechanism by which partitioning reduces interference costs can be deduced by comparing the interference graphs in Figure 6. The graphs show the number of times that physical interference (as defined in Section 4.5) was registered in each region of the narrow-nest environment. The total area was discretized into squares of 1 cm². Figure 6 shows the results obtained with 18 robots, in the case of the partitioned strategy (Figure 6a) and in the case of the non-partitioned strategy (Figure 6b). The graphs show that the use of the non-partitioned strategy leads to high interference in the nest, which becomes congested. Partitioning the task reduces the robot density in the nest, thus spreading the interference more uniformly across the arena. In addition, the overall interference diminishes because the exchange zone is wider: the robots have more freedom of movement and collide less often. Although the graphs show only data collected with 18 robots, experiments with different group sizes produced similar results.

Figure 6: Mean interference values registered for (a) the partitioned strategy and (b) the non-partitioned strategy, both in the narrow-nest environment. Shown values are observation means of 50 repetitions with N = 18 robots. Coordinates on the x- and y-axis are given in meters. The arena is stretched along the y-axis for better visualization. The dashed white line marks the location of the exchange zone.

Figure 5: Cost of interference in the narrow-nest environment. Bars represent the cost C, the sum of interference time Tint and partition time Tpart (i.e., waiting times). For easy reference, the outline of the bars of the respective other graph has been added to each graph. (a) Costs for the partitioned strategy, where costs stem from waiting times and collisions. (b) Costs for the non-partitioned strategy, where only physical interference through collisions exists.

6 CONCLUSIONS AND FUTURE WORK

Interference can be an issue when working with swarms of robots. In this work, we used task partitioning and allocation to reduce interference between robots sharing the same physical space. We manually partitioned the environment and employed a simple self-organized strategy for allocating individuals to subtasks. Results show that a partitioning strategy improves performance in a constrained environment. Additionally, we identified cases in which partitioning is not advantageous and a non-partitioned strategy should be used. The proposed strategy is fairly simple and far from being an optimal solution; nevertheless, we improved the performance of the swarm when interference was costly.

Future work will concern the identification of the optimal allocation in the studied environments, as well as the development and study of a strategy that can find this optimal allocation in a self-organized and adaptive way. In addition, the interference metric proposed in Section 4.5 could be used by the robots to decide whether to partition the task. In this way, we could achieve even better performance, since partitioning would be employed only when strictly needed. Finally, the goal is to validate the system using the real robots.

ACKNOWLEDGEMENTS

This work was supported by the SWARMANOID project, funded by the Future and Emerging Technologies programme (IST-FET) of the European Commission under grant IST-022888, and by the VIRTUAL SWARMANOID project funded by the Fund for Scientific Research F.R.S.–FNRS of Belgium's French Community. The information provided is the sole responsibility of the authors and does not reflect the European Commission's opinion. The European Commission is not responsible for any use that might be made of data appearing in this publication. Marco Dorigo and Mauro Birattari acknowledge support from the Fund for Scientific Research F.R.S.–FNRS of Belgium's French Community, of which they are a research director and a research associate, respectively.

REFERENCES

Campo, A. and Dorigo, M. (2007). Efficient multi-foraging in swarm robotics. In Advances in Artificial Life: Proceedings of the VIIIth European Conference on Artificial Life, Lecture Notes in Artificial Intelligence LNAI 4648, pages 696–705, Berlin, Germany. Springer Verlag.

Fontan, M. S. and Mataric, M. J. (1996). A study of territoriality: The role of critical mass in adaptive task division. In From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, pages 553–561, Cambridge, MA. MIT Press.

Gerkey, B. P. and Mataric, M. J. (2004). A formal analysis and taxonomy of task allocation in multi-robot systems. The International Journal of Robotics Research, 23(9):939–954.

Goldberg, D. (2001). Evaluating the Dynamics of Agent-Environment Interaction. PhD thesis, University of Southern California, Los Angeles, CA.

Goldberg, D. and Mataric, M. J. (2003). Maximizing reward in a non-stationary mobile robot environment. Autonomous Agents and Multi-Agent Systems, 6(3):287–316.

Groß, R., Bonani, M., Mondada, F., and Dorigo, M. (2006). Autonomous self-assembly in swarm-bots. IEEE Transactions on Robotics, 22(6):1115–1130.

Kalra, N. and Martinoli, A. (2006). A comparative study of market-based and threshold-based task allocation. In Proceedings of the 8th International Symposium on Distributed Autonomous Robotic Systems (DARS), Minneapolis, Minnesota, USA.

Krieger, M. J. B. and Billeter, J.-B. (2000). The call of duty: Self-organised task allocation in a population of up to twelve mobile robots. Robotics and Autonomous Systems, 30:65–84.

Labella, T. H., Dorigo, M., and Deneubourg, J.-L. (2006). Division of labor in a group of robots inspired by ants' foraging behavior. ACM Transactions on Autonomous and Adaptive Systems, 1(1):4–25.

Lein, A. and Vaughan, R. (2008). Adaptive multi-robot bucket brigade foraging. In Proceedings of the Eleventh International Conference on Artificial Life (ALife XI), pages 337–342, Cambridge, MA. MIT Press.

Lerman, K. and Galstyan, A. (2002). Mathematical model of foraging in a group of robots: Effect of interference. Autonomous Robots, 13(2):127–141.

Liu, W., Winfield, A., Sa, J., Chen, J., and Dou, L. (2007). Towards energy optimization: Emergent task allocation in a swarm of foraging robots. Adaptive Behavior, 15(3):289–305.

Mondada, F., Pettinaro, G. C., Guignard, A., Kwee, I. V., Floreano, D., Deneubourg, J.-L., Nolfi, S., Gambardella, L. M., and Dorigo, M. (2004). SWARM-BOT: A new distributed robotic concept. Autonomous Robots, 17(2–3):193–221.

Nouyan, S., Campo, A., and Dorigo, M. (2008). Path formation in a robot swarm: Self-organized strategies to find your way home. Swarm Intelligence, 2(1):1–23.

Rosenfeld, A., Kaminka, G. A., and Kraus, S. (2005). A study of scalability properties in robotic teams. In Coordination of Large-Scale Multiagent Systems, pages 27–51, New York. Springer Verlag.

Shell, D. and Mataric, M. J. (2006). On foraging strategies for large-scale multi-robot systems. In Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2717–2723, Beijing, China.

Page 101: Silvia task

Self-organized Task Partitioning in a Swarm of Robots

Marco Frison1,2, Nam-Luc Tran1, Nadir Baiboun1,3, Arne Brutschy1, Giovanni Pini1, Andrea Roli1,2, Marco Dorigo1, and Mauro Birattari1

1 IRIDIA, CoDE, Université Libre de Bruxelles, Brussels, Belgium
2 Università di Bologna, Bologna, Italy
3 ECAM, Institut Supérieur Industriel, Brussels, Belgium
[email protected], nadir [email protected],
{namltran,arne.brutschy,gpini,mdorigo,mbiro}@ulb.ac.be, [email protected]

Abstract. In this work, we propose a method for self-organized adaptive task partitioning in a swarm of robots. Task partitioning refers to the decomposition of a task into less complex subtasks, which can then be tackled separately. Task partitioning can be observed in many species of social animals, where it provides several benefits for the group. Self-organized task partitioning in artificial swarm systems is currently not widely studied, although it has clear advantages in large groups. We propose a fully decentralized adaptive method that allows a swarm of robots to autonomously decide whether or not to partition a task into two sequential subtasks. The method is tested on a simulated foraging problem. We study the method's performance in two different environments. In one environment, the performance of the system is optimal when the foraging task is partitioned; in the other, when it is not. We show that, by employing the method proposed in this paper, a swarm of autonomous robots can reach optimal performance in both environments.

1 Introduction

Many animal species are able to partition complex tasks into simpler subtasks. The act of dividing a task into simpler subtasks that can be tackled by different workers is usually referred to as task partitioning [15].

Although task partitioning may have associated costs, for example because of work transfer between subtasks, there are many situations in which partitioning is advantageous. Benefits of task partitioning include, for example, a reduction of interference between individuals, an improved exploitation of the heterogeneity of the individuals, and an improved transport efficiency [9].

Humans widely exploit the advantages of task partitioning in everyday activities. Over the centuries, humans have developed complex social rules to achieve cooperation. These include planning, roles and work-flows. The ancient Romans realized the importance of partitioning and codified it in their military principle divide et impera (also known as divide and conquer), which became an axiom in many political [17] and sociological theories [10].

M. Dorigo et al. (Eds.): ANTS 2010, LNCS 6234, pp. 287–298, 2010.
© Springer-Verlag Berlin Heidelberg 2010


Examples of task partitioning can also be observed in social insects. A widely studied case is the foraging task in ants and bees. In foraging, a group of individuals has to collect and transport material to their nest. Foraging involves many different phases, and partitioning can occur simultaneously in many of them. Typical phases where task partitioning can occur are the exploration of the environment and the preparation of raw materials [15]. Examples of task partitioning are the harvesting of leaves by the leaf-cutter ants [9], the excavation of nest chambers in Pogonomyrmex, and the fungus garden removal in Atta [18].

Also in swarm robotics there are situations in which it is convenient to partition a task into subtasks. Advantages include increased performance at the group level, stimulated specialization, and parallel task execution. In most cases, task partitioning is done a priori and the focus is on the problem of allocating individuals to subtasks in a way that maximizes efficiency. However, in many cases, task partitioning cannot be done a priori because the relevant information on the environment is not available. We consider self-organized task partitioning as a suitable approach in these cases.

In this work, we focus on the case in which a task is partitioned into subtasks that are sequentially interdependent. We propose a simple method, based on individuals' perception and decisions, that allows a swarm of autonomous robots to decide whether to partition a foraging task into subtasks. We test the method with a swarm of simulated robots in two different environmental conditions.

The rest of the paper is organized as follows. In Section 2 we describe the problem and review related works. In Section 3 we propose an adaptive method that we tested with simulated robots. In Section 4 we provide a description of the experimental framework we consider. In Section 5 we report and discuss the results. Finally, in Section 6 we summarize the contribution of this work and present some directions for future research.

2 Problem Description and Related Works

We study a swarm of robots that has to autonomously decide whether to partition a task into subtasks. Our focus is on situations in which a task is partitioned into sequential subtasks: the subtasks have to be executed in a certain sequence in order to complete the global task once [5].

In these cases, we can identify task interfaces where one task ends and another begins. Through task interfaces, individuals working on one of the subtasks can interact, either directly or indirectly, with individuals working on other subtasks.

An example of sequential task partitioning, observable in nature, is the foraging activity in Atta leaf-cutting ants. The sequential interdependency between tasks stems from the fact that each leaf has to be cut from a tree before it can be transported to the nest. Each individual can choose whether to perform both the cutting and transporting subtasks, or to specialize in one subtask only.

Hart et al. [9] described the strategy employed by Atta ants: some individuals work on the tree, cutting and dropping leaves to the ground, while the rest of the swarm gathers and transports these leaves to the nest. Here the advantage of partitioning comes from the fact that the energy cost to climb the tree has to be paid only once, by those individuals working as leaf cutters. Disadvantages come from the fact that energy has to be spent to search for leaves on the ground. The task interface can be identified, in this case, as the area where the leaves land. Such areas are usually referred to as caches, and facilitate indirect transfer of material between individuals. Anderson and Ratnieks described how foragers of different ant species partition the leaf transport task by using caches [4].

Partitioning along foraging trails using direct transfer of material between individuals can be observed in other ant species and in other social insects. In the case of direct transfer, the benefit of partitioning can come from the fact that material weight can be matched with the strength of the transporter [2]. Akre et al. observed task partitioning within Vespula nectar foraging [1]. Anderson and Ratnieks studied partitioned foraging in honeybee species, showing that the larger the swarm, the higher the performance [3].

Robotic swarms often face situations similar to those of their natural counterparts. However, despite its importance, few works have been devoted to task partitioning in swarm robotics. Notable exceptions are the works of Fontan and Mataric as well as Shell and Mataric on bucket-brigading [8,16]. In these works, robots have to gather objects in an environment. Each robot operates in a limited area and drops the object it carries when it reaches the boundaries of its working area. This process leads to objects being passed across working areas until they reach the nest. Lein and Vaughan proposed an extension to this work, in which the size of the robots' working areas is adapted dynamically [11]. Pini et al. showed that the loss of performance due to interference can be reduced by partitioning the global task into subtasks [14]. To the best of our knowledge, self-organized task partitioning in terms of adaptive task decomposition has never been investigated.

3 The Method

The method we propose allows a swarm of robots to adaptively decide whether or not to partition a task into sequential subtasks. A decision is made by each individual: in case a robot decides to employ task partitioning, it works only on one of the subtasks; in case a robot decides not to employ task partitioning, it performs the whole sequence of subtasks. The method is fully distributed and does not require explicit communication between individuals. The swarm organizes in a way that maximizes the overall efficiency of the group, regardless of the specific environment. Efficiency is defined as the throughput of the system.

In the method proposed, each individual infers whether task partitioning is advantageous or not on the basis of its waiting time at task interfaces. We define the probability p that a robot employs task partitioning as:

    p = 1 − 1 / (1 + e^(−θ(w̄(k)))) ,    (1)

with θ being:

    θ(w̄(k)) = w̄(k)/s − d ,    (2)

and w̄(k) being the weighted average waiting time at task interfaces after k visits, which is calculated as follows:

    w̄(k) = (1 − α) w̄(k − 1) + α wM .    (3)

In Equation 2, s and d are a scale and a delay factor, respectively. In Equation 3, α ∈ (0, 1] is a weight factor that influences the responsiveness to changes: higher values lead to a more readily responsive behavior. The values of these parameters can be determined empirically. The variable wM is the measured waiting time at the task interface and ranges in [0, wMAX). The upper limit wMAX ensures that robots eventually abandon task partitioning when their waiting time becomes too high. Each time a robot completes a subtask, it decides whether or not to employ task partitioning for the next task execution.
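Equations 1–3 can be sketched in Python as follows. This is a minimal illustration of ours, not the authors' code; the function names are our own, and the default parameter values are those reported later in Section 4.4 (s = 20, d = 5, α = 0.75, wMAX = 15 s):

```python
import math

def update_avg_wait(w_prev, w_measured, alpha=0.75, w_max=15.0):
    """Equation 3: exponentially weighted average waiting time w(k)."""
    w_measured = min(w_measured, w_max)  # wM ranges in [0, wMAX)
    return (1.0 - alpha) * w_prev + alpha * w_measured

def partition_probability(w_avg, s=20.0, d=5.0):
    """Equations 1 and 2: probability p of employing task partitioning."""
    theta = w_avg / s - d                        # Equation 2
    return 1.0 - 1.0 / (1.0 + math.exp(-theta))  # Equation 1
```

With w̄(k) = 0 this yields p ≈ 0.993, which matches the initial condition reported in Section 4.4; larger average waiting times lower θ's argument toward the decision threshold and reduce p.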

4 Experimental Setup

The purpose of the experiments described in this section is to show the validity of the method described in Section 3. To illustrate the approach we have chosen a foraging problem as a testbed. It is a canonical choice in collective robotics, as it is easy to model and its applications are numerous [7].

In the experiments, the global task can be described as harvesting objects from a single source and storing them in the nest. The global task can be partitioned into two subtasks, referred to as harvesting and storing, respectively. Partitioning enables the subdivision of the environment into two areas, linked by a task interface as defined in Section 2. The task interface is represented by a cache that can be used by the robots to exchange objects. As the cache has a limited capacity, robots that decide to use it may have to wait. The waiting time is defined as the delay between the moment when a robot decides to use the cache, either for picking up or dropping objects, and the moment when this effectively becomes possible. It is also possible to avoid the cache by using a separate corridor, which directly links the source and the nest.

Each robot has to autonomously decide whether to partition the foraging task, by using the cache, or not to partition it, by using the corridor. The swarm can also employ a mixed strategy in which some individuals use the cache and others use the corridor. Robots have no notion of the global performance of the swarm. Explicit communication is never used. Figure 1 shows a simplified state diagram that represents the behavior of each individual.
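The decision structure of Figure 1 can be rendered as a short sketch. This is our own loose reading of the diagram, not the authors' controller: after harvesting, a robot drops in the cache with probability pd, otherwise it carries the object through the corridor to the nest; after storing, it picks up from the cache with probability pp, otherwise it walks to the source. The deterministic return transitions after the two cache actions are our interpretation.

```python
import random

def next_action(last_completed, p_d, p_p, rng=random.random):
    """One transition of the simplified state machine of Figure 1."""
    if last_completed == "harvest_from_source":
        # Partition (use the cache) with probability p_d.
        return "drop_in_cache" if rng() < p_d else "store_in_nest"
    if last_completed == "store_in_nest":
        # Partition (use the cache) with probability p_p.
        return "pick_up_from_cache" if rng() < p_p else "harvest_from_source"
    if last_completed == "drop_in_cache":
        return "harvest_from_source"   # stay on the source side
    if last_completed == "pick_up_from_cache":
        return "store_in_nest"         # carry the object to the nest
    raise ValueError(last_completed)
```

Setting p_d = p_p = 0 or 1 recovers the two fixed reference strategies used in Section 4.4.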

4.1 Simulation Tools

All the results presented in the paper are obtained using the ARGoS simulation framework [13]. ARGoS is a discrete-time, physics-based simulation environment that allows one to simulate experiments at different levels of detail. For the experiments presented in this paper, it is sufficient to simulate kinematics in a bi-dimensional space. A common control interface provides transparent access to real and simulated hardware, allowing the same controller to also run on the real robots without modifications.

Fig. 1. Simplified state machine representing the behavior of each individual, with the states Harvest from source, Drop in cache, Pick up from cache, and Store in nest. Probabilities pd and pp are both defined using Equation 1, as described in Section 3. The variable pd represents the probability of using the cache to drop an object. The variable pp represents the probability of picking up an object from the cache. The states Avoid obstacles and Navigate have been omitted for clarity.

The robots we use in this research are the e-pucks¹. The e-puck has been designed with the purpose of being both a research and an educational tool for universities [12]. ARGoS simulates the whole set of sensors and actuators of the e-puck. In our experiments we use the wheel actuators, the 8 IR sensors for light and proximity detection, the VGA camera, and the ground sensors.

4.2 Harvesting Abstraction

As the e-pucks are not capable of grasping objects, we developed a device to simulate this process [6]. Figure 2 shows a schematic representation of the device, called an array². It consists of a variable number of slots, named booths. Each booth is equipped with two RGB LEDs that can be detected by the robots through their color camera. A light barrier can detect the presence of a robot within each booth. Reactions to this event, such as changes in LED color, are programmable.

In the experiments presented in the paper, a green booth, either in the source or in the cache, means that an object is available there. Analogously, a blue booth means that an object can be dropped there. By using this abstraction, when a robot enters a booth in which the LEDs are lit up in green, we consider that the robot picks up an object from that booth. When a robot enters a booth in which the LEDs are lit up in blue, we consider that the robot drops an object in that booth. In both cases, when the booth perceives the presence of the robot, it reacts by turning the LEDs red until the robot has left. Once the robot has left, the booths behave in a different way, depending on whether they are source, nest or cache booths. In the case of the source, the booth turns green, to signal the availability of a new object to harvest. In the case of the nest, the booth turns blue, to signal that the corresponding store spot is available again. In the case of the cache, if the robot leaves after picking up an object, the booth, previously green, turns off and the corresponding booth on the other side turns blue, signaling that the spot is now available again for dropping an object. Conversely, if the robot leaves after dropping an object, the booth, previously blue, turns off and the corresponding booth on the other side turns green, signaling that an object is available for being picked up. This simple logic allows us to simulate object transfer through the cache, as well as harvest from the source and store in the nest.

¹ http://www.e-puck.org/
² The array is under development and a working prototype is currently available.

Fig. 2. Schematics of an array of booths with four booths on each side. The light barriers, represented by black semicircles, detect when a robot enters the corresponding booth. LEDs, used to signal available pick-up or drop sites, are represented by blank semicircles.
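The cache-booth logic described above can be sketched as follows. This is a simplified model of ours, not the firmware of the array of [6]; each cache booth is paired with the booth on the opposite side, and the LED colors track pick-up/drop availability as in the text:

```python
class CacheBooth:
    """One cache slot: 'green' offers an object, 'blue' accepts one."""
    def __init__(self, color):
        self.color = color     # "green", "blue", "red" or "off"
        self.paired = None     # booth on the opposite side of the cache
        self._before = None    # color when the robot entered

    def robot_enters(self):
        self._before = self.color
        self.color = "red"     # busy while the robot is in the booth

    def robot_leaves(self):
        if self._before == "green":      # an object was picked up
            self.color = "off"
            self.paired.color = "blue"   # slot free for a new drop
        elif self._before == "blue":     # an object was dropped
            self.color = "off"
            self.paired.color = "green"  # object now available opposite
```

Pairing a blue drop-side booth with an off pick-up-side booth and simulating a drop followed by a pick-up reproduces the LED sequence described above, returning the pair to its initial state.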

4.3 Environments

We run the experiments in two different environments, named the short-corridor environment and the long-corridor environment (see Figures 3a and 3b).

In both environments, the nest array is located on the right-hand side, while the source array is located on the left-hand side of a rectangular arena. Both the nest and the source arrays have four booths, all on one side. The cache array is located between the nest and the source and has three booths on each side. Therefore, the cache has a limited capacity, which is determined by the number of booths on each side. Different ground colors allow the robots to recognize on which side of the cache they are.

Although the cache array cannot be crossed by robots, a corridor links the two areas and allows the robots to harvest/store objects without using the cache array (i.e., without partitioning the foraging task). A light source and two colored trails help the robots to navigate through the corridor. Both environments are 1.6 m wide, but they differ in the corridor's length: in the short-corridor environment the corridor is 1.5 m long, while in the long-corridor environment it is 3.5 m long. Both the use of the cache and the use of the corridor entail costs. The cache can be seen as a shortcut between source and nest: robots cannot cross the cache, but its use can make material transfer faster. However, the cache can also become a bottleneck, as the decision to use it can lead to delays if it is busy when dropping objects, or empty when picking them up. The decision to use the corridor imposes a cost due to the time spent traveling through it. Thus, the transfer cost varies with cache and group size, while the travel cost varies with corridor length. As we keep the size of the cache array constant in our experiments, the corridor length determines the relative cost between partitioning and not partitioning. In the long-corridor environment, the use of the cache is expected to be preferable. On the other hand, in the short-corridor environment, the cost imposed by the corridor is low and should lead to the decision not to use the cache.

Fig. 3. Representation of the (a) short-corridor and (b) long-corridor environments used in the experiments. Nest and source arrays both have four booths on one side. The cache array has three booths on each side. Different ground colors help the robots to distinguish between different parts of the environment and to navigate through the corridor that connects the two areas. The light source, used as a landmark for navigating in the corridor, is marked with "L".


4.4 Experimental Settings

We run two different sets of experiments: in the first set of experiments we are interested in assessing the performance of the adaptive method that we propose, while in the second set of experiments we aim at evaluating its scalability. In the first set of experiments the robotic swarm is composed of twelve e-pucks, each controlled by the same finite state machine depicted in Figure 1. In both environments, we compare the adaptive method described in Section 3 to two strategies which always partition (pd = pp = 1) or never partition (pd = pp = 0). These are used as reference strategies for evaluating both the performance of the proposed method and its capability of adapting to different environmental conditions. The values of the parameters have been determined empirically and fixed to s = 20, d = 5, α = 0.75, wMAX = 15 s. The duration of each experimental run is 150 simulated minutes. For each experimental condition we run 30 repetitions, varying the seed of the random number generator. At the beginning of each experiment the robots are randomly positioned in the right side of the environment, where the nest array is located. As the average waiting time w̄(k) is initially equal to zero, all the robots start with probabilities pd = pp ≈ 1: from Equations 1 and 2, with d = 5 and w̄(k) = 0, we have θ(w̄(k)) = −5 and thus p ≈ 0.993.

In the second set of experiments we compare the adaptive method to the reference strategies in the short-corridor environment. In this case the size of the swarm varies in the interval [4, 60]. For each condition we run 10 randomly seeded experiments. The remaining parameters of the experiment are the same as described for the first set of experiments.

5 Results and Discussion

As we keep the capacity of the cache array constant, it is the length of the corridor that determines which behavior maximizes the throughput of the system. Partitioning allows the robots to avoid the corridor but, in order to exploit the cache efficiently, the swarm has to organize and work on both of its sides. Additionally, as pointed out in Section 4, the robots might have to wait in order to use the cache.

In the short-corridor environment, the cost of using the cache is higher than the cost of using the corridor. In this case the robots should decide for a non-partitioning strategy, without exchanging objects at the cache. Conversely, in the long-corridor environment the time required to travel along the corridor is high, and partitioning the task by using the cache is expected to be more efficient.

The graphs in Figure 4 show the average throughput of a swarm of twelve robots in the two environments. Throughput is measured as the number of objects retrieved per minute. The adaptive method is compared with the two reference strategies in which the robots never/always use the cache. As expected, each of these reference strategies performs well only in one environment: the strategy that never uses the cache performs better in the short-corridor environment, while the strategy that always uses the cache performs better in the long-corridor environment. On the other hand, the adaptive method we propose shows good performance in both environments.

Fig. 4. Average throughput in the short-corridor (top) and long-corridor (bottom) environment for different strategies: never partition (pd = pp = 0), always partition (pd = pp = 1), and adaptive (s = 20, d = 5, α = 0.75, wMAX = 15 s). Time is given in simulated minutes. Throughput is measured as objects retrieved per minute. Parameter values for the adaptive behavior have been obtained empirically. Each experiment has been repeated 30 times, varying the seed of the random number generator. The bars around the single data points represent the 95% confidence interval on the observed mean.

Concerning the long-corridor environment, we assumed that the best strategy was to always use the cache and to avoid the corridor. However, Figure 4 (bottom) shows that the proposed adaptive method improves over this fixed strategy. To better understand this behavior, we empirically determined the fixed-probability strategy yielding the highest throughput for each environment. This analysis revealed that the best strategy in the long-corridor environment is to use the cache with a probability around 80%. In the short-corridor environment a non-partitioning strategy is preferable.

Table 1 reports the average throughput obtained at the end of the run for each strategy in each environment. The results reported in the table show that fixed-probability strategies perform well only in one of the two environments. Our adaptive method, on the other hand, reaches good performance in both the short-corridor and the long-corridor environment. These results confirm that the method we propose allows a swarm of robots to take a decision concerning whether or not to partition a task into sequential subtasks.

Table 1. Average throughput at the end of the experiment for a swarm of 12 robots. Results are reported for different partitioning strategies in the short-corridor and long-corridor environments. The values in parentheses indicate the 95% confidence interval on the value of the mean.

                 Never partition   Always partition   Fixed (pd = pp = 0.8)   Adaptive
Short-corridor   1.81 (±0.012)     1.57 (±0.015)      1.67 (±0.016)           1.79 (±0.017)
Long-corridor    1.22 (±0.006)     1.36 (±0.015)      1.40 (±0.013)           1.43 (±0.017)

Fig. 5. Impact of the swarm size in the short-corridor environment for the strategies never partition (pd = pp = 0), always partition (pd = pp = 1), and adaptive (s = 20, d = 5, α = 0.75). The throughput, measured as the number of objects retrieved per minute, is the average value reached by the swarm at the end of the experiment. Each experimental run lasts 150 simulated minutes. For each experimental condition we run 10 repetitions, varying the seed of the random number generator.

The graph in Figure 5 shows the results of the second set of experiments, in which we focus on scalability. In this case we compare the different strategies for different swarm sizes in the short-corridor environment. As discussed previously, in the short-corridor environment the optimal strategy is to always use the corridor. It can be observed that for small swarm sizes this strategy performs well. However, the performance of this strategy drops drastically when the number of robots increases. The reason for this degradation is the increasing interference between the robots, which increases the cost of using the corridor. The partitioned strategy does not suffer from steep drops in performance. However, its throughput is considerably lower for smaller group sizes, as the waiting time at the cache becomes dominant. The adaptive method performs well across all the studied swarm sizes, finding a good balance between the robots that use the cache and those that use the corridor.


6 Conclusions

In this research we investigated the self-organized task partitioning problem in a swarm of robots. In particular, we have proposed an adaptive method that tackles the task partitioning problem with a simple strategy based on each individual's perception of subtask performance. In the proposed method, subtask performance is estimated by each robot by measuring its waiting time at task interfaces. Results show that the adaptive method we propose reaches the best performance in the two environments we considered, employing task partitioning only in those cases in which the benefits of partitioning outweigh its costs. The study of the impact of group size reveals that the method scales well with the swarm size. Future work will concern the study of self-organized task partitioning in multi-foraging problems in environments with several caches. In addition, we are interested in studying the application of the proposed method to cases in which partitioning happens through direct material transfer.

Acknowledgements. This work was partially supported by the virtual Swarmanoid project funded by the Fund for Scientific Research F.R.S.–FNRS of Belgium's French Community. Marco Dorigo and Mauro Birattari acknowledge support from the F.R.S.–FNRS, of which they are a research director and a research associate, respectively. Marco Frison acknowledges support from the "Seconda Facoltà di Ingegneria", Alma Mater Studiorum, Università di Bologna. Andrea Roli acknowledges support from the "Brains (Back) to Brussels" 2009 programme, funded by IRSIB – Institut d'encouragement de la Recherche Scientifique et de l'Innovation de Bruxelles.

References

1. Akre, R.D., Garnett, W.B., MacDonald, J.F., Greene, A., Landolt, P.: Behavior and colony development of Vespula pensylvanica and Vespula atropilosa (Hymenoptera: Vespidae). Journal of the Kansas Entomological Society 49, 63–84 (1976)

2. Anderson, C., Jadin, J.L.V.: The adaptive benefit of leaf transfer in Atta colombica. Insectes Sociaux 48, 404–405 (2001)

3. Anderson, C., Ratnieks, F.L.W.: Task partitioning in insect societies. I. Effect of colony size on queueing delay and colony ergonomic efficiency. The American Naturalist 154(5), 521–535 (1999)

4. Anderson, C., Ratnieks, F.L.W.: Task partitioning in insect societies: novel situations. Insectes Sociaux 47, 198–199 (2000)

5. Brutschy, A.: Task allocation in swarm robotics. Towards a method for self-organized allocation to complex tasks. Tech. Rep. TR/IRIDIA/2009-007, IRIDIA, Université Libre de Bruxelles, Brussels, Belgium (2009)

6. Brutschy, A., Pini, G., Baiboun, N., Decugnière, A., Birattari, M.: The IRIDIA TAM: A device for task abstraction for the e-puck robot. Tech. Rep. TR/IRIDIA/2010-015, IRIDIA, Université Libre de Bruxelles, Brussels, Belgium (2010)

7. Dorigo, M., Şahin, E.: Guest editorial. Special issue: Swarm robotics. Autonomous Robots 17(2–3), 111–113 (2004)

8. Fontan, M.S., Mataric, M.J.: A study of territoriality: The role of critical mass in adaptive task division. In: From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, pp. 553–561. MIT Press, Cambridge (1996)

9. Hart, A.G., Anderson, C., Ratnieks, F.L.W.: Task partitioning in leafcutting ants. Acta Ethologica 5, 1–11 (2002)

10. Kant, E.: Perpetual Peace: A Philosophical Sketch. Hackett Publishing Company, Indianapolis (1795)

11. Lein, A., Vaughan, R.: Adaptive multi-robot bucket brigade foraging. In: Bullock, S., Noble, J., Watson, R., Bedau, M.A. (eds.) Artificial Life XI: Proceedings of the Eleventh International Conference on the Simulation and Synthesis of Living Systems, pp. 337–342. MIT Press, Cambridge (2008)

12. Mondada, F., Bonani, M., Raemy, X., Pugh, J., Cianci, C., Klaptocz, A., Magnenat, S., Zufferey, J.C., Floreano, D., Martinoli, A.: The e-puck, a robot designed for education in engineering. In: Proceedings of the 9th Conference on Autonomous Robot Systems and Competitions, Portugal, pp. 59–65. IPCB: Instituto Politécnico de Castelo Branco (2009)

13. Pinciroli, C.: Object retrieval by a swarm of ground based robots driven by aerial robots. Tech. Rep. TR/IRIDIA/2007-025, IRIDIA, Université Libre de Bruxelles, Brussels, Belgium (2007)

14. Pini, G., Brutschy, A., Birattari, M., Dorigo, M.: Interference reduction through task partitioning in a robotic swarm. In: Filipe, J., Andrade-Cetto, J., Ferrier, J.L. (eds.) Sixth International Conference on Informatics in Control, Automation and Robotics – ICINCO 2009, pp. 52–59. INSTICC Press, Setúbal (2009)

15. Ratnieks, F.L.W., Anderson, C.: Task partitioning in insect societies. Insectes Sociaux 46(2), 95–108 (1999)

16. Shell, D.J., Mataric, M.J.: On foraging strategies for large-scale multi-robot systems. In: Proceedings of the 19th IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2717–2723 (2006)

17. Traiano, B.: La Bilancia Politica di Tutte le Opere di Traiano Boccalini, Part 2-3. Kessinger Publishing, Whitefish, Montana, USA (1678)

18. Wagner, D., Brown, M.J.F., Broun, P., Cuevas, W., Moses, L.E., Chao, D.L., Gordon, D.M.: Task-related differences in the cuticular hydrocarbon composition of harvester ants, Pogonomyrmex barbatus. J. Chem. Ecol. 24, 2021–2037 (1998)