
An Approach to Design a Robust Software Architecture and an Intelligent Model for Multi-Agent Systems

Asadollah Norouzi
School of Electrical and Electronic Engineering
Singapore Polytechnic
asadollah [email protected]

Carlos Antonio Acosta
School of Electrical and Electronic Engineering
Singapore Polytechnic
[email protected]

Abstract—A successful multi-agent system requires the intelligent agents to perform within a dynamically complex environment, where a proper and quick response in a cooperative manner is a primary key to successfully completing a task. This paper proposes a non-deterministic decision-making method using electric fields and high-level decision making. Different layers are designed, defined, and implemented for the software architecture with a focus on system adaptability, sustainability, and optimization. Consequently, a software architecture is proposed in this paper to complement the AI algorithms. The proposed architecture aims to provide a well-structured and managed system for the control, behavior, and decision making of multi-agent systems. The decision-making approach proposed in this paper is based on layered artificial intelligence implemented using vector-based fuzzy electric fields and a decision tree. Furthermore, an approach to model the world, called Agent Relative Polar Localization in this paper, is introduced. This world model is based on fuzzy measurements and polar coordinates. In order to optimize the overall performance of the system, learning methods have been introduced to the system. The proposed system has been implemented on soccer robots to evaluate its performance. The results show that the proposed system implemented on the soccer robots is reliable and robust.

I. INTRODUCTION

This paper is an extended version of our previously published work [1]. More details on the software architecture have been added to this version, while the discussion on the proposed AI algorithms and decision making remains unchanged. Furthermore, we have added the proposed system's performance results in this extended version.

The development of multi-agent systems is one of the major topics of today's research. It is a science that simultaneously involves many areas of modern engineering. In many available research papers each of these branches is discussed separately, while it is the combination of both software engineering and artificial intelligence that leads to the development of an optimal and well-performing multi-agent system. This paper is part of the main author's thesis work [2].

A previously published outline for a theory of intelligence defines intelligence as "the ability to act appropriately in an uncertain environment, where appropriate action is that which increases the probability of success, and success is the achievement of behavioral goals" [3]. That is, the intelligent system must be designed such that it increases the probability of success and reduces the probability of failure. Thus, the achievement of this goal requires at least the following assets:

• A well-structured architecture design for the system software

• An efficient and reliable algorithm

• Well-designed hardware (in case an actual robot or autonomous machine is involved)

This paper focuses on the first two assets. The proposed architecture is inspired by the work of J. Albus on intelligent systems design [4].

The proposed layered software architecture is such that the agent is provided with different sets of analyzed data, each prepared by an individual layer. The preparation of the data is such that it enables the agent to make decisions with a higher probability of success. The layers are designed to be independent from one another and thereby become easily editable and upgradable without requiring major changes to other layers. Fault tolerance is also considered in the design to minimize system failure. However, due to the inevitability of unexpected events that may result in system failure, the system is designed so that it automatically switches to a recovery state in case of a failure.

The intelligence model proposed in this paper is based on electric fields. Every object in the environment is defined by an electric field. The electric fields of all objects trigger electric forces represented by vectors, where each vector consists of two main characteristics: direction and magnitude. The agent makes decisions based on the combination of the resultant vector of all electric force vectors and the agent's current status in its decision tree. The resultant vector is also used for the agent's navigation path through the environment. The agent's decision tree depends on the application in which the agent performs. Therefore, the decision tree must be designed according to the tasks and goals that the agent must follow to complete its designated mission.

The major principle of the World Model proposed in this paper is to replace numerical measurements with fuzzy definitions in the measurement system. As a result, the volume of calculations is considerably reduced in the proposed system. This is, in fact, somewhat similar to the human measurement process, where measurements are performed in a non-numerical way. The proposed world model is called Agent Relative Polar Localization, or ARPL in short. This model provides the advantage of not requiring precise global positioning. That is, a relative polar localization is introduced to substitute global positioning, and consequently no complex algorithm is required for the agent localization process. This reduces calculation errors and enhances system performance.

Multi-agent systems are rapidly developing in both hardware and software aspects. Therefore, the system must be able to adapt to new upgrades easily and quickly, with minimum costs imposed by the adaptation process. The system architecture plays a very important role in achieving this objective [5].

System reliability is a challenging issue in multi-agent systems. Errors, malfunctions, and crashes are inevitable due to the dynamics of such environments and the continuous interaction of the agents with their surroundings. Thereby, the system must be able to recover from errors and crashes that occur in one part of the system while preventing other parts from crashing as a result.

Finally, the system must be able to perceive and understand its environment well and make appropriate decisions to successfully accomplish the system's defined tasks and goals. Therefore, the decision-making algorithms, world modeling, and other related areas must be well studied and implemented to meet the system objectives.

II. THE SYSTEM ARCHITECTURE

The agents in multi-agent environments are either software agents or hardware agents (robots). The main objective of the proposed architecture is to design and develop a system that addresses both software and hardware agents. The advantage of such an architecture is the possibility of having one program that can run on both software and hardware agents without re-designing and re-developing the whole system; such a re-design would impose extra costs and be time consuming. It is, however, noteworthy that complete compatibility between software and hardware agents may not be possible in all applications, but the proposed architecture requires only minimum changes to some layers when transferring from software to hardware agents and vice versa. Furthermore, there are always updates and upgrades for various sections of both hardware and software; therefore, the system is designed to be adjustable to new software and/or hardware without major changes and modifications to the program.

The proposed architecture is based on layers which are independent and are responsible for certain tasks [6]. The layers communicate with one another using special communication protocols. In case of a hardware and/or software upgrade, only the protocols need to be modified to meet the new compatibility requirements, and the rest of the program can remain unchanged. The idea of this architecture is inspired by the OSI (Open Systems Interconnection) model, a layered architecture introduced in July 1979 comprising seven layers: Physical, Data Link, Network, Transport, Session, Presentation, and Application [7].

Fig. 1. The illustration of the system architecture layers

The proposed system architecture comprises five collaborating layers (see Figure 1) designed to meet the aforementioned objectives. The layers are designed such that they remain independent while in constant collaboration and communication with the other layers in order to reach the overall goal of the system. The five proposed layers are: Gate Layer, Transfer Layer, Decision Layer, World Model Layer, and Predict Layer.

The main objective in the design and implementation of the layers is to provide independence and flexibility. This is important given that multi-agent environments undergo many changes and are considered very dynamic. Even though all layers are connected with one another, the structure and detailed procedure of each layer is not a concern for the other layers; in other words, the details have been abstracted and encapsulated. This design allows each layer to be updated or undergo structural changes without requiring changes and modifications to other layers. In these cases only the protocols need to be updated and modified.
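As an illustration of this protocol-based independence, the following is a minimal sketch (ours, not the paper's implementation; all names are illustrative) showing layers that exchange data only through a fixed message type, so that a layer's internals can change freely as long as the message format is preserved:

from dataclasses import dataclass, field
from queue import Queue

@dataclass
class Message:
    # Protocol unit exchanged between layers: sender, message kind, and data.
    source: str
    kind: str                      # e.g. "sensor_data", "decision", "fault_report"
    payload: dict = field(default_factory=dict)

class Layer:
    # Each layer owns only an inbox; it never touches another layer's internals.
    def __init__(self, name: str):
        self.name = name
        self.inbox: Queue = Queue()

    def send(self, target: "Layer", kind: str, payload: dict) -> None:
        # Upgrading a layer only requires keeping this Message format compatible.
        target.inbox.put(Message(self.name, kind, payload))

gate, transfer = Layer("Gate"), Layer("Transfer")
gate.send(transfer, "sensor_data", {"ball_angle": 348})
print(transfer.inbox.get())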

System fault tolerance has also been considered in the design and implementation. A special section of the system, called the Exception Handler, runs as a parallel process that monitors the performance of the system and detects faults. It triggers appropriate action in case of a fault within the system and tries to recover the system. If the problem persists and the system fails to recover, the Exception Handler executes a restart command to force the entire system to reboot. On reboot, if the memory of the system is not damaged, the system resumes from its last recorded state before the fault. The Exception Handler can also receive fault reports from other layers. This feature is very useful, especially when communication between layers is lost; in some cases the layers may detect malfunctions in other layers and report them directly to the Exception Handler.
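The following is a hypothetical sketch of such an Exception Handler: a parallel monitor that collects fault reports, first attempts local recovery, and escalates to a full restart when a fault persists. The class, callbacks, and retry threshold are our illustrative assumptions, not the paper's code:

import queue
import threading

class ExceptionHandler(threading.Thread):
    # Runs in parallel with the rest of the system; layers call report() on faults.
    MAX_RECOVERY_ATTEMPTS = 3

    def __init__(self, recover_layer, restart_system):
        super().__init__(daemon=True)
        self.faults = queue.Queue()
        self.recover_layer = recover_layer     # callback: attempt local recovery
        self.restart_system = restart_system   # callback: reboot, restore last state
        self.attempts = {}

    def report(self, layer_name: str, description: str) -> None:
        self.faults.put((layer_name, description))

    def run(self) -> None:
        while True:
            layer, _desc = self.faults.get()
            self.attempts[layer] = self.attempts.get(layer, 0) + 1
            if self.attempts[layer] <= self.MAX_RECOVERY_ATTEMPTS:
                self.recover_layer(layer)      # fault seen few times: recover in place
            else:
                self.restart_system()          # persistent fault: force a reboot and
                self.attempts.clear()          # resume from the last recorded state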

A. Gate Layer

The Gate Layer is the only layer that is in contact with the world and environment; all interactions between the agent and the environment go through it. The Gate Layer has two principal tasks. The first task is to perceive and sense the environment through the available input channels and/or sensors. These inputs include, but are not limited to, vision, ultrasonic sensors, proximity sensors, gyros, encoders, etc. These input channels can either be hardware receiving information from an actual environment or software channels receiving information from a simulated or virtual environment. The other task of the Gate Layer is to convert the commands and decisions made at the Decision Layer into executable and understandable commands for the output channels, which include, but are not limited to, motors and actuators. Like the input channels, the output channels can be either hardware or software. In case of a hardware or software upgrade of the input and output channels, only this layer needs to be modified and updated in order to comply with the new upgrades, while the other layers remain unchanged.

The Gate Layer communicates only with the Transfer Layer, with which it has a mutual communication. The Gate Layer collects all the usable inputs and sends them to the Transfer Layer; in turn, the Transfer Layer sends the decision commands to the Gate Layer for execution.

B. Transfer Layer

The Transfer Layer is mainly a data-parsing medium. This layer receives the input data from the Gate Layer and prepares them to be sent to the Decision, Predict, and World Model Layers. The Transfer Layer analyzes and parses the input data and converts them into the system's defined standard data types. Since errors and noise within the data received from the environment are inevitable, this layer filters the noise and corrects the data where necessary. The Transfer Layer also receives the decision commands from the Decision Layer and converts them into understandable commands for the Gate Layer. The Transfer Layer is one of the most significant and vital layers of the system because the most essential sections of the system, world modeling and decision making, use the data prepared by this layer.

This layer plays an important role in recognizing faults within the system by analyzing the data that it receives from the Gate Layer and the Decision Layer. The data being transferred through the system are valuable assets for identifying probable errors.

In case of a software upgrade within the higher layers of the system, such as the Decision Layer, only the Transfer Layer requires modification to recognize the new upgrades.

C. Decision Layer

The Decision Layer is the main intelligent section of the system and consists of two sub-layers, named Low-Level and High-Level. This division is due to the fact that not all decisions necessarily lead to a physical action or reaction; there are also decisions for higher levels of intelligence [8]. In the proposed system, the decisions leading to a physical action are called low-level decisions and all other decisions are called high-level decisions.

The decision-making process at the Low-Level sub-layer is based on the inputs provided by the High-Level sub-layer and the World Model Layer. The types of decisions made at the High-Level sub-layer depend on the application and environment in which the agent exists. One type of high-level decision shared by all multi-agent environments is the strategy by which the intelligent agents cooperatively conduct a task. The input from the World Model Layer is the resultant vector of all the generated electric field vectors. The Low-Level sub-layer combines both inputs and prepares a final decision ready for execution. This decision is sent to the Transfer Layer, which in turn sends the command to the Gate Layer for execution.

The High-Level sub-layer recognizes and analyzes the surrounding environment using data generated by the World Model Layer, and also takes into consideration the elements of the environment that can influence the actions of the agent. The decision-making process of the agent can be concisely broken down into the following steps:

• Make a high-level decision using adequate information collected from the World Model Layer, the Predict Layer, and additional information or commands from other active elements of the environment.

• The decision made in the first step is shared with the electric field generator section of the World Model Layer to generate the corresponding electric field vectors according to the high-level decisions of the agent.

• The final part of the decision-making process is done by the Low-Level sub-layer, where the final decision is made by combining the results of the first two steps (a code sketch of this pipeline follows the list).
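The sketch below illustrates the three steps as a minimal, self-contained pipeline; the function names and the toy strategy logic are our assumptions for illustration, not the paper's code:

from typing import NamedTuple

class Vec(NamedTuple):
    magnitude: float
    angle_deg: float

def make_high_level_decision(world_state: dict, predicted: dict) -> str:
    # Step 1: choose a strategy from world-model and predicted information.
    return "attack" if world_state.get("ball_dist") == "close" else "position"

def generate_field_vectors(decision: str, world_state: dict) -> list:
    # Step 2: the World Model Layer emits field vectors matching the decision
    # (the actual field equations appear in Section II-D).
    return [Vec(1.0, world_state.get("ball_angle", 0.0))]

def low_level_decide(decision: str, vectors: list) -> tuple:
    # Step 3: combine the high-level decision with the resultant vector.
    strongest = max(vectors, key=lambda v: v.magnitude)
    return decision, strongest

world = {"ball_dist": "close", "ball_angle": 348.0}
decision = make_high_level_decision(world, predicted={})
print(low_level_decide(decision, generate_field_vectors(decision, world)))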

D. World Model Layer

The World Model Layer is responsible for modeling the world based on the information it receives from the Transfer Layer. In the proposed world model, every element of the world is modeled relative to the agent using the proposed Agent Relative Polar Localization method, whose details are discussed later in the paper. The other major task of the World Model Layer is to generate the electric field vectors of all world objects that are effective in the decision-making process [6]. The corresponding electric field vector for each object is generated using the following equation:

F = \frac{k q_0 q_1}{r^2} \quad (1)

where F is the electric force between the agent and an object in the environment; q0 is the agent's electric charge, which is set by the Decision Layer and dynamically varies to respond to the changing circumstances and conditions of the environment; q1 is the electric charge of an object in the environment, which is also set by the Decision Layer and, like the agent's charge, is variable; r is the parametric distance between the agent and the object; and k is the Coulomb's-law constant, equivalent to 1/(4πε0). This value is set to one by default.

Generalizing equation (1) to correspond to all the elements of the environment, we deduce the following equation for every object of the environment:

\vec{F}_j = k_i \sum_{i=1}^{n} \left( \frac{q_0 q_i}{r_i^2},\ \theta_i \right) \quad (2)

where θi is the angle of each object relative to the agent, n is the number of objects, j is the vector type identity number, F⃗j is the resultant electric field vector for the jth vector type, and ki is a number that is dynamically changed by the Decision Layer. This value determines the agent's coefficient of risk level; the need for such a coefficient was realized during system tests where bottlenecks occurred. This value can be used to let the agent become more flexible to the environment, and its appropriate determination and use requires further study. The vector types are defined depending on the application and environment in which the agent exists.

For example, in the case of soccer robots, equation (2) yields two vectors, F⃗1 and F⃗2, where the first represents the move vector and the latter represents a possible kick vector (for either passing or shooting). Finally, F⃗T is chosen from either of the calculated vectors using equation (3):

\vec{F}_T = \psi(\vec{F}_1, \vec{F}_2) \quad (3)

where F⃗T is the resultant decision vector and ψ is a selection function that the Decision Layer uses to determine the optimum vector from the available calculated input vectors.
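The following sketch computes the resultant vector of equation (2) and applies a stand-in selection function ψ for equation (3), under the assumption that each (q0·qi/ri², θi) pair denotes a polar vector that is summed component-wise in Cartesian space; the charge values and the magnitude-based ψ are illustrative placeholders, not the paper's tuned values:

import math

def resultant_vector(objects, q0: float, k: float = 1.0):
    # objects: list of (q_i, r_i, theta_i_degrees) triples for the environment.
    fx = fy = 0.0
    for q_i, r_i, theta_i in objects:
        magnitude = k * q0 * q_i / (r_i ** 2)            # Coulomb-style magnitude
        fx += magnitude * math.cos(math.radians(theta_i))
        fy += magnitude * math.sin(math.radians(theta_i))
    return math.hypot(fx, fy), math.degrees(math.atan2(fy, fx))

def psi(f1, f2):
    # Stand-in selection function for equation (3): keep the stronger vector.
    return f1 if f1[0] >= f2[0] else f2

move_vec = resultant_vector([(1.0, 2.0, 348.0), (-0.5, 1.5, 210.0)], q0=1.0)
kick_vec = resultant_vector([(2.0, 2.0, 348.0)], q0=1.0)
print(psi(move_vec, kick_vec))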

E. Predict Layer

The Predict Layer is the forecasting side of information processing. The aim here is to derive information about what the surrounding world will be like at some time t0 + εt in the future, for some εt > 0, using data measured up to and including time t0. The predicted world is quite useful for making high-level decisions, especially when determining action strategies, where prediction provides a valuable source for the decision-making process. Therefore, the Predict Layer sends the predicted states to the Decision Layer for further processing.

This layer also plays an important role in error detection and correction. The presence of errors is inevitable, but errors must be controlled and reduced. Detecting and correcting errors, and consequently providing the Decision Layer with more realistic information about the world, makes the decisions made by the Decision Layer more efficient; this layer was designed to take control of this task. The layer receives information about the surrounding world from the Transfer Layer and approximates the state of the world n steps ahead. The value of n can be set and changed in the Predict Layer settings depending on the application the agent is running. If the difference between newly received information from the Transfer Layer and the predicted information of the future world in the Predict Layer exceeds the difference-factor threshold defined in the layer's settings, the Predict Layer sends a signal to the Exception Handler informing it that there might be a malfunction within one of the input channels. The Predict Layer replaces the invalid or incorrect data with an average of the last valid information and the predicted state. This average is not necessarily a correct value, but it is close enough to the actual state of the world to allow the agent to make an appropriate decision in case of erroneous or unavailable input data. In this case a noise factor is also calculated by the Predict Layer and used for future calculations; this noise factor can be retrieved by the World Model Layer if needed.
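A minimal sketch of this prediction-based fault check, assuming scalar measurements and an illustrative threshold; the paper does not specify the exact correction formula beyond the averaging described above:

def check_measurement(measured: float, predicted: float, last_valid: float,
                      threshold: float, report_fault):
    # Compare the new measurement against the Predict Layer's expectation.
    error = abs(measured - predicted)
    if error > threshold:
        report_fault("possible malfunction in an input channel")
        corrected = 0.5 * (last_valid + predicted)   # stand-in value, close to
        noise_factor = error                          # the actual world state
        return corrected, noise_factor
    return measured, 0.0

value, noise = check_measurement(measured=9.7, predicted=4.1, last_valid=4.0,
                                 threshold=2.0, report_fault=print)
print(value, noise)   # corrected value 4.05; noise factor 5.6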

III. AGENT RELATIVE POLAR LOCALIZATION

Agent Relative Polar Localization, ARPL in short, is a method for modeling the agent's surrounding world based on the polar coordinates r and θ, where r represents distance and θ represents angle. In this method, the agent perceives the location of the world's objects using polar coordinates relative to its own location. Thereby, each object has a distance and angle relative to the agent, which produces a polar position vector. The collection of these polar position vectors constitutes the agent's world in a vector representation [6].

A. Case Study

A case scenario is explored to better describe this method. In this scenario a number of agents play soccer on a soccer pitch (see Figure 2). The main input channel of each agent is a camera that looks upwards at a hyperbolic mirror, providing omni-directional vision for the agent.

In this case there are two ways to express the distance values between various objects in the environment (the soccer pitch). The first is the exact logarithmic value, which is the logarithmic position of the object in the hyperbolic mirror. It should be noted that this position is not the exact metric distance of the object, since it is not recalculated using the hyperbolic equation of the mirror. The second, which is used by the Decision Layer, is the linguistic fuzzy representation of the distance. This is obtained by dividing the agent's circular visible range into several logarithmic sections defined by linguistic quantities like "close", "near", "far", etc. In the example shown in Figure 2, the position vector of the ball is given by equation (4), the position vector of the left goal by equation (5), the position vector of the player towards the bottom of the field by equation (6), and the position vector of the player towards the right goal by equation (7).

r_{\text{ball}} = \text{close}, \quad \theta_{\text{ball}} = 348^\circ \quad (4)
r_{\text{goal}} = \text{near}, \quad \theta_{\text{goal}} = 210^\circ \quad (5)
r_{p1} = \text{near}, \quad \theta_{p1} = 280^\circ \quad (6)
r_{p2} = \text{far}, \quad \theta_{p2} = 350^\circ \quad (7)


Fig. 2. The agent's localization using the proposed ARPL method.

Fig. 3. The agent's localization using the method that requires two flag points to calculate the position.

where r is the linguistic quantity of the distance between the agent and the object, and θ is the angle in degrees between the agent and the object.

The magnitudes of the linguistic distance ranges increase exponentially from the closest (tangent) point of the agent to the defined farthest point. The Agent Relative Polar Localization method eliminates the need for static position references in the surrounding environment, allowing the agent to avoid Cartesian calculations (see equations (8) and (9)). Instead, the agent can directly use the data provided by the vision system, namely the distance and angle of every object.

x = r \cos(\theta) \quad (8)
y = r \sin(\theta) \quad (9)

where x and y are the Cartesian coordinates corresponding to distance r and angle θ.
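The following sketch illustrates the ARPL representation under stated assumptions: distance rings grow exponentially outward from the agent and map to linguistic labels, and each object is stored as a (label, angle) pair. The ring base and label set are illustrative, not the paper's calibrated values:

import math

LABELS = ["close", "near", "far", "very far"]
BASE = 0.5       # assumed radius of the innermost ("close") ring, in meters

def linguistic_distance(r: float) -> str:
    # Ring i covers BASE*2^i .. BASE*2^(i+1); the index grows logarithmically.
    i = max(0, int(math.log2(r / BASE))) if r > BASE else 0
    return LABELS[min(i, len(LABELS) - 1)]

def arpl_position(r: float, theta_deg: float) -> tuple:
    # An object is stored as a (linguistic distance, angle) polar pair.
    return linguistic_distance(r), theta_deg % 360

print(arpl_position(0.8, 348))   # ('close', 348.0), cf. the ball in equation (4)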

B. Experiments and Results

The main advantage of using polar coordinates over Cartesian coordinates is the decrease of inevitable data calculation errors. Furthermore, the agent is freed from modeling the global positions of the environment's objects, which reduces data processing and calculation. In other words, in this method the world-modeling process does not depend on stationary points (flags), and thereby the localization process becomes more reliable compared to other positioning and localization techniques where stationary points are used (see Figure 3, which depicts an example of localization using two flag points to calculate the position). A major drawback of such systems is that the failure to locate one of the stationary points produces significant localization errors. The Agent Relative Polar Localization method has been compared with three traditional positioning methods within the introduced scenario. The methods have been tested on three tasks defined for the agent to perform. The agent's first task is to track and follow a moving ball within the soccer pitch. The second task is to move towards the goal while holding and dribbling the ball. The last task is to move to an exact position within the soccer pitch. As can be seen in Figure 4, the agent's performance is better for the first two tasks, "ball tracking" and "goto goal", while the accuracy in "goto position" is only 60%. However, it should be noted that achieving high accuracy in "goto position" would not be possible even for a human soccer player without a GPS device. Thereby, it is satisfactory if the agent successfully positions itself at a location close to the desired one.

Fig. 4. The performance comparison chart of the ARPL localization method with three traditional localization methods where flag points are used.

Fig. 5. The data processing comparison chart of the ARPL localization method with the traditional absolute positioning method that uses exact values for localization.

Another experiment was carried out to test the performance of the agent's image and data processing under two localization methods: this paper's proposed method and a traditional localization method using three points for positioning. Figure 5 depicts the better performance of image processing and data fusion using Agent Relative Polar Localization compared to the traditional three-point method. The lower number of vision frames per second for the traditional method is due to the fact that this and similar methods require more calculations to acquire satisfactory results. The nature of Agent Relative Polar Localization allows the agent to perform fewer calculations on the data perceived from the environment and consequently to achieve a higher rate of image processing and to process more usable data per processing cycle. Usable data here refers to the perceived data from the environment that can be used in the world-modeling and decision-making processes.

IV. LEARNING

Much research on cooperative behavior suggests applying this behavior to a group of robots in a multi-agent system using learning methods such as Q-learning, reinforcement learning, behavior learning, etc. [9][10][11][12][13][14][15]. This paper also proposes reinforcement learning for the proposed system. In general, the Markov Decision Process (MDP) [16] is proposed for the learning section of this work. An MDP is a 4-tuple (S, A, P(·,·), R(·,·)), where S is a finite set of states, A is a finite set of actions, P(s'|s, a) is a transition function, and R(s, a) is the immediate reward received after the transition. The agent takes an action a ∈ A at every state s ∈ S; the agent then receives a reward R(s, a) and reaches a new state s', determined from the probability distribution P(s'|s, a). The value Q*(s, a) of a given state-action pair (s, a) is determined by solving the Bellman equation [17]:

Q^*(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \max_{a'} Q^*(s', a') \quad (10)

where γ is the discount rate and satisfies 0 < γ < 1. The optimal value function Q* can be found through value iteration by iterating over the Bellman equation until convergence [18]. Given the state transition function P and the reward function R, the policy that maximizes the expected discounted reward can be calculated. Thereby, the optimal policy π is as follows [17]:

\pi(s) = \arg\max_a Q^*(s, a) \quad (11)
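As a worked illustration of equations (10) and (11), the following is a standard value-iteration sketch on a toy two-state, two-action MDP; the transition and reward values are illustrative, not from the paper:

import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],     # P[s][a][s'] transition probabilities
              [[0.9, 0.1], [0.2, 0.8]]])
R = np.array([[1.0, 0.0],                    # R[s][a] immediate rewards
              [0.0, 2.0]])

Q = np.zeros((n_states, n_actions))
for _ in range(1000):                        # iterate the Bellman update (10)
    Q_new = R + gamma * np.einsum("sat,t->sa", P, Q.max(axis=1))
    if np.abs(Q_new - Q).max() < 1e-8:       # stop at convergence
        break
    Q = Q_new

policy = Q.argmax(axis=1)                    # greedy policy, equation (11)
print(Q, policy)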

V. CONCLUSION

A layered architecture inspired by the OSI model was introduced, in which each layer is independent and the connection between layers is made through specific protocols.

The proposed intelligent algorithm has four main sections: the electric field vectors, the world model, the decision-making process, and the learning. The results of these sections have been discussed, but it must also be mentioned that the combination of these methods has significantly reduced the volume of programming code as well as the execution time, making the system perform faster. For instance, in the proposed vector-based method there is no need for additional programming code for dribbling, obstacle avoidance, or other similar actions; the resultant vector of the electric fields leads the agent to perform such tasks in different situations.

Fig. 6. The performance of the system during a total of 330 minutes of runtime

Fig. 7. Evaluation of the decisions made by the system

The system was tested during the RoboCup 2010 Middle Size League competitions, under tense competition conditions. The robots running this system were very reliable and robust throughout the competitions. There was only one major failure in the system, caused by a power source failure in which the whole system shut down due to battery damage. There were, of course, errors and system exceptions during the runtime, but all of them were successfully handled except for the one mentioned earlier. Figure 6 illustrates the system performance in terms of errors and exceptions throughout a total of 330 minutes of runtime.

It cannot be claimed that the intelligence algorithms were 100% successful, because there were cases where better decisions could have been made. Even though the intelligence performance of the system was not fully optimized, it proved to be highly successful, reliable, and intelligent. Figure 7 shows the system performance in terms of the decisions made in the 11 matches during the competitions; a total of 55 randomly selected decisions were used for this evaluation.

It is strongly recommended that future studies further develop the learning section. The studies discussed in this paper provide a good starting point for further development of such multi-agent systems.

ACKNOWLEDGMENT

The authors would like to thank S.M. Mohammadzadeh Ziabary and H. Khandan for their major contribution to this research. The authors also wish to thank the Advanced Robotics and Intelligent Control Center of Singapore Polytechnic for providing the facilities to implement the ideas presented in this paper.

REFERENCES

[1] A. Norouzi, C. Acosta, and C. Zhou, "An approach to design a robust and intelligent multi-agent system," in World Automation Congress (WAC). IEEE, June 2012.

[2] A. Norouzi, "Multi-agent systems - an approach to design and implement applicable software architecture, world modeling, and decision making," Master's thesis, Chalmers University of Technology, 2011.

[3] J. Albus, "Outline for a theory of intelligence," IEEE Trans. on Systems, Man, and Cybernetics, vol. 21, no. 3.

[4] J. S. Albus, "A reference model architecture for intelligent systems design," in An Introduction to Intelligent and Autonomous Control. Kluwer Academic Publishers, 1993, pp. 27-56.

[5] T. Laue, "A behavior architecture for autonomous mobile robots based on potential fields," in International RoboCup Symposium.

[6] A. Norouzi, S. Mohammadzadeh, M. Mousakhani, and S. Shoaei, "Semi-human instinctive artificial intelligence (SHIAI)," in Proceedings of the 2006 International Symposium on Practical Cognitive Agents and Robots. ACM, November 2006.

[7] H. Zimmermann, "OSI reference model - the ISO model of architecture for open systems interconnection," IEEE Transactions on Communications, vol. COM-28, no. 4.

[8] J. Mayer, "Decision-making and tactical behavior with potential fields," Springer.

[9] S. Buck, M. Beetz, and T. Schmitt, "M-ROSE: A multi robot simulation environment for learning cooperative behavior," Distributed Autonomous Robotic Systems, vol. 4.

[10] M. Asada, E. Uchide, and K. Hosoda, "Cooperative behavior acquisition for mobile robots in dynamically changing real worlds via vision-based reinforcement learning and development," Elsevier Artificial Intelligence journal.

[11] N. Kubota and N. Aizawa, "Intelligent cooperative behavior control of multiple partner robots," IEEE/RSJ International Conference on Intelligent Robots and Systems.

[12] I. Noda, H. Matsubara, and K. Hiraki, "Learning cooperative behavior in multiagent environment," in 4th Pacific Rim International Conference on Artificial Intelligence.

[13] D. Lee, S. Seo, and K. Sim, "Online evolution for cooperative behavior in group robot systems," International Journal of Control, Automation, and Systems, vol. 6.

[14] P. F. M.A. Dominey and E. Yoshida, "Real-time cooperative behavior acquisition by a humanoid apprentice," IEEE/RAS International Conference on Humanoid Robotics.

[15] T. Hester, M. Quinlan, and P. Stone, "Generalized model learning for reinforcement learning on a humanoid robot," IEEE International Conference on Robotics and Automation.

[16] A. D'Angelo, E. Menegatti, and E. Pagello, "How a cooperative behavior can emerge from a robot team," Springer.

[17] S. Haykin, Neural Networks - A Comprehensive Foundation. Prentice Hall.

[18] R. Sutton and A. Barto, Reinforcement Learning: An Introduction. MIT Press.