MODELING OF ROBOT LEARNING IN MATLAB/SIMULINK …€¦ · In the paper, we present our...

7
MODELING OF ROBOT LEARNING IN MATLAB/SIMULINK ENVIRONMENT Bojan Nemec, Leon ˇ Zlajpah Joˇ zef Stefan Institute Jamova 39, 1000 Ljubljana, Slovenia [email protected](Bojan Nemec) Abstract The paper describes our environment for off-line programming and control design of robot systems developed in Matlab/Simulink environment. A special empha- sis has been given on robot learning. Nowadays, it is commonly accepted that preprogrammed robots are applicable only in highly structured environments. In order to bring robotic technology in every-days life as well to be used for small batches of unstructured production, it is required that robots poses certain level of self-adaptation. One of the important aspect of the self adaptation is unsuper- vised learning, which imitates a learning processes of animals and human beings. Learning is a long process, based on many successful or unsuccessful repetitions of a given task. For that a judgment on how successful was the previous attempt is crucial. Usually, learning assures a convergence to the globally optimal solu- tion. Since successful learning depends on many parameters, like given rewards, learning speed, noise rejection, etc. the simulation becomes an essential tool for designing and tuning the learning algorithms. In the paper, we describe our sim- ulation models for learning the ball-in-a-cup game, which is often used as a test bed for various learning algorithms. Keywords: robot simulation, robot learning, robot programming. Presenting Author’s Biography Nemec Bojan is senior research associate at Dept. of Automatics, Biocy- bernetics and Robotics, Jozef Stefan Institute. He received BS, MSc and PhD degree from the Univerity of Ljubljana in 1979, 1982 and 1988 re- spectively. In 1993 he spent his sabbatical leave at the Institute for Real- Time Computer Systems and Robotics, University of Karlsruhe. His re- search interests include robot control, robot simulation, sensor guided con- trol, service robots and biomechanical measurements in sport. Between 2002 and 2005 he was a task leader in the largest NAS European project EUROShoE. He has published over 100 conference and journal papers and is author of 1 patent, and co-author of a book.

Transcript of MODELING OF ROBOT LEARNING IN MATLAB/SIMULINK …€¦ · In the paper, we present our...

Page 1: MODELING OF ROBOT LEARNING IN MATLAB/SIMULINK …€¦ · In the paper, we present our MATLAB/Simulink based simulation environment, dedicated to effective design of high an low level

MODELING OF ROBOT LEARNING INMATLAB/SIMULINK ENVIRONMENT

Bojan Nemec, Leon Zlajpah

Jozef Stefan InstituteJamova 39, 1000 Ljubljana, Slovenia

[email protected](Bojan Nemec)

Abstract

The paper describes our environment for off-line programming and control designof robot systems developed in Matlab/Simulink environment. A special empha-sis has been given on robot learning. Nowadays, it is commonly accepted thatpreprogrammed robots are applicable only in highly structured environments. Inorder to bring robotic technology in every-days life as well to be used for smallbatches of unstructured production, it is required that robots poses certain levelof self-adaptation. One of the important aspect of the self adaptation is unsuper-vised learning, which imitates a learning processes of animals and human beings.Learning is a long process, based on many successful or unsuccessful repetitionsof a given task. For that a judgment on how successful was the previous attemptis crucial. Usually, learning assures a convergence to the globally optimal solu-tion. Since successful learning depends on many parameters, like given rewards,learning speed, noise rejection, etc. the simulation becomes an essential tool fordesigning and tuning the learning algorithms. In the paper, we describe our sim-ulation models for learning the ball-in-a-cup game, which is often used as a testbed for various learning algorithms.

Keywords: robot simulation, robot learning, robot programming.

Presenting Author’s BiographyNemec Bojan is senior research associate at Dept. of Automatics, Biocy-bernetics and Robotics, Jozef Stefan Institute. He received BS, MSc andPhD degree from the Univerity of Ljubljana in 1979, 1982 and 1988 re-spectively. In 1993 he spent his sabbatical leave at the Institute for Real-Time Computer Systems and Robotics, University of Karlsruhe. His re-search interests include robot control, robot simulation, sensor guided con-trol, service robots and biomechanical measurements in sport. Between2002 and 2005 he was a task leader in the largest NAS European projectEUROShoE. He has published over 100 conference and journal papers andis author of 1 patent, and co-author of a book.

Page 2: MODELING OF ROBOT LEARNING IN MATLAB/SIMULINK …€¦ · In the paper, we present our MATLAB/Simulink based simulation environment, dedicated to effective design of high an low level

1 IntroductionNowadays, the system designer can relay on a varietyof different software tools which can significantly in-crease his efficiency. Among them, the simulation hasbeen recognized as an important tool in designing newproducts, investigating their performances and also indeveloping applications of these products. For complexsystems such as robots the simulation tools can cer-tainly enhance the design, development, and even theoperation of the robotic systems. Augmenting the sim-ulation with visualization tools and interfaces, one cansimulate the operation of the robotic systems in a veryrealistic way.

The simulation tools for robotic systems can be dividedinto two major groups: tools based on general simula-tion systems and special tools for robot systems. Toolsbased on general simulation systems are usually spe-cial modules, libraries or user interfaces which simplifythe building of robot systems and environments withinthese general simulation systems. Special simulationtools for robots cover one or more tasks in robotics likeoff-line programming, design of robot work cells, kine-matic and dynamic analysis, mechanical design. Theycan be specialized for special types of robots like mo-bile robots, underwater robots, parallel mechanisms, orthey are assigned to predefined robot family. Depend-ing on the particular application different structural at-tributes and functional parameters have to be modeled.The majority of the robot simulation tools focus on themotion of the robotic manipulator in different environ-ments and among them an important group are the toolsfor the design of robot control systems.

Currently, many different simulation tools are available,which could be used in research and teaching labora-tories. However, these tools are not always fulfillingall the requirements of the research and teaching ac-tivities in robotic laboratories. Reconfigurability andopenness are features already recognized by many asessential in the development of advanced robot controlalgorithms. Not only is it important to have easy accessto the system at all levels (e.g. from high-level supervi-sory control all the way down to fast servo loops at thelowest level), but it is a necessity to have open controlarchitectures where software modules can be modifiedand exteroceptive sensors like force/torque sensors andvision systems can be easily integrated. Reconfigura-bility should also be reflected when more fundamentalchanges to the controller architecture are required, inthe necessity of quickly being able to make modifica-tions in the original design and verify the effect of thesemodifications on the system. In other words, the usershould be able to quickly modify the structure of thecontrol without having to alter the simulation systemitself.

One of the widely used general simulation platformsfor the modeling and simulation of various kind of sys-tems MATLAB. Developed by MathWorks, MATLABallows matrix manipulations, plotting of functions anddata, implementation of algorithms, creation of userinterfaces, symbolic computation and interfacing with

programs written in other languages. A number of ad-ditional toolboxes efficiently solves problems regard-ing technical computing, embedded systems, controlsystems, digital signal processing, communication sys-tems, image and video processing, mechatronics, com-putational biology, computational finance and others.Therefore, it is not surprising that it has been used inten-sively for the simulation of robot systems. One of suchtools, The Robotics Toolbox [1], provides many func-tions that are required in robotics and addresses areassuch as kinematics, dynamics, and trajectory genera-tion. The Toolbox is useful for simulation as well as foranalyzing the results from experiments with real robots,and can be a powerful tool for education. However,it is not integrated into Simulink environment and notappropriate for and for the hardware-in-the-loop sim-ulation. However, it is not very good for the simula-tion in Simulink and for the hardware-in-the-loop sim-ulation “SimMechanics Toolbox” [2] extends Simulinkwith the tools for modeling and simulating mechani-cal systems. With SimMechanics, one can model andsimulate mechanical systems with a suite of tools tospecify bodies and their mass properties, their possiblemotions, kinematic constraints, and coordinate systemsand to initiate and measure body motions.

In the paper, we present our MATLAB/Simulink basedsimulation environment, dedicated to effective designof high an low level control algorithms, sensory interac-tion and testing of the hardware in the loop. One of themajor benefits of the proposed approach is, that allowsto include the real hardware at any point of the sim-ulation loop. Therefore, is is possible to switch fromthe simulation to the real hardware experiment with-out any modification of the program code. This is ex-tremely important when testing new control algorithmsand strategies for solving complex tasks, usually relatedto service and humanoid robotics. These tasks usuallyrequire that robots poses learning capabilities and cer-tain level of self-adaptation. Nowadays robot learninghas been intensively investigated as a promising wayto execute complex tasks arising from the needs of hu-manoids and service robotics. In the past, a number ofalgorithms for the robot learning were proposed, start-ing from the supervised learning based on neural net-works, support vector machines, to the unsupervisedlearning based on reinforcement learning, genetic al-gorithms, etc.. Common to all these approaches is thatthey require relatively large number of repetitions in or-der to learn the required skill. Especially when devel-oping new algorithms or just when tuning the learningparameters, this becomes very time consuming and te-dious procedure especially if executed on the real sys-tem. Therefore, the simulation is used for learning asa rule. However, if we want to use skills learned onsimulation also for the target system we need precisemodels. Our approach is presented through an exampleof learning a robot to play the ball-in-a-cup game, alsoknown as the “kendama” game. The aim of this game isto swing the ball hanging from a cup down on a stringand to catch it with the cup.

Page 3: MODELING OF ROBOT LEARNING IN MATLAB/SIMULINK …€¦ · In the paper, we present our MATLAB/Simulink based simulation environment, dedicated to effective design of high an low level

2 Simulation environmentIn the last years, the scope of our research has beenoriented more in the development of control systemsfor humanoid and service robots. These robots havein general a complex mechanical structure with manydegrees-of-freedom. Consequently, complex kinematicand dynamic models are necessary to simulate them ad-equately. Furthermore, the control methods and algo-rithms we are developing are usually a part of the higherrobot control levels and it is assumed that the low levelclose-loop control algorithms are a solved issue. Thesehigh level control algorithms can become very complexand may even require parallel computation distributedover more computers.

Therefore, we had to reconsider the concept of the con-trol design environment we will use in future. We haveaugmented our simulation environment with compo-nents that

• simulate the kinematics and dynamics of an arbi-trary chosen kinematic chain describing differentmanipulators• enable integration of different sensor systems like

vision and force sensors• enable simulation of scenarios for complex robot

tasks• model the robots’ environments• visualize the robots and their environment• enable integration of real robots in the simulation

loop

Based on our good experience with MAT-LAB/Simulink we have decided that MAT-LAB/Simulink will be the kernel of our simulationtools. However, some of the above requirements can beeasier fulfilled by using other tools. For example, thevisualization of the robot and the environment can beeasily done by dedicated graphics tools. Furthermore,advanced robot control strategies rely intensively onfeedback sensor information. The very complex sensorsystem is the vision system, which can have severalconfigurations and can be implemented on a singlecomputer or on a computer cluster composed of manycomputers running different operating systems. Tointegrate such a diversity of hardware components in anunique framework we have decided to use the ethernetcommunication and the UDP protocol. In this way, wehave a maximal possible ”degree-of-openness” of thesystem. In our environment, each block can representa real system or a model of that system. Note thatbecause we are using ethernet communication betweenthe blocks, different software tools on different plat-forms can be used to simulate specific parts of thesystem. Consequently, the simulation environmentcan consist of several interacting applications, eachrepresenting a part of the system.

In Simulink, a system is modeled by combining input-output blocks. To gain the transparency we try to rep-resent a system by the block structure with several hier-archical levels, i.e. by combining different basic blocks

subsystems are built which become a single block at thehigher level. Figure 1 shows the Robot systems block li-brary. The goal of the library is to provide blocks whichare needed to simulate robotic systems and can not bemodeled with standard blocks. First of all, this are theblocks for robot kinematic and dynamic models, theblocks for sensors systems, the typical transformationspresent in robot systems and the special interface blocksfor robots, sensors and all other communications. Addi-tionally, the library includes some blocks with standardsubsystems like task space controllers, trajectory gen-eration modules, etc.

3 Modeling of the ball-in-a-cup gameWhen learning different tasks different models areneeded. Unlike when learning a tennis swing, wherea human needs just to learn the goal function for thevery moment when the racket hits the ball, the Ball-in-a-Cup task where the ball and the cup constantly inter-act relies on the complete dynamics of the cup and theball. In order to build a simulation model, we first needan accurate dynamic model of both parts. The part, i.e.the subsystems of the ball-in-a-cup game, were mod-eled using the Open Dynamics Engine (ODE) package[3]. With ODE, we can successfully simulate the in-teraction (collisions) between the ball and a cup, bothmodeled as rigid bodies. Unfortunately, ODE has nocapability of simulation the ball on a rope. Therefore,we had to build a special model for this purpose. Let usbriefly present the dynamic model of the ball-on-a ropesystem.

Ball on a rope acts as a pendulum as long as the ropeis not loose and as a free flying object otherwise. Bothmodels are well known, but the problem is the switch-ing between both models. It turns out, that is more ap-propriate and accurate to model the ball as a free flyingobject with external forces acting on it. External forcesare air drag forces and the forces resulting from the ropetension. The overall dynamical model can be describedby the set of equations

mz = fr + fa (1)

fr =

{(Ksd+Kdd)h, d > 0

0, d ≤ 0

fa = Kaz

d = (l − l0),

where z is the vector of the ball position, l is actual ropelength, l0 is the length of the untended rope, h is the unitvector describing the tent rope direction, Ka is the airdrag constant and Ks and Kd are the rope stiffens andrope damping respectively.

Since the ball-in-a-rope game is very sensitive to the ex-act trajectory guidance over the entire game phase, weneed an accurate dynamic model of the robot. The robotcould also be modeled using ODE, but it turns out thatthe simulation model is far more accurate, numericallystable and efficient if modeled explicitly. The dynamicsof the n degrees of freedom articulated robot arm can

Page 4: MODELING OF ROBOT LEARNING IN MATLAB/SIMULINK …€¦ · In the paper, we present our MATLAB/Simulink based simulation environment, dedicated to effective design of high an low level

(c) 2008 IJSInstitut Jozef Stefan

SloveniaRobot system library 1.0

Transformations

Transformations

Task Generation

TaskDefinition

Special devices

Special DevicesEnvironment

Sensors

SensorSystems

Rotations

Rotations

Quaternion operations

Quaternionoperations

Nomad

NomadXR 4000

Mitsubishi PA 10

Mitsubishi PA 10Robot

Head

Humanoid RobotHead

HOAP 3

Fujitsu HOAP 3Robot

Controllers

RobotControllers

Animation

RoboworksAnimation

Fig. 1 Simulink Robot systems library

be presented with the well know equation

q =H−1(τ −Cq − ξ − g − JTF ), (2)

where q is n dimensional vector of joint angles, τ is ndimensional vector of joint torques,H is an n×n sym-metric, positive definite inertia matrix , C is n× n ma-trix of nonlinear terms due to the centrifugal and Cori-olis forces, ξ is n dimensional vector of friction forces,g is n dimensional vector of gravitational forces, J isn×m dimensional Jacobian matrix,m is the number ofthe task coordinates and F is an m dimensional vectorof environment contact force acting on the end-effector.Nowadays the calculation of matrices H , C and vec-tor g is becoming easy thanks to a number of dedi-cated program packages for efficient dynamics calcula-tion such as SD-Fast [4], Modelica [5], Newton [6], etc.Many of them return not only the efficient mathematicalmodel, but also the program code. On contrary, a cor-rect friction model remains still a problem. Namely, thefriction differs from the robot to robot of the same typeand also changes with the time. Therefore, the frictionshould be estimated and compensated with an appropri-ate algorithm. In our lab we have developed a frictionestimation and compensation algorithm based on twolayer neural network [7].

An important feature of the simulation is also the vi-sualization. It is very important to visualize the sim-ulation results. Especially in robotics it is necessaryto “see” the motion of the robot and objects in theworking environment. In our system we relay on ex-ternal software for the visualization and animation ofrobots. In general, joint angles of robotic manipula-tors as well as the position and orientation of the othersimulated objects in the scene are passed to the visual-ization tools using TCP/IP or UDP protocol. Currently,we have integrated into our simulation environment twovisualization software packages - RoboWorks [8] andBlender[9]. Roboworks incorporates simple, but effi-cient modeler. Because of its simplicity RoboWorkspackage is the favorable tool for the visualization ofsimpler systems, i.e. one or two robots in non-complexenvironment. Figure 2 shows Roboworks model of therobot arm Mitsubishi Pa10 mounted on mobile platformNomad XR4000 during the simulation of the ball-in-a-cup game. For more complex scenes we use Blender,an open source multi-platform 3D computer animation

program, which has a lot of features that are potentiallyinteresting for engineering purposes, such as the simu-lation and programming of robots, machine tools, hu-mans and animals, etc. and the visualization and post-processing of all sorts of data that come out of suchbiological or artificial devices. Blender supports alsoscripts (via Python interfaces to the core C/C++ code)and has the capability of placing moving cameras at anylink of the kinematic chain, it supports the real timephoto realistic rendering for the virtual reality simula-tion and has also a physics engine for the simulation ofthe interactions between entities.

Fig. 2 Roboworks model of the ball-in-a-cup game

One of the key issues in the ball-in-a-cup game is alsothe prediction of the catching pose. The prediction isbased on the estimation of the ball trajectory. Basedon the dynamic model of the free-flying object, we cancompute the ball trajectory using past observations andapplying appropriate regression algorithm. A straight-forward way would be to simply use the data from theball-on-a-rope simulation block, but this violates ourconcept of transparency and interchangeability of thesimulation blocks and the real-hardware blocks. There-fore, we have developed a special module for the cam-

Page 5: MODELING OF ROBOT LEARNING IN MATLAB/SIMULINK …€¦ · In the paper, we present our MATLAB/Simulink based simulation environment, dedicated to effective design of high an low level

era simulation. It captures images directly form thecomputer screen and is thus independent from the vi-sual representation block. It acts as a virtual camera. Inthis block, we can simulate also different lighting con-ditions. Appropriate image processing algorithms fromour library are then used in order to retrieve the requiredinformation from the real or virtual camera.

In order to define the initial trajectory for the swing-upphase of the ball-in-a-cup game, we need the simula-tion of the kinesthetic guidance. The simulation of thekinesthetic guidance is provided by using the haptic de-vice, which controls the position and the velocity of thevirtual hand holding the ball-on-a-rope system and de-livers back the resulting forces. In this way it mimicsthe real kinesthetic feeling of playing a real ball-on-a-rope game. The figure 3 displays the simulation setupof the ball-in-a-rope game.

Fig. 3 Simulation of the kinesthetic guidance of theball-in-a-cup game

4 Simulation of the supervised learning

Generally, supervised learning is a machine learningtechnique for deducing trajectories from a set of train-ing data. In robotics, this kind of learning is often re-ferred as learning by imitation. The imitation involvesthe interaction of perception, memory, and motor con-trol. A straightforward approach is to mimic the humanmotion by recording a sequence of movements and toreproduce the same motion with the robot. In general,this approach fails due to different dynamics capabili-ties of the robot and humans. Therefore, the trajectoryto be executed by the robot has to be modified appro-priately. A central issue in the trajectory generation andmodification is the choice of the representation (or en-coding) of the trajectory. One of the most suitable rep-resentations, which facilitates adaptation, are the Dy-namic Motion Primitives (DMP) [10]. In the standardDMP formulation, the motion in each task coordinate isrepresented as a damped mass-spring system perturbedby an external force. Such a system can be modeled

with a set of differential equations

v =1

τ(K(g − y)−Dv + f(x)) , (3)

y =v

τ

where v and y are the velocity and position of the sys-tem, x is the phase variable, which defines the time evo-lution of the trajectory, τ is the temporal scaling factor,K is the spring constant, and D is the damping. Thephase variable x is defined by

x = −αxτ. (4)

For the trajectory generation it is necessary that the dy-namic system is critically damped and thus reaches thegoal position without overshoots. A suitable choice isD = 2

√K, where K is chosen to meet the desired ve-

locity response of the system. Function f(x) is a non-linear function which is used to adapt the response ofthe dynamic system to an arbitrary complex movement.A suitable choice for f(x) was proposed by Ijspeert etal. [11] in the form of a linear combination of M radialbasis functions

f(x) =

∑Mj=1 wjψj(x)∑Mj=1 ψj(x)

x, (5)

where ψj are Gaussian functions defined as ψj(x) =exp(− 1

2σ2j(x − cj)2). Parameters cj and σj define the

center and the width of the j-th basis function, whilewj are the adjustable weights used to obtain the desiredshape of the trajectory. A suitable method for deter-mining weights wi from the demonstration trajectory islocally weighted least square regression [11].

First, in order to obtain an example for the imitationwe have recorded the motions of a human player by thekinesthetic teach-in. In the simulation environment, wehave used the haptic device “Phantom Omni”, whichcontrols the model of the ball-on-rope. In another setup,a haptic device might be substituted by a real time mo-tion capture system ’OptoTrack’. In our integrated en-vironment this can be done by simply substituting themodel block. This makes the system transparent regard-ing the use of different hardware. As previously stated,the captured trajectory has to be adjusted according tothe robot dynamics. For that, we have first encoded aset of training trajectories into DMP formulation. Eachtrajectory has been encoded using 22 parameter, goalg, duration τ and 20 weights wi of radial basis func-tions (M = 20). The resulting parameters were av-eraged over entire set of the trajectories. It turned outthat we can adapt the captured trajectory to the simu-lated robot dynamics by tuning just few of parameters,among which the temporal scaling τ was dominant.

5 Simulation of the reinforcement learn-ing

In this section we present methods and tools for learn-ing of swing-up motion for the ball-in-a-cup without

Page 6: MODELING OF ROBOT LEARNING IN MATLAB/SIMULINK …€¦ · In the paper, we present our MATLAB/Simulink based simulation environment, dedicated to effective design of high an low level

Fig. 4 Cup trajectory during the ball-in-a-cup game ob-tained with imitation learning

any previous knowledge of the system or the environ-ment. We decided to evaluate SARSA-learning algo-rithm for this problem [12]. All previous studies usedthe function approximation reinforcement learning al-gorithms [13, 14], where the goal of the learning pro-cess was to tune the parameters of the swing-up trajec-tory. In most cases the learning process starts with apre-learned trajectory, e.g. trajectory obtained from thekinesthetic guidance. On contrary, SARSA algorithmcan be applied without any presumption of the envi-ronment model and has to implicitly learn the systemdynamics trough exploration. During the exploration,the system collect rewards and tunes the control policybased solely on the accumulated rewards. The modi-fied algorithm for the SARSA learning with eligibilitytraces [12] is described by the following Eq.

Q(st, at)← Q(st, at)+ (6)α[rt + γQ(st+1, at+1)−Q(st, at)]e(s)

et(s) =

{γλet−1(s) if s 6= st1 if s = st

, (7)

where Q is the is the action-value function and st andat are the state and the action of the agent at time t,respectively. α and γ denote the learning rate and dis-count rate; these two parameters determine the learningrate and the convergence, respectively. rt denotes thereward associated with the agent state, obtained in thecurrent time t. The Eq. 7 describes the exponential de-cay of the eligibility traces. Each time the agent is in astate st, the trace for that state is reset to 1. Eligibilitytraces help the agent to enhance the learning speed ongood actions. One of the key properties of the learningsystem is how we assign states s and actions a. Bestresults were obtained by selecting states composed ofthe cup position x, cup velocity x, angle between thecup and the ball ϕ and angular velocity ϕ. For SARSAlearning, states have to be discretized. The cup positionhas been described with two values, cup velocity withthree values, ball angle with 18 values spanning overthe entire circle and the ball angular velocity with 3 val-ues, forming all together a set of 324 values. The action

value has been chosen to be the cup acceleration x de-scribed with 5 values spanning from the maximal neg-ative to the maximal positive acceleration. Commonly,ε-greedy method is used for the learning. In our casewe have obtained a much faster learning with a randomset of harmonic trajectories used as input trajectories inthe entire rollout. The learning goal was to swing theball to the desired angle ϕ(t0) with the desired angu-lar velocity ϕ(t0). The behavior of the learning systemis determined mostly by the reward assigned to currentstate. The final state was rewarded with a positive re-ward.

One of the major problem of swing-up learning is thelimited displacement of the cup in x direction due to thelimited workspace of the robot. To prevent the robotfrom reaching its limits, the excessive displacement xhas been penalized by a negative reward. The learn-

Fig. 5 Reward during swing-up learning

Fig. 6 Learned ball and swing-up trajectory

ing requires many trails in order to learn the appropri-ate policy, which makes the simulation almost indis-pensable tool for robot learning. In average, 220 to 300rollouts have been required to learn the swing-up trajec-tory. During the learning, the system implicitly buildsmodels of the system kinematics, dynamics and limita-tions. In real environment, learning has to be continuedin order to match the dynamics of the real robot, usingthe final value of the action value matrix Q, learned inthe simulation, as an initial action value matrix in thereal robot. Figure 5 shows one example of the obtained

Page 7: MODELING OF ROBOT LEARNING IN MATLAB/SIMULINK …€¦ · In the paper, we present our MATLAB/Simulink based simulation environment, dedicated to effective design of high an low level

rewards during the learning. As we can see from theFig. 5, the learning ends after approx. 260 rollouts, therobot repeats the learned trajectory and gets a constantreward. Figures 6 and 2 show the ball trajectory, thelearned robot trajectory and the simulated robot poseafter the successful learning, respectively. Note that therobot does not necessarily learns equal swing-up trajec-tory if we repeat leaning procedure again, because aninfinite number of trajectories can swing-up the ball tothe desired height. Note also that the learned trajectorydifferent from the human demonstrated trajectory (Fig4). Human demonstrator has used only one swing toswing-up the ball and the swing-up trajectory is in thex − y plane. In contrast to that, the robot has been in-structed to learn the swing up trajectory by moving xcoordinate only. Due to the restricted accelerations, thelearned trajectory has required two swings in order toswing-up the ball to the desired height.

6 ConclusionsThe concept of the presented simulation environment isa result of our experience using the robots in researchand education. It has proved that our environment is avery useful and effective tool for fast and safe devel-opment and testing of advanced control schemes andtask planning algorithms, including force control andvisual feedback, as well as learning algorithms, whichare more and more required in contemporary serviceand humanoid robotics. The main part is implementedin MATLAB/Simulink and we have developed modelsfor the robots and sensors used in our laboratory. Tointegrate the variety of components in an unique frame-work we have decided to allow the use of different toolsfor their simulation. So, the simulation environmentcan be composed of more than one application and theethernet is used for the communication between them.In this way, our environment is very open and can bevery easily extended and adapted to different require-ments and applied to any types of robotic manipulators.One of the most important features of our simulationenvironment is that the testing on real robots is madevery easy; the real systems is simply replaced in thesimulation loop by proper interface blocks. For thatpurpose, we have developed interfaces for the robotsand sensors. The presented control design environmenthas proved to be a very useful and effective tool for fastand safe development and testing of advanced controlschemes and task planning algorithms, including forcecontrol and visual feedback. The software can be veryeasily extended and adapted to different requirementsand applied to any types of robotic manipulators. Lastbut not least, it is an efficient tool for educational pur-poses.

7 References[1] P. I. Corke. A Robotics Toolbox for MATLAB.

IEEE Robotics & Automation Magazine, 3(1):24– 32, 1996.

[2] The Mathworks. SimMechanics, User’s Guide,2005.

[3] Open Dynamics Engine, http://ode.org/ode.html.[4] Symblic Dynamics, Inc. SD/FAST User’s Manual,

1994.[5] Modelica, http://www.modelica.org.[6] Newton Game Dynamics,

http://physicsengine.com/.[7] L. Zlajpah T. Petric. Application of neural net-

works for dynamic modeling of robotic mech-anisms. In Proceedings of the 7th EUROSIMCongress on Modeling and Simulation, Prague,Czech Republic, 2010.

[8] RoboWorksTM: http://www.newtonium.com/publichtml/Products/RoboWorks/RoboWorks.htm.

[9] Blender: http://www.blender.org/.[10] Stefan Schaal, Peyman Mohajerian, and Auke

Ijspeert. Dynamics systems vs. optimal control– a unifying view. Progress in Brain Research,165(6):425–445, 2007.

[11] A.J. Ijspeert, J. Nakanishi, and S. Shaal. Learn-ing Rhythmic Movements by Demonstration us-ing Nonlinear Oscilators. In Proc. of the 2002IEEE/RSJ Int. Conf. On Intelligent Robots andSystems, pages 958 – 963, Lausanne, Suisse,2002.

[12] R. Sutton and A. Barto. Reinforcement Learning:An Introduction. Cambridge, MA: MIT Press,1998.

[13] K. Doya. Reinforcement learning in continuoustime and space. Neural Computation, 12(1):219 –245, 2000.

[14] J. Peters J. Kobler. Policy search for motor prim-itives in robotics. Neural Information ProcessingSystems (NIPS), 2008.