
A Qualitative Representation of Structural Spatial Knowledge for Robot Navigation with Reinforcement Learning

Lutz Frommberger [email protected]

SFB/TR 8 Spatial Cognition, R3-[Q-Shape], Universität Bremen, P.O. Box 330 440, 28334 Bremen, Germany

Abstract

In robot navigation tasks, the representation of knowledge of the surrounding world plays an important role, especially in reinforcement learning approaches. This work presents a qualitative representation of space that empowers an agent to learn a goal-directed navigation strategy based on structural knowledge of the world. The resulting strategy leads to a generally sensible navigation behavior that can be transferred to completely unknown environments.

1. Introduction

In goal-directed navigation tasks, an autonomous moving agent has fulfilled its mission when it has reached a certain location in space. Reinforcement Learning (RL) is frequently applied to such tasks because it allows an agent to autonomously adapt its behavior to a given environment. However, in large and in continuous state spaces, RL methods require extremely long training times.

The navigating agent learns a strategy that will bring it to the goal from every position within the world, that is: it learns to select an action for every given observation of the environment. But what has the agent really learned about the world it operates in? Can it use the acquired knowledge at different locations in this environment, or even in an unknown one? This depends heavily on the chosen spatial representation of the world that is passed to the learning system. In the worst case, all the collected knowledge can become useless. For example, when the agent operates in a different environment, a completely new set of action selections may be necessary, and the agent has to learn everything again from scratch, including collision avoidance strategies. The agent lacks an understanding of the general structure of geometrical spaces.

Appearing in the ICML-06 Workshop on Structural Knowledge Transfer for Machine Learning, June 29, Pittsburgh, PA.

Thrun and Schwartz (1995) claim that it is necessary to discover the structure of the world and abstract from its details to be able to adapt RL to more complex tasks. The aim of the approach I present in this paper is to enable the agent to profit from this structure by using an appropriate qualitative representation for it. Qualitative spatial representations provide an interface that is based on human spatial concepts. They are an expressive means of representing the relations among features in geometrical space. I claim that this way of representing structural spatial information can enable the agent to develop a generally sensible behavior in space that it can reuse at different locations within the same world and that can also be transferred to learning tasks in other, unknown environments. The use of this representation will also support the learning process by speeding it up and making it more robust.

Much effort has been spent on improving the training speed of reinforcement learning in navigation tasks, and consideration of the structure of the state space has been found to be an important means to reach that goal. Topological neighborhood relations can be used to improve the learning performance (Braga & Araújo, 2003). Thrun and Schwartz (1995) tried to find reusable structural information that is valid in multiple tasks. Glaubius et al. (2005) concentrate on the internal value-function representation to reuse experience across similar parts of the state space, using pre-defined equivalence classes. Lane and Wilson (2005) describe relational policies for spatial environments and demonstrate significant learning improvements. To avoid problems with walls, they also suggest regarding the relative position of walls with respect to the agent, but have not yet realized this approach.

The first goal of this work is to find a spatial representation that leads to a small and discrete state space which enables fast and robust learning of a navigation strategy in a continuous, non-homogeneous world.


The second goal is to extract structural knowledge from the environment to enable the agent to reuse learned strategies in similar areas within the same world, and also to transfer this knowledge to other, unknown environments.

2. Navigation solely on the basis of ordering of landmarks

The task considered within this work is a simple goal-directed navigation task: an autonomous robot is requested to find a certain wall in a simulated, simplified office environment completely unknown to the agent (see Figure 1). The robot is supposed to be capable of determining landmarks around it to identify its location. It is assumed that every wall is uniquely distinguishable, making the whole wall a landmark of its own. To represent this, each wall is considered to have a certain color.

Figure 1. The task: a robot in a simplified simulated office environment with uniquely distinguishable walls. Detected colors are depicted in the upper left, with labels attached marking the corresponding walls.

The robot is capable of performing three different basic actions: moving forward, turning a few degrees to the left, and turning a few degrees to the right. Each basic action is repeated as long as the perception vector remains the same. There is no built-in collision avoidance or any other navigational intelligence provided. A small amount of noise is added to the motor actions.
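As a concrete reading of this action scheme, the following minimal Python sketch repeats a motor command until the perception changes. The env.apply/env.perceive interface, the jitter argument, and the Gaussian noise model are illustrative assumptions, not details from the paper.

```python
import random

FORWARD, TURN_LEFT, TURN_RIGHT = range(3)  # the three basic actions

def execute(env, action, noise=0.05):
    """Repeat a basic action as long as the perception vector stays
    the same; env.apply and env.perceive are assumed interfaces."""
    before = env.perceive()
    env.apply(action, jitter=random.gauss(0.0, noise))  # noisy motor action
    while env.perceive() == before:
        env.apply(action, jitter=random.gauss(0.0, noise))
```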

The agent uses a compact qualitative abstraction of real-world information: the colors perceived at 7 discrete angles around it, a vector $c = (c_1, \ldots, c_n)$. Every physical state, a tuple $(x, \theta)$ of the robot's position $x$ and orientation $\theta$, maps to exactly one color vector $c$. This mapping is not injective: multiple physical states share the same representation. The encoding of a circular order of perceived colors is sufficient to approximately represent the position of the agent within the world and to derive a sequence of actions to reach the goal. However, it is not sufficient to prevent the robot from collisions. As stated above, the mapping from physical locations to the state representation is not unique, and, given the same system input, the consequences of an action can differ dramatically. Performing the same action at the same system state will sometimes result in a collision and sometimes not, which prevents the agent from retrieving a proper rating for this state-action pair. The representation used so far does not encode any information about the agent's position with respect to obstacles. It does not support the agent in developing a generally sensible spatial behavior that should also emerge in the absence of landmarks.
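A minimal sketch of this observation abstraction, assuming a hypothetical env.raycast_color interface that returns the identifier of the first wall hit by a ray:

```python
import math

def color_vector(env, x, theta, n=7):
    """Qualitative observation: the color of the wall hit by a ray
    cast from position x at each of n discrete angles relative to
    the agent's orientation theta. env.raycast_color is an assumed
    interface, not part of the paper."""
    return tuple(env.raycast_color(x, theta + k * 2.0 * math.pi / n)
                 for k in range(n))
```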

3. (G)RLPR: a spatial representation of relative position of line segments

Navigation in space, as performed in the learning examples, can be viewed as consisting of two different aspects: (1) Goal-directed behavior towards a certain target location depends highly on the task that has to be solved. If the task is to go to a certain location, the resulting actions at a specific place are generally different for different targets. Goal-directed behavior is task-specific. (2) Generally sensible behavior regarding the structure of the environment is more or less the same in identical or similar environments. A generally sensible behavior in indoor office environments, for example, would be not to crash into walls, to turn around corners smoothly, etc. It does not depend on a goal to reach, but on characteristics of the environment that invoke some kind of behavior. Generally sensible behavior is task-independent. The aim is to find a representation that separates the two aspects of navigation behavior in order to be able to extract generally sensible behavior from a learned strategy.

The relations of walls towards each other induce sensible paths inside the world, which the agent should learn to follow. I propose a representation of the relative positions of lines with respect to the agent's moving direction. For further reference, it is called RLPR (Relative Line Position Representation). RLPR is intentionally chosen to be extremely compact yet expressive, so that it can be added to an existing feature vector.

To encode the relative positions of certain entities with respect to the agent's position and orientation, we construct an enclosing box around the robot and then extend the boundaries of this box to create eight disjoint regions $R_1$ to $R_8$ (see Figure 2a). This representation was proposed to model the movement of extended objects in a qualitative manner (Mukerjee & Joe, 1990) and is closely related to the direction-relation matrix (Goyal & Egenhofer, 2000).


Figure 2. Neighboring regions around the robot in relation to its moving direction. The regions in the immediate surroundings (b) are proper subsets of $R_1, \ldots, R_8$ (a).

We can define a traversal status $t(B, R_i)$ of each line $B$ with respect to a region $R_i$ as follows: $t(B, R_i) = 1$ if a line $B$ cuts region $R_i$, and $t(B, R_i) = 0$ if not. The total number of lines in a region $R_i$ is

$$t(R_i) = \sum_{B \in \mathcal{B}} t(B, R_i) \qquad (1)$$

with $\mathcal{B}$ being the set of all detected line segments.

For navigation purposes, it is particularly interesting to know which line segments span from one region to another. A corridor with a left turn, for example, will have connected line segments in all the right and front sectors, but none in freely traversable space. To additionally encode that a line segment $B$ spans from one sector to another, we determine whether a line $B$ lies within counter-clockwise adjacent regions $R_i$ and $R_{i+1}$ (for $R_8$, of course, we need to consider $R_1$):

$$t'(B, R_i) = t(B, R_i) \cdot t(B, R_{i+1}) \qquad (2)$$

$t'(B, R_i)$ is also very robust to noisy line detection, as it does not matter whether a line is detected as one or more segments. The total number of spanning line segments in a region, $t'(R_i)$, is derived analogously to (1).
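To make the construction concrete, here is a Python sketch of Equations (1) and (2). The region numbering, the box dimensions, and the sampling-based intersection test are illustrative assumptions; the paper fixes only the construction by extending the enclosing box, not these details.

```python
import numpy as np

def region_of(p, w=0.5, h=0.5):
    """Map a robot-centric point to a region index 1..8, or None if it
    falls inside the robot's enclosing box [-w, w] x [-h, h]. Extending
    the box sides yields a 3x3 grid whose eight outer cells are R1..R8;
    the concrete numbering below is an assumption for illustration."""
    x, y = p
    col = 0 if x < -w else (2 if x > w else 1)   # left / middle / right
    row = 0 if y < -h else (2 if y > h else 1)   # behind / beside / ahead
    grid = [[5, 4, 3],      # back-left,  back,  back-right
            [6, None, 2],   # left,       box,   right
            [7, 8, 1]]      # front-left, front, front-right
    return grid[row][col]

def t(B, i, samples=100):
    """Traversal status t(B, R_i): 1 if line segment B cuts R_i,
    approximated by sampling points along the segment."""
    p, q = np.asarray(B[0], float), np.asarray(B[1], float)
    return int(any(region_of(p + s * (q - p)) == i
                   for s in np.linspace(0.0, 1.0, samples)))

def t_prime(B, i):
    """Spanning status t'(B, R_i) = t(B, R_i) * t(B, R_{i+1}),
    wrapping around from R8 to R1 (Eq. 2)."""
    return t(B, i) * t(B, 1 if i == 8 else i + 1)

def t_region(segments, i):
    """t(R_i): number of detected segments cutting R_i (Eq. 1)."""
    return sum(t(B, i) for B in segments)

def t_prime_region(segments, i):
    """t'(R_i): number of segments spanning R_i and R_{i+1}."""
    return sum(t_prime(B, i) for B in segments)
```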

To achieve a valuable representation of the environment, special care has to be taken with the immediate surroundings of the agent. The position of detected line segments provides information useful for general orientation and mid-term planning. Additionally, if an object is detected in the immediate surroundings, this information is relevant for obstacle avoidance and requires a prompt reaction. It can also be utilized to realize certain behaviors, for example, a wall-following strategy. To account for the regions near the agent, the representation described above is used twice. On the one hand, there are the regions $R_1, \ldots, R_8$, which are bounded by the perceptual capabilities of the robot. On the other hand, bounded subsets of those regions represent the immediate surroundings (see Figure 2b).

To combine knowledge about goal-directed and generally sensible spatial behavior, we now build a feature vector by concatenating the representation of detected colors and RLPR. A system state $s$ is represented as

$$s = (c_1, \ldots, c_n, t'(R_1), \ldots, t'(R_8), t(R_9), \ldots, t(R_{13})) \qquad (3)$$

Within the immediate surroundings, the information about spanning line segments is not that important, so $t(R_i)$ is used instead of $t'(R_i)$ for $i \geq 9$.
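A sketch of how the state of Equation (3) could be assembled, reusing t_prime_region from the previous sketch; the geometry of the inner regions $R_9, \ldots, R_{13}$ is left to a caller-supplied near_status function, since the paper specifies it only via Figure 2b.

```python
def system_state(colors, segments, near_status):
    """Assemble the feature vector s of Eq. (3): the color vector
    c_1..c_n, the spanning counts t'(R_1)..t'(R_8), and the counts
    t(R_9)..t(R_13) for the immediate surroundings. near_status(B, i)
    is a caller-supplied analogue of t(B, R_i) for the bounded inner
    regions, whose exact geometry is assumed here."""
    far = [t_prime_region(segments, i) for i in range(1, 9)]
    near = [sum(near_status(B, i) for B in segments) for i in range(9, 14)]
    return tuple(colors) + tuple(far) + tuple(near)
```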

In the following, I want to show that, while learning a goal-directed navigation task with a clearly specified target location, the agent using RLPR can also learn a sensible behavior regarding the structure of the surrounding world, independent of landmark information. To this end, the generalization abilities of tile coding (Sutton, 1996) are used in the value function representation over the complete range of color information. This means that an update of the policy affects all system states with the same RLPR representation. This generalizing variant of RLPR is called Generalizing RLPR (GRLPR). As an effect, the agent can reuse knowledge acquired within the same learning task. The learned strategy is also applicable to new environments that are completely unknown to the agent.
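One way to realize this generalization effect is a value function with two components, one keyed by the full state and one keyed by the RLPR part alone, so that every update propagates to all states sharing the same RLPR representation. This two-tiling reading is my interpretation of the tile-coding description, not necessarily the paper's exact implementation:

```python
from collections import defaultdict

class GRLPRValues:
    """Sketch of the GRLPR generalization effect (an assumption, see
    the text above): besides an exact entry per full state, a second
    component is keyed by the RLPR part alone."""

    def __init__(self, alpha=0.1):
        self.q_full = defaultdict(float)  # keyed by (colors, rlpr, action)
        self.q_rlpr = defaultdict(float)  # keyed by (rlpr, action) only
        self.alpha = alpha

    def q(self, colors, rlpr, action):
        return (self.q_full[(colors, rlpr, action)]
                + self.q_rlpr[(rlpr, action)])

    def update(self, colors, rlpr, action, td_error):
        # split the learning step across the two tilings, as in
        # standard tile coding with two active tiles per state
        self.q_full[(colors, rlpr, action)] += 0.5 * self.alpha * td_error
        self.q_rlpr[(rlpr, action)] += 0.5 * self.alpha * td_error
```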

4. Experimental results

All experiments have been conducted using Watkins' Q(λ) algorithm. In a first experiment, the robot has to solve the goal finding task in the world depicted in Figure 1. When using only $(c_1, \ldots, c_7)$ as the input ("pure" approach), no robust learning behavior can be achieved. In contrast, the additional use of RLPR or GRLPR leads to stable learning: after about 10,000 episodes, almost all of the test runs reach the goal.
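For reference, a textbook implementation of Watkins' Q(λ) with replacing traces and an ε-greedy policy is sketched below; the env.reset()/env.step() interface and all parameter values are assumptions, as the paper only names the algorithm.

```python
import random
from collections import defaultdict

def greedy(Q, s, actions):
    return max(actions, key=lambda a: Q[(s, a)])

def watkins_q_lambda(env, actions, episodes, alpha=0.1, gamma=0.95,
                     lam=0.9, epsilon=0.1):
    """Watkins' Q(lambda) with replacing eligibility traces; traces
    are cut whenever an exploratory (non-greedy) action is taken."""
    Q = defaultdict(float)
    for _ in range(episodes):
        traces = defaultdict(float)
        s = env.reset()
        a = (greedy(Q, s, actions) if random.random() > epsilon
             else random.choice(actions))
        done = False
        while not done:
            s2, reward, done = env.step(a)
            best2 = greedy(Q, s2, actions)
            a2 = (best2 if random.random() > epsilon
                  else random.choice(actions))
            delta = reward + (0.0 if done else gamma * Q[(s2, best2)]) - Q[(s, a)]
            traces[(s, a)] = 1.0                      # replacing traces
            for key in list(traces):
                Q[key] += alpha * delta * traces[key]
                # Watkins' variant: zero all traces after exploration
                traces[key] = gamma * lam * traces[key] if a2 == best2 else 0.0
            s, a = s2, a2
    return Q
```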

Figure 3 compares the "pure" approach with RLPR and GRLPR during the first 15,000 episodes. Due to its non-generalizing behavior and its larger feature vector compared to the "pure" approach, RLPR learns more slowly in the very beginning, but becomes comparably successful to GRLPR over time. GRLPR learns faster than the other two approaches in the early training phase. It benefits from its generalizing behavior, which empowers it to reuse structural spatial knowledge gained in other parts of the environment.

An important measure of the quality of navigation is the number of collisions that occur during training. This number is reduced significantly with the additional line position representations (see Table 1). In particular, GRLPR performs noticeably better than RLPR. This indicates that the generalization ability leads to a sensible navigation behavior rather early.

To show that the learned knowledge of the structure of the world is not only beneficial within the same learning task, we must examine how the agent behaves when using the learned strategy in the absence of landmarks or in an unknown world.


Figure 3. Number of runs reaching the goal state (success in %, plotted over the first 15,000 runs): GRLPR learns faster than the "pure" approach. RLPR is slower in the beginning, but catches up quite fast. Without (G)RLPR, no stable learning can be achieved.

Table 1. Number of collisions after 15,000 episodes

Repr.     Training    Test
pure        21388     2385
RLPR        11316     1200
GRLPR        7929      596

After learning with GRLPR for 40,000 episodes, the landmark information is turned off, so that the agent perceives the same (unknown) color vector regardless of where it is. As a result, the agent is still able to navigate collision-free and perform smooth curves, even without receiving any landmark information. Most of the time, it uses a wall-following strategy, which is commonly used in robotics. The learned strategy can also successfully be transferred to completely unknown environments. Figure 4 shows the agent's trajectories in a landmark-free world it has never seen before, using the knowledge gained in the prior experiment.

5. Conclusion and outlook

A goal-directed robot navigation task can be solved with reinforcement learning using a qualitative spatial representation based purely on the ordering of landmarks and the relative positions of line segments. The proposed representation results in fast and stable learning of the given task. Structural information within the environment is generalized and can be reused within the same learning task, which leads to a generally sensible navigation strategy that facilitates faster learning and reduces collisions significantly. Furthermore, the learned knowledge can be transferred to environments lacking landmark information and/or to totally unknown environments: the agent learns a generally sensible behavior in geometrical spaces.

Figure 4. Trajectories of the agent in an unknown environment without landmarks, using the strategy learned in a different world with GRLPR.

In future work, it has to be shown that the acquired strategy, used as background knowledge, can speed up new learning tasks in unknown environments. Also, other methods of evaluating structural knowledge, e.g., in a hierarchical manner, need to be investigated.

References

Braga, A. P. S., & Araújo, A. F. R. (2003). A topological reinforcement learning agent for navigation. Neural Computing and Applications, 12, 220–236.

Glaubius, R., Namihira, M., & Smart, W. D. (2005). Speeding up reinforcement learning using manifold representations: Preliminary results. Proceedings of the IJCAI Workshop "Reasoning with Uncertainty in Robotics".

Goyal, R. K., & Egenhofer, M. J. (2000). Consistent queries over cardinal directions across different levels of detail. Proceedings of the 11th Intl. Workshop on Database and Expert System Applications.

Lane, T., & Wilson, A. (2005). Toward a topological theory of relational reinforcement learning for navigation tasks. Proceedings of FLAIRS-2005.

Mukerjee, A., & Joe, G. (1990). A qualitative model for space. Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI).

Sutton, R. S. (1996). Generalization in reinforcement learning: Successful examples using sparse tile coding. In D. S. Touretzky, M. C. Mozer and M. E. Hasselmo (Eds.), Advances in neural information processing systems, vol. 8, 1038–1044.

Thrun, S., & Schwartz, A. (1995). Finding structure in reinforcement learning. In G. Tesauro, D. Touretzky and T. Leen (Eds.), Advances in neural information processing systems, vol. 7.