
Planning 3D Task Demonstrations of a Teleoperated Space Robot Arm

Froduald Kabanza, Khaled Belghith, Philipe Bellefeuille, and Benjamin Auder

Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada

{kabanza, khaled.belghith, philipe.bellefeuille, benjamin.auder}@usherbrooke.ca

Leo Hartman
Canadian Space Agency

Saint-Hubert, QC J3Y 8Y9, Canada
[email protected]

Abstract

We present an automated planning application for generating 3D task demonstrations involving a teleoperated robot arm on the International Space Station (ISS). A typical task demonstration involves moving the robot arm from one configuration to another. Our objective is to automatically plan the position of virtual cameras to film the arm in a manner that conveys the best awareness of the robot trajectory to the user. Given a new task, or given changes to a previously planned task, our system automatically and efficiently generates 3D demonstrations of the task without the intervention of a computer graphics programmer. For a given task, the robot trajectory is generated using a path planner. We then treat the filming of the trajectory as a sequence of shots satisfying a temporally extended goal conveying constraints on the desirable positioning of virtual cameras, and a temporal-logic based planning system (TLPlan) is used to generate a 3D movie satisfying the goal. One motivation for this application is to eventually use it to support ground operators in planning mission tasks for the ISS. Another is to eventually use automatically generated demonstrations in a 3D training simulator to provide feedback to student astronauts learning to manipulate the robot arm. Although motivated by the ISS application, the key ideas underlying our system are potentially useful for automatically filming other kinds of complex animated scenes.

Introduction

The Space Station Remote Manipulator (SSRMS) is a 17-meter-long articulated robot arm mounted on the International Space Station (ISS). It has a complex geometry, with seven rotational joints, each with a range of 270°, and two latching end effectors which can be moved to various fixtures, giving it the capability to walk from one grappling fixture to the next on the exterior of the ISS. The SSRMS is a key component of the ISS and is used in the assembly, maintenance and repair of the station, and also for moving payloads from visiting shuttles. Astronauts operate the SSRMS through a workstation located inside one of the ISS compartments.

Copyright © 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Operators manipulating the SSRMS on orbit receive support from ground operations. Part of this support consists of visualizing and validating manoeuvres before they are actually carried out. Often, the ground operators have to uplink videos to the ISS explaining how the tasks should be done. Currently these videos are programmed by computer graphics programmers based on instructions given by mission planning experts. Converging to a video that correctly presents the desired views of the operation can take a significant amount of time.

In order to help the ground support operations generate these video demonstrations, we have been developing an automatic task demonstration generator (ATDG), which can generate 3D animations that show how to perform a given task with the SSRMS. The current ATDG prototype is integrated with ROMAN Tutor (RObot MANipulation Tutor), a proof-of-concept simulator for the command and control of the SSRMS that we have also developed (Kabanza, Nkambou, and Belghith 2005). Figure 1 shows a snapshot of the simulator. As with the physical command and control system of the ISS, the simulator has three monitors and fourteen different cameras on the exterior of the ISS. The astronaut can see the exterior only through these monitors, by assigning one camera to each.

The ATDG is given as input a task that must be performed by an operator using the SSRMS. Almost all tasks involve moving the SSRMS from one configuration to another, for example to move a payload or to inspect a region of the ISS exterior using a camera mounted on the end effector. Given a task, the ATDG calls the path planner to get a trajectory of the SSRMS and then simulates the SSRMS on this trajectory while filming it.

Filming the trajectory of the SSRMS fundamentally amounts to selecting the virtual cameras used to show the SSRMS at different points of the trajectory and selecting the configuration of these cameras. There is an infinity of possible virtual camera positions and configurations. If the objective is to give the operator a sense of the task as he will see it from the command and control workstation, then virtual camera positions will be selected from the 14 positions of the cameras on the exterior of the ISS, that is, a finite number of possibilities.


Figure 1: ROMAN Tutor interface

If the objective is to convey some cognitive awareness of the task, then a virtual camera can be selected at any position that best helps the operator gain that awareness. For instance, the animation could include a panoramic view of a large section of the ISS from a virtual camera at a point in space away from the ISS. Such a view is physically impossible given that all cameras are attached to the ISS, but it is useful and in some cases better than visualizing the SSRMS on the small physical models that astronauts normally use.

Background

Virtual Camera and Virtual Camera Planning

As mentioned in the previous section, a virtual camera is a camera free to move all around the station. Figure 2 shows the parameters of a virtual camera, which are its position, orientation, and zoom, for a total of seven degrees of freedom. The speed and acceleration of the camera are also important when generating animations, but camera speed is not yet handled in the current implementation; we simply follow the speed of the robotic arm. Planning camera configurations consists in generating a series of camera configurations, each corresponding to one frame of the desired final movie (an animation being a sequence of images). Then, given this sequence of configurations, we use a graphics library (Coin3D, an open-source implementation of the Open Inventor API built on OpenGL) to interpret and display it.

Figure 2: Camera configuration
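To make the camera parameterization concrete, the following sketch shows one way to represent a seven-degree-of-freedom camera configuration and a movie as a sequence of per-frame configurations. This is an illustrative Python data structure under our own naming, not the actual ATDG or Coin3D code.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CameraConfig:
    """One virtual-camera configuration: 3 position + 3 orientation + 1 zoom DOF."""
    x: float; y: float; z: float          # position in the ISS frame
    pan: float; tilt: float; roll: float  # orientation (radians)
    zoom: float                           # zoom / field-of-view factor

# A movie is simply one camera configuration per rendered frame.
Movie = List[CameraConfig]

def static_shot(cfg: CameraConfig, n_frames: int) -> Movie:
    """A static shot: the same configuration repeated for n_frames frames."""
    return [cfg] * n_frames
```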

Scenes, Shots and Idioms

To simplify the calculation of the frame sequences composing a film generated by the ATDG, we borrowed some concepts from the world of cinematography, namely the hierarchical division of a movie into scenes and shots (see Figure 3), and the concept of an idiom.

A scene of a film is a distinct narrative unit usually characterized by unity of location or unity of time. In our case, we consider a scene to be a distinctive movement of the robotic arm (e.g., a translation along the mobile base, or moving the end effector towards a given position). A scene is defined as a sequence of shots. A shot in turn is defined as a continuous succession of images taken by one camera (each shot generally lasting a few seconds). Figure 3 illustrates this film hierarchy.¹


Figure 3: Film abstraction hierarchy

An idiom describes how a scene can be filmed, that is, how the different shots composing a scene can be taken. For instance, an idiom for a conversation between two people will specify the number of shots and how the camera for each shot is placed and configured. In general there are different ways of filming a scene, and hence different idioms for a scene; in a film, only one specific idiom is selected and applied for each scene. Similarly, for each distinctive movement of the SSRMS, we have different idioms for filming the movement.

Camera Planning Approaches

The problem of automatically generating a movie essentially consists of determining what shot to take at different segments of the scene. This is often understood as a camera planning problem, that is, determining the sequence of shots given a goal corresponding to what meaning should be conveyed by the movie (e.g., a looming collision, the arm going too fast, or, in other applications, a person running, a happy person, etc.).

Automated camera planning approaches can generally be grouped into two categories: constraint approaches and cinematographic approaches. Constraint approaches film a scene (i.e., its objects or background, static or in motion) based only on the geometry of the scene (Bares, Gregoire, and Lester 1998; Benhamou et al. 1994). Constraints on the camera (position, view angle, zoom, covered objects or regions) are essentially geometric (e.g., filming a moving character for 20 minutes without any occlusion of the camera). There are no explicit constraints associated with actions involving objects in the scene. For instance, we cannot express a constraint such as "if the person is moving then his legs must remain in the camera view angle during the motion".

The approach in (Nieuwenhuisen and Overmars 2004) can also be seen as a constraint or geometric approach, although it is not explicitly presented that way. Camera placements are seen as configurations in a probabilistic roadmap; a camera plan (for example, to navigate in a museum from one point to another) becomes a trajectory in the roadmap, obtained using probabilistic roadmap algorithms normally used for articulated bodies (e.g., robots), combined with some postprocessing to smooth the view angles and the zoom.

¹ A more general film hierarchy considers two additional granularity levels, where a film consists of acts, an act comprises sequences, and a sequence contains one or more scenes. Our hierarchy stops at the scene level.

Clearly, constraint approaches are not appropriate for filming a robot arm, because we want to be able to film it differently depending on the context of the task being filmed, that is, depending on the semantics we associate with actions.

In contrast, cinematographic camera planning approaches film a scene by taking into account constraints on actions in addition to geometric constraints. The general principle underlying these approaches is to specify rules for composing and sequencing shots depending on the type of action being filmed. For example, we may want to specify a rule such as: if you are filming a sequence of shots showing one person talking to another, the faces of both people must remain in the camera view angle. Or: if the end effector of the SSRMS is aligned with some visual cue on the ISS (e.g., near the Canadian flag), then we would like to film it with the cue in view and thereby make the astronaut aware of the current position of the end effector in relation to that feature.

The Camera Planning System (CPS) of (Christianson et al. 1996) is an early example of a cinematographic camera planner. It automatically generates camera positions based on idioms specified in a declarative camera control language (DCCL). The language provides four primitive concepts: fragments, views, placements, and movement endpoints. By camera placement we mean a continuous sequence of camera configurations over a period of time (or, equivalently, from one point of the scene to another). These quite intuitive primitives are then combined to specify higher-level constructs such as shots and idioms.

An example of a fragment is a go-by shot, which represents a static camera overseeing a given region for a given period. Using these DCCL primitives, the user can specify idioms indicating how to film objects of interest. For example, we can specify how to film two people approaching each other: the camera could show one person advancing for some time, then switch to a close position where we see both of them, and finally move slightly away from the target. The generic aspect of the system comes from the fact that the same scene can be filmed the same way regardless of the characters involved. Also, one can change the filming of the scene by specifying a different idiom, without going into the details of the computer graphics, which are filled in automatically through the primitive operators. The approach in (Bares et al. 1998) is analogous, with the difference that users can specify visual preferences about how objects should be filmed. Given such preferences, the UCAM system selects the best camera placements that satisfy them.

The approach in (Halper, Helbing, and Strothotte 2001) combines constraint and cinematographic approaches. The constraints used include the height angle, the distance to the target, and the orientation of the camera. The algorithm takes as input a geometric description of the scene (e.g., the objects in the scene one would like to see) and plans the camera positions accordingly, while taking into account cinematographic principles regarding camera placements.


Limitations of Previous Approaches

These approaches are particularly interesting in domains with predefined story scripts and in which the subjects are characters and objects as normally found in movies or video games. But they are limited for filming an articulated robot arm. First of all, these approaches rely on a finite number of primitive filming operators which convey cinematographic principles. These primitives are specific to the domain of actors. To illustrate, "Extreme Close Up" is a primitive consisting of zooming the camera on the face of the person being filmed. There are a small number of such primitives (Christianson et al. 1996; Bares et al. 1998). These primitives are not directly relevant for filming a robot arm (since the points of interest and the types of actions are different), but they can be adapted. In that regard, we must develop an ontology of the types of meaningful elementary movements of robot arms, related to task descriptions. One of the contributions of our work is to initiate this quest for an ontological definition of camera planning in the domain of articulated robot arms.

Another limitation is that in most of the previous approaches, the trajectory, motion, or actions of the object being filmed must be known in advance. In our case, tasks and their underlying trajectories are not scripted in advance. As mentioned, given a task, we generate a trajectory of the robot accomplishing the task by using a path planner. This trajectory has to be automatically decomposed into meaningful segments (shots) according to some heuristics. This decomposition is another contribution of our work.

Given the right primitives and the right shots for the robot domain, the previous approaches become applicable in principle. However, here we opt instead for Linear Temporal Logic (LTL) (Bacchus and Kabanza 2000) as the language for specifying shot composition rules. This provides two advantages. First, the language is more expressive, yet with a simpler semantics, than previous camera planning languages such as DCCL. For instance, we can express arbitrary temporal conditions about the order in which objects should be filmed, which objects should remain in the background until some condition becomes true, and other complex constraints that LTL can express. Second, with this language we can apply the TLPlan planning algorithm (Bacchus and Kabanza 2000) to generate camera positions. This planner is more powerful than the planners used, for example, in (Christianson et al. 1996; Bares et al. 1998), because with TLPlan, LTL shot composition rules also convey search pruning capability.
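As an illustration (this formula is our own example, not one drawn from the ATDG rule base), a constraint of the kind mentioned above, say "whenever the end effector is being filmed, keep the visual cue in view until the effector is latched", can be written in standard LTL notation as

$\Box\bigl(\mathit{filming\_effector} \rightarrow (\mathit{cue\_in\_view} \;\mathcal{U}\; \mathit{effector\_latched})\bigr)$

where $\Box$ is the always modality and $\mathcal{U}$ is until; in TLPlan's Lisp-style syntax the same constraint would use the always and until operators introduced later with Figure 7.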

Overview of Our Approach

The automatic task demonstration generator (ATDG) takes as input start and goal configurations of the SSRMS. Using these two configurations, the ATDG generates a movie demonstrating the manipulations required to bring the SSRMS from its start configuration to its goal configuration. The top part of Figure 4 illustrates the internal architecture of the ATDG. The bottom part shows the different steps the data go through in order to transform the two given configurations into a complete movie demonstration.

Figure 4: ATDG architecture

First, the ATDG uses a path planning algorithm, the Flexible Anytime Dynamic Probabilistic Roadmap (FADPRM) planner (Belghith et al. 2006), which takes the two given configurations and generates a collision-free path between them. This path is then given to the trajectory parser, which separates it into categorized segments. This turns the continuous trajectory into a succession of scenes, where each scene can be filmed by a specific group of idioms. The parser looks for uniformity in the movements of the SSRMS. This process is described in greater detail in the next section.

Once the path is parsed, the camera planner uses TLPlan to find the shots that best convey each scene, while making sure that the whole is pleasing and comprehensive. To do this, TLPlan uses an idiom database to help it find the best way to film each scene. In addition to the idiom database, TLPlan applies a set of LTL shot composition rules to generate a movie that is visually appealing and coherent. TLPlan further applies an occlusion detector to make sure the SSRMS is visible all the time. Once TLPlan is done, we are left with a list of shots which is used by the rendering system to create the animation. The renderer uses both the shots given by TLPlan and the SSRMS trajectory to position the cameras in relation to the SSRMS, generating the final task demonstration.
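The data flow just described can be summarized by the following sketch. It is illustrative Python pseudostructure under our own naming; the actual components are the FADPRM planner, the trajectory parser, TLPlan, and the renderer, which communicate over sockets and text files as described in the Experiments section.

```python
def generate_demonstration(start_config, goal_config,
                           path_planner, trajectory_parser,
                           camera_planner, renderer):
    """End-to-end ATDG pipeline sketch: two configurations in, a 3D movie out."""
    # 1. Collision-free SSRMS trajectory between the two configurations.
    trajectory = path_planner.plan(start_config, goal_config)

    # 2. Segment the continuous trajectory into scenes (uniform movements).
    scenes = trajectory_parser.segment(trajectory)

    # 3. For each scene, pick an idiom and concrete shots (TLPlan's job),
    #    subject to the LTL shot composition rules and the occlusion detector.
    shots = camera_planner.plan_shots(scenes)

    # 4. Render the animation by replaying the trajectory while placing
    #    the virtual camera according to the planned shots.
    return renderer.render(trajectory, shots)
```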

Detailed Algorithms and Knowledge Structures

Segmenting a Robot Trajectory into Shots

In order to divide the animation into short sequences as a human would, we must examine the robotic arm's trajectory. Indeed, it is the only factor that can be used to split the movie, since everything else in the scene is stationary. Here we present three ways to proceed with this partitioning task.


Dividing according to elementary movements  The idea here is to define elementary motions of the robotic arm that the segmentation algorithm is able to recognize in the trajectory. Presently we use the following elementary motions, based on an intuitive analysis of the different arm movements one wants to recognize. In future work, the actual decomposition will involve human-factors experts and instructors.

• Vertical elevation: The arm moves up, due to the elbow joint or the pitch shoulder joint. This movement occurs, for example, when we need to elevate a load to avoid an obstacle.

• Lateral rotating motion: The movement of the yaw shoulder joint dominates and causes the arm to move laterally, for instance to transfer to another local region of the station when no obstacles lie in between.

• Static rotation: This movement corresponds to a rotation of the shoulder-elbow segment, controlled by the roll shoulder joint.

• Wrist movement: As the name indicates, here only the wrist joints move significantly.

• Rail translation: The arm translates from one point to another along the rectilinear rail on the station. This movement is used when the arm needs to move to a different work area.

The algorithm used to detect these movements calculates the elementary variations along each of the robotic arm's 7 degrees of freedom, frame by frame, and cuts the trajectory when the nature of the movement changes. We can then film each segment by selecting and applying an idiom well suited to the elementary motion.
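A minimal sketch of this segmentation step, assuming the trajectory is given as a sequence of 7-dimensional joint configurations; the joint index convention and the labelling heuristic in classify_motion are ours, not the ATDG's.

```python
import numpy as np

def classify_motion(delta: np.ndarray) -> str:
    """Label a per-frame joint variation (7 values) with the dominant elementary motion.
    Index convention (an assumption): 0 = rail, 1 = shoulder yaw, 2 = shoulder pitch,
    3 = shoulder roll, 4 = elbow pitch, 5-6 = wrist joints."""
    dominant = int(np.argmax(np.abs(delta)))
    if dominant == 0:
        return "rail-translation"
    if dominant == 1:
        return "lateral-rotation"
    if dominant in (2, 4):
        return "vertical-elevation"
    if dominant == 3:
        return "static-rotation"
    return "wrist-movement"

def segment_trajectory(trajectory: np.ndarray) -> list:
    """Cut the trajectory (n_frames x 7 array) wherever the motion label changes.
    Returns (label, first_frame, last_frame) triples, one per scene."""
    labels = [classify_motion(trajectory[i + 1] - trajectory[i])
              for i in range(len(trajectory) - 1)]
    scenes, start = [], 0
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:
            scenes.append((labels[i - 1], start, i))
            start = i
    scenes.append((labels[-1], start, len(trajectory) - 1))
    return scenes
```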

Dividing according to objects and regions of interest  We also segment the trajectory according to its variation with respect to given obstacles or visual cues. When the variation reaches a fixed threshold, we transition to a new segment. The lower-level implementation invokes the Proximity Query Package (PQP) (Larsen et al. 2000) to compute the distances from the arm to the given obstacles or cues. In the same way, we can also define segmentations depending on whether the entire arm, or selected parts of the arm, move from one zone of interest to another.
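A sketch of this second criterion, under the assumption that "variation" means the change in arm-to-feature distance since the current segment began; distance_to_feature is a stand-in for the PQP distance query and is not part of the actual ATDG API.

```python
def segment_by_interest(trajectory, features, distance_to_feature, threshold: float):
    """Start a new segment whenever the distance from the arm to any feature of
    interest (obstacle or visual cue) has changed by more than `threshold`
    since the segment began."""
    segments, start = [], 0
    ref = [distance_to_feature(trajectory[0], f) for f in features]
    for i, config in enumerate(trajectory[1:], start=1):
        dists = [distance_to_feature(config, f) for f in features]
        if any(abs(d - r) > threshold for d, r in zip(dists, ref)):
            segments.append((start, i))
            start, ref = i, dists
    segments.append((start, len(trajectory) - 1))
    return segments
```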

Specifying Idioms

Since idioms are sequences of shots, it helps to first give a more detailed description of shots. A shot is specified by five components: shot type, camera placement mode, camera zooming mode, side of the line of interest, and length.

• Shot Types: Five shot types are currently defined in the ATDG system: Static, GoBy, Pan, Track and POV. A Static shot is taken from a static camera when the robot is in a constant position or moving slowly. A GoBy shot has the camera in a static position showing the robot in movement. For a Pan shot, the camera is in a static position but makes incremental rotations following the movement of the robot. A Track shot has the camera following the robot and keeping a constant position relative to it. Finally, the POV shot has the camera placed directly on the SSRMS, moving with the robot.

Figure 5: Camera placements

• Camera Placements: For each shot type other than POV, the camera can be placed in five different ways relative to a given line of interest: External, Parallel, Internal, Apex and External II (Figure 5). In the current implementation, the line of interest is the trajectory along which the center of gravity of the robot is moving; this is sufficient for filming many typical manoeuvres. POV is treated separately: since the camera is directly on the SSRMS, the previously described camera placements are inapplicable, and this attribute is instead used to specify where on the robot arm the camera is placed (such as on the end effector, on some joint, or in the middle of a segment).

• Zoom modes: For each shot type and camera placement, the zoom of the camera can be in five different modes: Extreme Close Up, Close Up, Medium View, Full View and Long View.

• Side: Each shot type other than POV can be taken from either side of the line of interest or from above. In the case of POV, this attribute is used to tell whether the camera points forwards or backwards along the SSRMS. Shots from above allow smooth transitions between a shot from one side of the line of interest and a shot from the other side, or can be used when the robot is flanked on both sides by obstacles.

• Length: The length is the fraction of the scene occupied by the shot. The total length of all shots in a scene must be 1. For instance, if the first shot has a length of 0.25, the second a length of 0.5 and the last a length of 0.25, while the scene lasts 2 seconds, then the first shot ends after half a second, the second starts there and ends at 1.5 seconds, and so on.

More shot types, camera placements and zoom modes canbe added to specify a greater variety of shots. This is a topicfor future research.
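As an illustration of these five attributes (our own sketch, not ATDG code), a shot and the conversion from relative lengths to absolute time intervals within a scene could be represented as follows.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Shot:
    shot_type: str   # "static", "go-by", "pan", "track" or "pov"
    placement: str   # "external", "parallel", "internal", "apex", "external-ii"
    zoom: str        # "extreme-close-up", "close-up", "medium", "full", "long"
    length: float    # fraction of the scene occupied by this shot (lengths sum to 1)
    side: str        # "left", "right" or "above" (for POV: "forwards"/"backwards")

def shot_intervals(shots: List[Shot], scene_duration: float) -> List[Tuple[float, float]]:
    """Map shot lengths (fractions of the scene) to (start, end) times in seconds."""
    assert abs(sum(s.length for s in shots) - 1.0) < 1e-6, "lengths must sum to 1"
    intervals, t = [], 0.0
    for s in shots:
        end = t + s.length * scene_duration
        intervals.append((t, end))
        t = end
    return intervals
```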

Now we can explain how idioms are specified. An idiom is specified by describing the sequence of shots composing it. Figure 6 shows three examples. Each idiom consists of an identifier, the scene type to which it is applicable, the number of shots, and then the shots. Thus the first idiom is applicable to a translation of the SSRMS along its mobile base and it


contains three shots. The idiom states that a translation can be filmed by first placing the camera parallel to the robot using a medium view zoom, following the SSRMS for a quarter of the whole movement, then changing to a full view zoom while still following the robot on a parallel course for half the scene, and then stopping and using a rotation to follow the robot for the rest of the way, still at a full view zoom. This is just one of the many ways such a scene could be filmed; there are other idioms specifying the alternatives.

For instance, idiom2 illustrates another way of filming a translation of the SSRMS along its mobile base. In this case, we film the end effector approaching an object using an External camera placement so as to have a good view of the object; the zoom is medium so as to have a good view of the approach. The following shot switches to a Close Up Static view to give a good view of the manipulations made with the end effector.

The third idiom describes a sequence of shots for filming the end effector of the SSRMS fixing a new component on the ISS. The first shot is a tracking shot following the robot while it translates along its mobile base. The second shot is a pan shot following the rotation of the robot towards the anchor point. The last shot is a static shot focusing on the joint at the extremity of the robot while it fixes the new component on the desired target.

(idiom1 translation 3
  (go-by external full-view 0.25 left)
  (go-by parallel long-view 0.5 left)
  (go-by internal full-view 0.25 left))

(idiom2 effector 2
  (static internal medium-view 0.33 right)
  (static parallel close-up 0.67 right))

(idiom3 effector 3
  (track parallel full 0.25 left)
  (pan parallel medium 0.5 left)
  (static parallel close 0.25 left))

Figure 6: Examples of idioms

Thus, for each SSRMS movement type, we have several idioms (from six to ten in the current implementation), and each idiom is defined by taking into account the complexity of the movement, the geometry of the ISS, the visual cues on the ISS, and the subjective expectations of the viewer. For example, if the SSRMS is moving along its mobile base, it is important that the camera show not only the entire arm but also some visual cues on the ISS, so the operator can maintain situational awareness of the arm movement. Consequently, the idioms for this manipulation will involve shots with a Full or Long View zoom. In contrast, manipulations involving the end effector require high precision, so an Extreme Close Up zoom will be involved.

Specifying Shot Composition Rules

The description of idioms is based on considerations that are local to a scene. Shots within an idiom are in principle sequenced coherently with respect to the scene, but this does not guarantee that the sequencing of the scenes is coherent too. The role of shot composition rules is to ensure that the selected idioms generate a continuous animation with smooth transitions between scenes and with some global constraints that are respected across the entire film. Such global shot composition rules are expressed in LTL.

;; line of interest
(always
 (and
  (forall (?t0 ?p0 ?z0 ?l0)
          (last-shot ?t0 ?p0 ?z0 ?l0 right)
          (next (not (exists (?t1 ?p1 ?z1 ?l1)
                             (last-shot ?t1 ?p1 ?z1 ?l1 left)))))
  (forall (?t0 ?p0 ?z0 ?l0)
          (last-shot ?t0 ?p0 ?z0 ?l0 left)
          (next (not (exists (?t1 ?p1 ?z1 ?l1)
                             (last-shot ?t1 ?p1 ?z1 ?l1 right)))))))

Figure 7: A shot composition rule

As mentioned before, LTL is the language used by TLPlan to specify temporally extended goals (Bacchus and Kabanza 2000). LTL formulas are interpreted over state sequences. In our case, a state conveys some properties about the current shot, hence LTL formulas are indirectly interpreted over sequences of shots. In the LTL language, one uses the temporal modalities next, always, eventually and until, combined with the standard first-order connectives, to express temporal statements. For instance, (next f), where f is a formula, means that f is true in the next state; (always f) means that f holds over the entire state sequence; the other modalities have the intuitive semantics suggested by their names. Given a planning domain, an initial state and an LTL temporally extended goal, TLPlan computes a plan, as a sequence of actions, such that the underlying sequence of states satisfies the goal.

Figure 7 illustrates an LTL shot composition rule forbidding the selection of two different sides for two successive shots. This implements a cinematography rule that prevents crossing the line of interest, because crossing it could induce a misunderstanding of the manipulation being performed. This rule requires TLPlan to insert an intermediate shot (for example, one taken from above) in order to satisfy the requirement.
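To make the interpretation of such rules over shot sequences concrete, here is a small illustrative check of the Figure 7 constraint. This is our own Python sketch, not TLPlan; it only encodes the intended meaning of the rule.

```python
def respects_line_of_interest(shot_sides):
    """Check the Figure 7 rule over a sequence of shot sides.

    `shot_sides` is a list such as ["left", "above", "right"]. The rule is
    violated only when a "left" shot is immediately followed by a "right"
    shot or vice versa; an intermediate shot from "above" is allowed.
    """
    for prev, curr in zip(shot_sides, shot_sides[1:]):
        if {prev, curr} == {"left", "right"}:
            return False
    return True

# Crossing the line directly is rejected; crossing via an "above" shot is not.
assert not respects_line_of_interest(["left", "right"])
assert respects_line_of_interest(["left", "above", "right"])
```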

Planning the Cameras

When searching for a film, shots are evaluated by the Occlusion Detector according to their degree of occlusion. Specifically, this function measures the degree of visibility of the robot within the shot. This is done by examining each image in the shot and evaluating the number of joints present in the image and the zoom made on the robot. The quality measure of each image is heuristically defined by:

$\alpha_{shot} = \frac{1}{2}\left(\frac{NbrJVis}{NbrJTot} + \frac{SRob}{STot}\right)$, with:

• NbrJVis: the number of joints visible in the image
• NbrJTot: the total number of joints of the robot


;; Idiom selection operator
(add-adl-op
 :name '(apply-idiom ?scene ?idiom)
 :pre (adl-pre
       :var-gens '((?scene) (not-planned ?scene)
                   (?type)  (sc-type ?scene ?type)
                   (?idiom) (idioms))
       :form '(eq? ?type (id-type ?idiom)))
 :add (adl-add (adl-cond :var-gens '((?nbrShot) (gen-nb-shot ?idiom))
                         :lit '(nb-shot ?scene ?nbrShot))
               (adl-cond :lit '(planned ?scene ?idiom))
               (adl-cond :lit '(next-shot ?scene 0)))
 :del (adl-del (adl-cond :lit '(not-planned ?scene))))

Figure 8: Idiom selection operator

• SRob: the surface covered by the robot in the image
• STot: the total surface of the image

TLPlan calls the Occlusion Detector not only to compute the quality measure of each shot but also to compute the quality measure of every idiom. The quality measure of an idiom, $\alpha_{idiom}$, is the average of the quality measures of the shots composing it.
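A sketch of these quality measures, in illustrative Python under our own naming; the aggregation of per-image quality into a per-shot quality as a simple mean is our assumption, and the real Occlusion Detector works on rendered images of the 3D scene.

```python
def image_quality(n_joints_visible: int, n_joints_total: int,
                  robot_surface: float, image_surface: float) -> float:
    """Heuristic per-image quality: average of the fraction of visible joints
    and the fraction of the image covered by the robot."""
    return (n_joints_visible / n_joints_total + robot_surface / image_surface) / 2.0

def shot_quality(images) -> float:
    """Quality of a shot, taken here as the mean quality of its images (assumption)."""
    return sum(image_quality(*img) for img in images) / len(images)

def idiom_quality(shots) -> float:
    """Quality of an idiom: the average of the quality measures of its shots."""
    return sum(shot_quality(s) for s in shots) / len(shots)
```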

The planning domain is specified using two different kinds of planning operators: Idiom-Selection and Shot-Specification. The first kind conveys idiom-level constraints; the second conveys shot-level constraints. More specifically, Idiom-Selection operators select idioms for scenes, and Shot-Specification operators select attributes for each shot composing an idiom. Since the attributes are already specified by the idioms, the role of the latter operator is essentially to ensure that the shot follows the defined cinematographic rules and to allow the Occlusion Detector to verify the shot.

Figure 8 illustrates an idiom-selection operator. The operator checks whether the scene already has an idiom associated with it (i.e., (not-planned ?scene)). If no idiom has been planned for the scene, the operator updates the current state by adding an idiom for the scene, setting the number of shots to be planned for this scene (as specified by the chosen idiom), and setting the next shot to be planned to the first shot of the idiom. Figure 9 illustrates a shot-specification operator.

During search, a (current) world state in this domain consists of:

1. The (current) scene from a given scene list.

2. The (current) idiom being tested as a candidate for the current scene.

3. The (current) shot in the idiom currently being tested.

Intuitively, the search process underlying TLPlan explores the world state space as follows. On each iteration, TLPlan takes the current scene from the list of scenes and checks whether an idiom has already been selected to be tested as the best candidate for it. If not, it calls the Idiom-Selection operator and selects an idiom from the list of idioms associated with the corresponding category of scene.

;; Shot selection operator
(add-adl-op
 :name '(apply-shot ?scene ?shot-type ?shot-place
                    ?shot-zoom ?shot-length ?shot-side)
 :pre (adl-pre
       :var-gens '((?scene ?idiom) (planned ?scene ?idiom)
                   (?nextShot)     (next-shot ?scene ?nextShot)
                   (?shot-type)    (next-shot-type ?idiom ?nextShot)
                   (?shot-place)   (next-shot-place ?idiom ?nextShot)
                   (?shot-zoom)    (next-shot-zoom ?idiom ?nextShot)
                   (?shot-length)  (next-shot-length ?idiom ?nextShot)
                   (?shot-side)    (next-shot-side ?idiom ?nextShot)
                   (?nbrShot)      (nb-shot ?scene ?nbrShot)))
 :add (adl-add (adl-cond :form '(= ?nextShot (- ?nbrShot 1))
                         :lit '(done-plan ?scene))
               (adl-cond :form '(not (= ?nextShot (- ?nbrShot 1)))
                         :lit '(next-shot ?scene (+ ?nextShot 1)))
               (adl-cond :lit '(last-shot ?shot-type ?shot-place ?shot-zoom
                                          ?shot-length ?shot-side)))
 :del (adl-del (adl-cond :var-gens '((?nextShot) (next-shot ?scene ?nextShot))
                         :lit '(next-shot ?scene ?nextShot))
               (adl-cond :var-gens '((?t ?p ?z ?l ?s) (last-shot ?t ?p ?z ?l ?s))
                         :lit '(last-shot ?t ?p ?z ?l ?s))))

Figure 9: Shot specification operator


When a current idiom is selected in the current state, TLPlan takes the list of shots composing it and finds the next unplanned shot (if all the shots have been planned, then the scene is completed and TLPlan can move on to the next scene). It then calls the Shot-Specification operator on the current shot, which calls the Occlusion Detector. If the shot is accepted, it is added to the list of planned shots.
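The search just described can be summarized by the following sketch. It is our own illustrative Python pseudocode; the actual search is TLPlan's forward search with LTL-based pruning, and idiom choice there is backtrackable rather than the simple best-first selection shown here. `idioms_for`, `shot_ok` and `idiom_quality` are stand-ins for the idiom database, the LTL rules plus Occlusion Detector check, and the idiom quality measure.

```python
def plan_film(scenes, idioms_for, shot_ok, idiom_quality):
    """Pick one idiom per scene and validate its shots one by one."""
    film = []
    for scene in scenes:
        # Try the candidate idioms for this scene, best-ranked first.
        for idiom in sorted(idioms_for(scene), key=idiom_quality, reverse=True):
            planned = []
            for shot in idiom.shots:
                if not shot_ok(shot, film + planned):
                    break               # composition rule violated or too much occlusion
                planned.append(shot)
            else:
                film.extend(planned)    # every shot of the idiom was accepted
                break
        else:
            raise RuntimeError(f"no acceptable idiom for scene {scene}")
    return film
```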

Discussion

We explicitly specify the sequence of shots composing an idiom. Given that an LTL formula describes a set of state sequences (namely, the sequences satisfying it), we could have adopted LTL formulas not just for specifying shot composition rules but also for describing idioms. For instance, idiom1 in Figure 6 could be specified as:

(always
 (implies (filming translation)
   (and (next (track parallel medium 0.25 left))
        (next (next (track parallel full 0.5 left)))
        (next (next (next (pan parallel full 0.25 left)))))))

This approach would allow a richer idiom specification language, since LTL can express more general sequences than an explicit list of shots. However, with this approach the size of the search space becomes larger and more sophisticated search control would be needed. Indeed, for each scene, the search process would have to consider all possible shot sequences satisfying the idiom formula. In contrast, with the current approach, the search is limited to the shot sequences in the idiom specifications. The use of LTL formulas to specify idioms remains a topic for future research.

We acquired knowledge about the SSRMS through discussions with experts (including SSRMS instructors) and by sitting in on actual SSRMS training courses. The current idioms take into account only visibility requirements, but other constraints will have to be integrated to complete the tool, including the various modes of operating the arm, which involve, among many things, switching between different frames of reference during a manipulation.

Experiments

We used a publicly available Scheme version of TLPlan within the ATDG. The robot simulator is in C++. The communication between TLPlan and the other components of the ATDG is done using sockets and by reading and writing text files. For example, TLPlan communicates via sockets with the Occlusion Detector to compute, at each iteration, the quality measures of shots and idioms. The camera plan is passed to the renderer in text format.

We implemented two different variations of the ATDG. The first version (V1) delays the check of the quality of a shot until a sequence of shots has been found; the other version (V2) makes the check on the fly, as described so far. In each case, the metric for the quality of an animation is the absence of occlusion. Checking occlusions takes time, hence the motivation to verify whether there is any

Figure 10: Snapshots and corresponding idioms

Figure 11: Performance data

gain in delaying it. The experiments also include a comparison with a simplified implementation of a constraint-based approach, as in ConstraintCam (CC) (Bares, Gregoire, and Lester 1998). As opposed to ConstraintCam, we did not implement a comprehensive constraint-solving approach; we implemented only the types of constraints involved in the experimental scenarios.

Figure 10 shows snapshots generated by the ATDG illustrating idiom1 in Figure 6. The scene in the ISS simulator is specified by almost 85,000 triangles, which is moderately complex by computer graphics standards. The experiments were performed on a Pentium IV, 2.8 GHz, with 1 GB of RAM.

The start and end configurations for the scenarios are given in Figure 12. Figure 11 shows the performance data on five scenarios. The first three columns in Figure 11 indicate, respectively, the scenario, the number of shots composing the film, and the duration of the film in seconds. The next three columns express the quality of the movie for each of the methods V1, V2 and CC, in terms of the proportion of shots without any occlusion of the camera and with all selected elements of the arm visible. This occlusion-based quality measure could be refined by counting the number of images in occlusion or by taking into account the proportion


of the shot that is occluded. The last three columns give the planning time for each of the methods.

As the experiments show, the quality of the demonstrations generated by the ATDG is very good in terms of the number of shots that are not occluded, which is one of the valuable properties we are seeking in our application. Visually, we also noticed very good smoothness of the film, considering that it is generated automatically. As it turns out, both versions of the ATDG generate movies of similar quality, but they differ in planning time: delaying the occlusion check pays off.

The results also show that the quality of the path filmed by the ATDG was always better than with CC. This is because TLPlan works at the level of the idiom, a level higher than that of a frame (the level at which ConstraintCam applies), and this ensures a higher level of quality. We also believe that with a C++ implementation of TLPlan, our approach would become more efficient. The key aspect of TLPlan, however, is the use of LTL to specify shot composition rules, which produces specifications that are more easily understood than frame-level constraints. It is important to note, however, that this is a comparison with a simplified implementation of the original ConstraintCam.

Conclusion and Future Work

We have presented an application of automated planning to the camera planning problem for the generation of 3D task demonstrations for an articulated robot arm. Our application currently concerns the SSRMS, but the results are transferable to other telemanipulated robots and other domains.

So far we have obtained promising results using very simple metrics for the quality of movies. Adding visual cues and regional constraints is quite straightforward and will be done in the near future. Before the tool becomes usable in practice, additional metrics characterizing the different manoeuvres in terms of task and space awareness will have to be brought in. The definition of these metrics will involve human-factors experts and instructors.

Besides the intended future use of our system to support ground operators, future work will also concern the integration of the system into a training simulator to provide feedback to students by showing them how to accomplish a task. This opens several interesting research opportunities, including making the generated animation interactive rather than a continuous video, as is currently the case.

Finally, as we are using the TLPlan system, this framework also opens up interesting avenues for developing efficient search control knowledge for this particular application domain and for learning such knowledge. As mentioned above, it would also be interesting to extend the use of LTL formulas to the specification of idioms.

Figure 12: Scenarios

References

Bacchus, F., and Kabanza, F. 2000. Using temporal logics to express search control knowledge for planning. Artificial Intelligence 116(1-2):123-191.

Bares, W.; Zettlemoyer, L.; Rodriguez, D.; and Lester, J. 1998. Task-sensitive cinematography interfaces for interactive 3D learning environments. In Intelligent User Interfaces, 81-88.

Bares, W.; Gregoire, J.; and Lester, J. 1998. Real-time constraint-based cinematography for complex interactive 3D worlds. In AAAI/IAAI, 1101-1106.

Belghith, K.; Kabanza, F.; Hartman, L.; and Nkambou, R. 2006. Anytime dynamic path-planning with flexible probabilistic roadmaps. 2372-2377.

Benhamou, F.; Goualard, F.; Languenou, E.; and Christie, M. 1994. Interval constraint solving for camera control and motion planning. In International Symposium on Logic Programming (ILPS), 124-138.

Christianson, D.; Anderson, S.; He, L.; Salesin, D.; Weld, D.; and Cohen, M. F. 1996. Declarative camera control for automatic cinematography. In National Conference on Artificial Intelligence (AAAI), 148-155.

Halper, N.; Helbing, R.; and Strothotte, T. 2001. Camera engine for computer games: Managing the trade-off between constraint satisfaction and frame coherence. In Eurographics (EG), 174-183.

Kabanza, F.; Nkambou, R.; and Belghith, K. 2005. Path-planning for autonomous training on robot manipulators in space. 1729-1731.

Larsen, E.; Gottschalk, S.; Lin, M.; and Manocha, D. 2000. Fast proximity queries with swept sphere volumes. In Proc. of Int. Conf. on Robotics and Automation, 3719-3726.

Nieuwenhuisen, D., and Overmars, M. 2004. Motion planning for camera movements in virtual environments. In IEEE International Conference on Robotics and Automation (ICRA), 3870-3876.