

The human visual system's ability to extract three-dimensional structure from a two-dimensional source is the key to automatic interpretation of structure from motion.

Visually Interpreting the Motion of Objects in Space

Jon A. Webb and J. K. Aggarwal

The University of Texas

The ability to interpret motion in space is a fundamental function of the human visual system. Investigation of this process is justified for psychological reasons alone, but there are also many practical benefits to be gained. For example, computer applications in robotics and the photogrammetry of human motion would advance with even partial solutions to the problems relating to visual interpretation of motion. In this article, we concentrate on the use of a single, fixed camera to interpret the movement of objects in space, and we present a new method for interpreting the motion of rigid and jointed objects.

In the following discussion, assume that we have the x-y image positions of several points over a period of time.* The basic problem is that of assigning a reasonable three-dimensional structure to these points.

We cannot determine the three-dimensional structure in absolute terms because anything seen could be produced by either a small object close to the camera or a large object far from the camera. Therefore, "determine a three-dimensional structure" means "assign a three-dimensional structure with an unknown scaling factor that could account for the observed positions of the points." Definition of the structure in absolute terms requires more information, such as the absolute position of one of the points on the object. We can, however, segment points into objects, so that the structure of each object is determined according to a different scaling factor.

Can such a structure be assigned at all, since it is not obvious that the x-y positions determine a structure? Because they generally do not, some assumptions must be made about how the points move in space. We will discuss these assumptions shortly, after considering the remarkable capacity of human vision to interpret two-dimensional motion three-dimensionally.

Psychologists have known for some time that humans can interpret a two-dimensional moving figure as if it were in three dimensions. In the typical psychological experiment demonstrating this phenomenon, a subject views a two-dimensional projection (e.g., a movie, a CRT display, or a shadow) of a moving three-dimensional object. The subject thus obtains only two-dimensional information, whether he uses one or both eyes to watch the figure. Subjects consistently report seeing three-dimensional structure under these conditions; in the right circumstances, they can describe the structure correctly. The first major study of this phenomenon was by Wallach and O'Connell,3 who called it the "kinetic depth effect." We, like Ullman,4 refer to it as "structure from motion."

Structure from motion is in common use in three-dimensional graphic display systems. Such systems allow the user to input the three-dimensional coordinates of an object and then examine its three-dimensional structure by moving and rotating its image on a CRT screen. Some of these systems use perspective projection (i.e., the more distant an object is from the viewpoint, the smaller it becomes) or shading to enhance the effect, but motion alone is enough to give the impression of three-dimensionality.

*A discussion of techniques for obtaining these positions automatically is beyond the present scope. Research on this topic is surveyed in Martin and Aggarwal,1 and Thompson2 discusses an algorithm for the recovery of this information. In practice, these positions can be obtained interactively from images of objects with identifiable targets.



The rigidity assumption

At present, it takes several cameras to get three-dimensional information about objects. But if the human visual system can use motion alone to find three-dimensional structure, it should be possible to recover three-dimensional information about objects with only one camera.

We must, however, make some assumptions about the object being viewed because a sequence of views of a single point moving in space reveals nothing about its position in depth. That is, the position of a point in space is determined by three numbers; a two-dimensional projection provides only two of them. We need an assumption that does not require previous knowledge of the objects, since the subjects participating in the psychological experiments had no previous knowledge of the objects presented to them. We can't use shading assumptions, since the subjects could see three-dimensional structure even when viewing isolated points of equal brightness. The only permissible assumptions are about the motion of the points.

Mathematically, the motion of any system of points can be written as a transformation applied to the points. If the motion is rigid, the transformation can be decomposed into two parts: a translation and a rotation, which are applied in sequence. Ullman4 was the first to assume rigidity to interpret the three-dimensional structure of a system of points.

The equations of projection of a system of points relate the unknown three-dimensional coordinates of a point to its known two-dimensional coordinates. If rigidity is assumed, new equations can be written to relate the distances between the points to their known two-dimensional projections. Since these distances must be fixed, a new set of equations can be written for each frame. With enough frames and enough points, the equation set will be determined or over-determined so that the structure can be recovered.

Ullman used the rigidity assumption to recover the structure (to within a scaling factor) of rigid systems of points. To simplify the analysis, he assumed the points being viewed to be far away relative to the distance between them and to their motion in depth. This assumption allowed the use of parallel or orthographic projection, in which distance has no effect on object size. Ullman was able to find the structure with at least three views of four points; his solution was in closed form.

Roach and Aggarwal5 applied the rigidity assumption to images of moving blocks. Because they did not assume parallel projection, their solution was more general. However, this solution was not in closed form; numerical techniques were used to solve the system of equations. (Ullman also considered the case in which parallel projection could not be assumed.)
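To make the preceding discussion concrete, the following sketch encodes the rigidity constraint numerically under parallel projection: the inter-point distances computed from the known image coordinates and the unknown depths must be the same in every frame. This is not Ullman's closed-form solution or Roach and Aggarwal's program; the data, point values, and helper names are illustrative, and NumPy and SciPy are assumed to be available.

```python
# A minimal numerical sketch of structure recovery under the rigidity
# assumption and parallel (orthographic) projection, on synthetic data.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

def rotation(theta, u):
    """Rodrigues' formula: rotation by angle theta about the unit axis u."""
    K = np.array([[0, -u[2], u[1]], [u[2], 0, -u[0]], [-u[1], u[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

# Four non-coplanar points on a rigid object, rotating about a fixed axis.
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.2, 0.5],
                   [0.3, 1.1, -0.4], [-0.6, 0.4, 0.9]])
axis = np.array([0.3, 1.0, 0.2]); axis /= np.linalg.norm(axis)
n_frm, n_pts = 6, points.shape[0]
images = [(points @ rotation(0.3 * f, axis).T)[:, :2] for f in range(n_frm)]
pairs = [(i, j) for i in range(n_pts) for j in range(i + 1, n_pts)]

def residuals(z_flat):
    # Unknowns: the depth of every point except point 0 in every frame
    # (point 0 is held at depth zero, since absolute depth is unobservable).
    Z = np.zeros((n_frm, n_pts))
    Z[:, 1:] = z_flat.reshape(n_frm, n_pts - 1)
    res = []
    for i, j in pairs:
        d2 = [np.sum((images[f][i] - images[f][j]) ** 2) + (Z[f, i] - Z[f, j]) ** 2
              for f in range(n_frm)]
        res.extend(np.diff(d2))   # rigidity: squared distance constant across frames
    return np.array(res)

# The constraints are nonlinear, so try several random starts and keep the best fit.
best = min((least_squares(residuals, 0.5 * rng.standard_normal(n_frm * (n_pts - 1)))
            for _ in range(5)), key=lambda s: s.cost)
Z0 = np.concatenate(([0.0], best.x.reshape(n_frm, n_pts - 1)[0]))  # depths in frame 0

# Recovered inter-point distances (the structure itself is determined only up
# to a reflection in depth, which leaves these distances unchanged).
for i, j in pairs:
    d_est = np.hypot(np.linalg.norm(images[0][i] - images[0][j]), Z0[i] - Z0[j])
    d_true = np.linalg.norm(points[i] - points[j])
    print(f"pair ({i},{j}): estimated {d_est:.3f}, true {d_true:.3f}")
```

Random restarts are used because the least-squares problem can have poor local minima; as noted above, the recovered structure is determined only up to a scaling factor (here fixed by the image coordinates) and a reflection in depth.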

Shortcomings of the rigidity assumption. There are two main problems with this approach. The first is that accuracy in determining three-dimensional structure depends largely on accuracy in determining image plane position. Because it is difficult to estimate the position of a feature point, such as a corner, on the image plane, the equations must be heavily over-determined to correctly define the structure. Also, in both of these studies the feature points could not all lie in a plane, since coplanar points lead to a degenerate system of equations. This indicates that finding appropriate points can be difficult in some cases. For example, in images of athletics the points of interest are on the limbs of the subject. These points are nearly coplanar; in fact, they are almost collinear. The rigidity assumption alone is probably not sufficient to interpret these images.

The second problem is more fundamental: the rigidity assumption alone does not determine the distance between two rigidly connected points. Moreover, the rigidity assumption is of no use in determining whether or not two points are rigidly connected. This is because the two points can lie anywhere on the two rays that project onto their position in the image plane, as shown in Figure 1. Only when there are three or more points can the structure be determined.
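This indeterminacy for two points can be seen with a short numerical illustration (hypothetical numbers, not data from the article): given only the image-plane separations of two points over several frames, any assumed distance at least as large as the largest apparent separation can be made consistent with the images by a suitable choice of depths.

```python
# A toy illustration of why rigidity alone cannot fix the distance between
# two points. The image positions below are made up for the example.
import numpy as np

# Image-plane position of the second point relative to the first, over five frames.
dxy = np.array([[0.30, 0.10], [0.25, 0.18], [0.17, 0.24], [0.08, 0.27], [0.00, 0.28]])
sep = np.linalg.norm(dxy, axis=1)                # apparent separations

for d in (0.35, 1.0, 5.0):                       # three different assumed rigid distances
    dz = np.sqrt(d**2 - sep**2)                  # pick the depth difference in each frame
    dist3d = np.sqrt(sep**2 + dz**2)             # the 3-D separation is then exactly d
    print(f"assumed distance {d}: 3-D distances {np.round(dist3d, 3)}")
```

Every choice of d yields a perfectly rigid interpretation, so rigidity alone cannot select among them.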

Earlier work in structure and motion

Considering the discussion above, it is surprising to find that people can interpret the motion of just two points in space. When Johansson and Jansson6 showed their subjects the two-dimensional image of a moving rod, the subjects saw a rod moving in space. Since they recovered the positions of just two points, the subjects must have made an assumption other than the rigidity assumption.

Other experiments by Johansson7 are still more surprising. In these experiments, subjects were shown a movie of a person walking around a dark room with lights attached to his major joints. Even though only the lights could be seen, there was a very strong impression of three-dimensional motion in these movies. Again, there were only two points on each rigid part of the jointed object being viewed.

In addition to the work presented here, four separate attempts have been made to interpret images like those in Johansson's study. The first, by Rashid,8 involved measuring the image plane distance between each pair of points in the image and then testing for consistency in distance over a large number (25 or 30) of frames. There is some connection between this approach and ours; the most significant difference is that we interpret the image three-dimensionally, while Rashid concentrates on an analysis based on the image plane.

Figure 1. S can lie anywhere between the two rays.


The second approach, by O'Rourke and Badler,9 employed higher-level knowledge in image interpretation. A detailed model of a human figure was used to interpret the motion of a moving dot pattern similar to those studied by Johansson.

Figure 2. The second point must lie on a sphere about the first.

Figure 3. The second point must remain at a fixed distance from the axis.

An interesting feature of this research is that the model could be used to interpret structure even when parts of the figure in the image were not visible. For example, the program could predict the position of the figure's hand even when it was out of sight behind the figure's back. We do not use high-level knowledge in interpreting Johansson's figures, even though humans obviously make use of it.

Two other researchers have examined the problems in interpreting images like these. Clocksin10 developed a heuristic method for recovering the connection structure of objects. Hoffman and Flinchbaugh11 used the assumption of planarity of motion to recover both the connection structure and the three-dimensional structure of the objects. Planarity of motion is a special case of the assumption developed here.

The fixed axis assumption

We interpret Johansson's figures as collections of rigid parts, with each part consisting of two rigidly connected points (the joints).12 To interpret the motion of just two points, however, an additional assumption must be made. For the reasons given above, this assumption must be about the motion of the points. Under the rigidity assumption, any motion can be written as a transformation in the form A, t, where A is a rotation and t a translation. We further assume that the axis specified by A is fixed over short periods of time. We call this the fixed axis assumption.

This assumption is satisfied by any rigid part whose movements consist of translations and rotations around a fixed axis. Because of design limitations and gravity, a surprisingly wide variety of natural and man-made objects satisfy it. Rigid objects that travel on the ground, like cars, must rotate about a fixed axis pointing out from the ground, as do frisbees, tops, and maple seeds. A rolling object rotates about an axis perpendicular to its direction of travel and parallel to the surface.

Most jointed-object motion also satisfies this assumption. For example, a ballerina executing a pirouette holds everything still, relative to her torso, except for her head and feet. Skaters in a spin move their arms, but only toward and away from the axis of rotation (doing so quite slowly, relative to their period of rotation). Divers and gymnasts move in quite complicated ways, but generally rotate about one axis at a time. In normal walking, whether two- or four-legged, the limbs move in planes parallel to the line of travel. Thus each limb rotates about an axis perpendicular to the line of travel.

Like Ullman, we assume that the points on the objects are far from the camera, relative to the distance between them and to their motion in depth. That is, we assume that the apparent distance between two points changes only as a result of rotation of one about the other, not from their combined motion in depth. Interpretation of motion in depth is much harder because many apparent changes in structure can be due to motion in depth, rotation about a point fixed in depth, or both.



The human visual system apparently makes further assumptions to analyze motion in depth. These assumptions are not completely understood, and the interpretation of motion in depth by a computer might best be done differently in different contexts.

A method for determining structure from motion

The fixed axis assumption and the assumption of parallel projection make possible a simple method for determining the structure and motion of any group of rigidly connected points. Consider the motion of two such points. Since they are always a fixed distance apart, the second must remain on a sphere about the first, as shown in Figure 2. Under the fixed axis assumption, the second point must be a fixed distance from an axis that passes through the first, so that it lies on a cylinder passing through the center of the sphere, as shown in Figure 3. The intersection of the cylinder and the sphere is a circle in a plane normal to the axis.

Under the assumption of parallel projection, this circle stays at a fixed distance from the camera, so that the position of one point relative to the other projects onto an ellipse. Without this second assumption, the circle could project onto a spiral as the rigid part comes closer to the camera and increases in size. In fact, any curve could be produced by rotating the second point about the first and then moving the rigid part toward or away from the camera.

The equation describing the ellipse can now be discovered by numerically fitting an ellipse to the observed positions of the second point relative to the first. These positions can be shown to be constrained to an ellipse in which the minor axis of the ellipse and a line from the origin to the center of the ellipse are collinear. This makes the ellipse recoverable from four views of the two points, assuming the complete absence of noise in estimating the positions of the points. In practice, however, more frames are necessary because errors always occur in estimating the positions of feature points.

Once the equation is found, the distance between the two points can be determined. This is true because the minor and major axes of the ellipse determine the orientation of the circle that projects onto the ellipse. Therefore, the distance from the center of the ellipse to the origin and the length of the major axis of the ellipse determine the distance between the points. This makes it possible to find the three-dimensional structure of any rigid object that satisfies the motion assumption, even those with as few as two visible points.
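The following sketch illustrates this geometric recovery step on synthetic data. It is not the authors' program: the ellipse is found here with a generic conic fit rather than the constrained fit mentioned above, and all numbers are made up. The center offset, together with the major and minor semi-axes, then gives the distance between the two points.

```python
# A sketch (ours, not the article's program) of the ellipse-based recovery:
# generate the relative image positions of a point rotating about a fixed,
# tilted axis, fit a conic, and recover the 3-D distance from the ellipse's
# center offset and semi-axes. NumPy is assumed to be available.
import numpy as np

rng = np.random.default_rng(1)

# Point B rotates about a fixed axis through point A (A is the origin here).
u = np.array([0.4, 0.3, 0.87]); u /= np.linalg.norm(u)   # rotation axis (unit vector)
h, r = 0.8, 0.5                                          # offset along axis, circle radius
d_true = np.hypot(h, r)                                  # true 3-D distance from A to B
e1 = np.cross(u, [0.0, 0.0, 1.0]); e1 /= np.linalg.norm(e1)
e2 = np.cross(u, e1)                                     # e1, e2 span the circle's plane
thetas = np.linspace(0.0, 3.0, 12)                       # a dozen views of the rotation
pts3d = h * u + r * (np.cos(thetas)[:, None] * e1 + np.sin(thetas)[:, None] * e2)
xy = pts3d[:, :2] + 0.002 * rng.standard_normal((len(thetas), 2))  # parallel projection + noise

# Fit a general conic A x^2 + B xy + C y^2 + D x + E y + F = 0 to the positions.
x, y = xy[:, 0], xy[:, 1]
design = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
A, B, C, D, E, F = np.linalg.svd(design)[2][-1]          # singular vector of smallest value

# Ellipse center and semi-axes from the conic coefficients.
M = np.array([[A, B / 2], [B / 2, C]])
center = np.linalg.solve(M, [-D / 2, -E / 2])
F_c = center @ M @ center + np.array([D, E]) @ center + F   # conic value at the center
axes = np.sqrt(-F_c / np.linalg.eigvalsh(M))
a_maj, b_min = axes.max(), axes.min()

# Major axis = circle radius r; minor/major = |u_z|; |center| = h * sqrt(1 - u_z^2).
h_est = np.linalg.norm(center) / np.sqrt(1.0 - (b_min / a_maj) ** 2)
d_est = np.hypot(a_maj, h_est)
print(f"true distance {d_true:.3f}, estimated {d_est:.3f}")
```

When the ratio of minor to major axis approaches one (an axis pointing at the viewer), the denominator in the last step vanishes; this is one of the degenerate cases discussed later in the article.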

This analysis can be extended to the case of jointed objects with two points on each rigid part, as in Johansson's figures. First, we find the structure of each rigid part in the jointed object. Next, the rigid parts are connected through their joints by solving equations that relate the position of the joint to the position of the rigid parts. This determines the three-dimensional structure of each jointed object.

The same method applies when more than two visible points are on the same rigid object. We use numerical techniques to solve a system of equations that incorporates the fixed axis assumption and the other restrictions. This approach is more accurate in three-dimensional interpretation than are methods based on rigidity alone because more constraints are placed on the system of equations.

In addition to determining structure, we can segment the points into objects, either rigid or jointed. Of course, we can segment only on the basis of what is seen; if two points appear to be rigidly connected, the program will interpret them as being connected.

The structure proposed for the points is not unique; under parallel projection, any visible structure can be reflected in depth and produce the same image. Further information is necessary to determine the correct three-dimensional structure from the two possible structures found by the above method. Some information, such as knowledge about when rigid parts hide each other from view, could come from the image; other information might involve world knowledge, such as knowing the ways legs can bend.

The structure from motion method developed in our research was implemented and run on data related to the types of figures studied by Johansson. Since complete three-dimensional information was available for this data, the accuracy of the results could be checked.




Figure 4. The walking man.


Figure 5. Front view of woman with baseball bat.


Figures 4 and 5 show two sets of data analyzed by the program. Figure 4 shows a walking man, and Figure 5 a woman swinging a baseball bat. Six points are shown on the walking man: the shoulder, the elbow, the wrist, the hip, the knee, and the ankle. These points are shown over a period of 0.26 second. The walking man data was collected with a stereo infrared camera system designed by Eric Antonson of the mechanical engineering department at MIT.

The baseball data shows 12 points: the bat, the wrists, the left and right iliac crests, the hips, the knees, the ankles, and the metatarsal joints in the feet. The data shown, extended over 0.32 second, was collected with stereo visible light cameras operated by Steve Messier of the department of health and physical education at the University of Texas at Austin.

The parallel projections from the viewpoint shown were given to a program that used the fixed axis assumption to interpret the three-dimensional structure of the figure. The program determined which points moved as if connected and then estimated the three-dimensional distance between rigidly connected points. The resulting connection structures, shown in Figures 6 and 7, illustrate the constrained nature of human movement when viewed over short periods of time. The connection structure for the walking man is correct, except that the elbow is connected to the hip. The structure for the baseball player includes many wrong connections. The program tends to make spurious connections in determining the structure of heavily constrained motion because the constraints make the actual movements appear more rigid than they are.

Figure 6. Connection structure for walking man. Dashed line shows incorrect connection proposed by program.


Further difficulty comes from the fact that the data is noisy, making the small deviations from rigid motion useless. A program that looks at the sequences over longer periods of time and intersects the proposed connection structures determined over short periods of time would determine structure more reliably.

Tables 1 and 2 compare the three-dimensional rigid part lengths determined by the program with the three-dimensional rigid part lengths determined by the stereo camera systems. The large number of errors in the baseball data (as opposed to the error-free walking man data) arises for several reasons:

* the baseball data is noisier than the walking man data,

* some of the motion in the data, such as the bat motion, does not satisfy the fixed axis assumption,

* points that are very close together, such as those in the hip-iliac crest structure, are heavily affected by small errors in position, and

* points that do not move, such as those on the left foot, cannot be analyzed correctly.

Despite the errors, this method appears reliable enough to aid in the interpretation of some athletic movements, especially when only monocular images are available.


Figure 7. Connection structure for woman with baseball bat; dashed lines show incorrect connections proposed by program.

Psychological implications

We have developed a system for the interpretation of many moving figures that humans can also interpret. What might this system teach us about psychology? Suppose the human visual system uses the fixed axis assumption to interpret motion. This could help to explain how people interpret the motion of many objects. We have seen that this method might be powerful enough to interpret figures like those studied by Johansson. In fact, Johansson himself13 has studied the role of simple rotations in the visual interpretation of motion.

What about motion that does not satisfy the fixed axis assumption? Other processes that help people interpret this kind of motion might exist, but we do not expect motion that does not satisfy this assumption to be difficult to interpret, if the human visual system uses rigidity alone to interpret motion.

Table 1. Rigid part lengths, walking man data.

FROM        TO        ESTIMATED LENGTH (METERS)   ACTUAL LENGTH (METERS)   ERROR
SHOULDER    ELBOW     0.344                       0.335                    2.53%
ELBOW       WRIST     0.283                       0.274                    3.35%
SHOULDER    HIP       0.584                       0.579                    0.808%
HIP         KNEE      0.438                       0.437                    0.175%
KNEE        ANKLE     0.435                       0.437                    1.90%

Table 2. Rigid part lengths, woman with baseball bat.

FROM                 TO                       ESTIMATED LENGTH (INCHES)   ACTUAL LENGTH (INCHES)   ERROR
BAT                  WRIST                    23.70                       27.30                    13.30%
RIGHT ILIAC CREST    LEFT ILIAC CREST         926.00                      8.55                     **
RIGHT ILIAC CREST    LEFT HIP                 10.30
RIGHT ILIAC CREST    RIGHT HIP                2.64                        2.23                     18.20%
LEFT ILIAC CREST     LEFT HIP                 11.70                       2.34                     399.00%
LEFT ILIAC CREST     RIGHT HIP                3190.00                     9.03                     **
LEFT HIP             RIGHT HIP                10.50
LEFT HIP             LEFT KNEE                16.60                       16.90                    2.07%
LEFT KNEE            LEFT ANKLE               14.60                       15.50                    5.69%
LEFT ANKLE           LEFT METATARSAL JOINT    22.80                       3.97                     473.00%
RIGHT HIP            RIGHT KNEE               17.50                       17.10                    2.81%
RIGHT KNEE           RIGHT ANKLE              14.80                       15.30                    3.79%
RIGHT ANKLE          RIGHT METATARSAL JOINT   3.72                        4.21                     11.60%

** indicates very large errors.


The evidence on this last point is not clear. We are aware of only one study examining motion not in the fixed axis category. In that study, Green14 showed subjects collections of rigidly moving points. In some circumstances, the points looked less rigid when they moved with a tumbling motion. This effect was present even when several points were visible. Further studies should be done to compare motion that satisfies the fixed axis assumption with motion that does not.

For some degenerate kinds of motion, the fixed axis assumption is not powerful enough to guide the determination of structure from motion. This occurs when the axis of rotation either points directly at the viewer (so that the ellipse degenerates to a circle) or is perpendicular to the line of sight (so that the ellipse degenerates to a line segment). In the second case, motion can still be interpreted three-dimensionally if it is assumed that the maximal apparent distance between the points occurs when the line from the first point to the second is perpendicular to the line of sight, as suggested by Webb.15 This assumption has been observed by Johansson and Jansson6 in showing their subjects the image of a rotating rod.
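The second degenerate case can be illustrated with a toy calculation (made-up numbers, not the authors' code): when the axis lies in the image plane, the relative image position of the second point oscillates along a line segment, and taking the largest apparent separation as the true distance yields a consistent depth interpretation.

```python
# A toy illustration of the degenerate case in which the rotation axis is
# perpendicular to the line of sight; values are invented for the example.
import numpy as np

h, r = 0.6, 0.8                                   # offset along the axis, circle radius
d_true = np.hypot(h, r)                           # true 3-D distance between the points
angles = np.linspace(0.0, np.pi, 9)               # views spanning half a revolution
sep = np.sqrt((r * np.cos(angles)) ** 2 + h**2)   # apparent separations (a segment, not an ellipse)

d_est = sep.max()                                 # maximal apparent separation = assumed true distance
dz = np.sqrt(d_est**2 - sep**2)                   # implied depth differences (up to sign)
print(f"estimated distance {d_est:.3f} (true {d_true:.3f})")
print("depth differences per view:", np.round(dz, 3))
```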

Two important problems are not addressed here: the interpretation of motion in depth and the low-level vision problem of the extraction and correspondence of the positions of feature points. They are rich topics for future research.

Acknowledgments

We here acknowledge the helpful comments of Larry Davis and Worthy Martin, both of the Department of Computer Science at the University of Texas. We would like to thank Steve Messier, of the UT Department of Health and Physical Education, who provided the baseball data, and Don Hoffman, of the MIT AI Lab, who provided the walking man data. We express our appreciation for the comments of an anonymous reviewer and those of W. E. Snyder, the guest editor of this issue; they improved the presentation of this article.

This work was supported by the Air Force Office of Scientific Research under grant number AFOSR 77-3190.

References

1. W. N. Martin and J. K. Aggarwal, "Survey: Dynamic Scene Analysis," Computer Graphics and Image Processing, Vol. 7, No. 3, 1978, pp. 356-374.

2. W. B. Thompson, "Lower-Level Estimation and Interpretation of Visual Motion," Computer, Vol. 14, No. 8, Aug. 1981, pp. 20-28.

3. H. Wallach and D. N. O'Connell, "The Kinetic Depth Effect," J. Exp. Psych., Vol. 45, No. 4, 1953, pp. 205-217.

4. S. Ullman, The Interpretation of Visual Motion, MIT Press, Cambridge, Mass., 1978.

5. J. W. Roach and J. K. Aggarwal, "Determining the Movement of Objects From a Sequence of Images," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 2, No. 6, Nov. 1980, pp. 554-562.

6. G. Johansson and G. Jansson, "Perceived Rotary Motion From Changes in a Straight Line," Perception and Psychophysics, Vol. 4, No. 3, 1968, pp. 165-170.

7. G. Johansson, "Visual Motion Perception," Sci. American, Vol. 232, No. 6, June 1975, pp. 76-88.

8. R. F. Rashid, "Towards a System for the Interpretation of Moving Light Displays," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 2, No. 6, Nov. 1980, pp. 574-581.

9. J. O'Rourke and N. I. Badler, "Model-Based Image Analysis of Human Motion Using Constraint Propagation," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 2, No. 6, Nov. 1980, pp. 522-536.

10. W. F. Clocksin, "Inference of Structural Descriptions From Visual Examples of Motion: Preliminary Results," DAI Working Paper 21, Dept. of Artificial Intelligence, University of Edinburgh, Edinburgh, UK, May 1977.

11. D. D. Hoffman and B. E. Flinchbaugh, "The Interpretation of Biological Motion," AI Memo 608, MIT, Cambridge, Mass., Dec. 1980.

12. J. A. Webb and J. K. Aggarwal, "Visual Interpretation of the Motion of Objects in Space," Proc. Pattern Recognition and Image Processing Conf., Dallas, Tex., Aug. 1981, pp. 516-521.

13. G. Johansson, "Visual Perception of Rotary Motions as Transformations of Conic Sections," Psychologia, Vol. 17, 1974, pp. 226-237.

14. B. F. Green, Jr., "Figure Coherence in the Kinetic Depth Effect," J. Exp. Psych., Vol. 63, No. 3, 1961, pp. 272-282.

15. J. A. Webb, "Static Analysis of Moving Jointed Objects," Proc. 1st Nat'l Conf. Artificial Intelligence, Stanford, Calif., Aug. 1980, pp. 35-37.

Jon A. Webb is a PhD student in computer science at the University of Texas at Austin. His major area of study is computer vision, especially the three-dimensional interpretation of motion. His interests include telecommunication and artificial intelligence. Webb holds a BA in mathematics from the University of South Florida and an MS in computer science from Ohio State University. He is a member of the ACM, the IEEE Computer Society, and the Cognitive Science Society.

J. K. Aggarwal is a professor of electrical engineering and computer sciences at the University of Texas at Austin, where he has taught since 1964. His current research interests are image processing and digital filters. He has been a visiting associate professor at the University of California at Berkeley and a visiting assistant professor at Brown University.

He has published many books and technical papers, including Notes on Nonlinear Systems, Nonlinear Systems: Stability Analysis, Computer Methods in Image Analysis, and Digital Signal Processing.

Aggarwal is a member of the IEEE, the Pattern Recognition Society, and Eta Kappa Nu. He is currently an associate editor of Pattern Recognition and the IEEE Transactions on Pattern Analysis and Machine Intelligence.

A native of India, Aggarwal completed a BS in mathematics at the University of Bombay in 1956. This was followed by a B. Eng. from the University of Liverpool in 1960 and MS and PhD degrees from the University of Illinois in 1961 and 1964, respectively.
