
    Perception of 3D scenes from pictures

    Zygmunt Pizlo and Michael R. Scheessele

    Department of Psychological Sciences Purdue University

    West Lafayette, IN 47907-1364

    ABSTRACT

    Brunelleschi (1413) was the first to demonstrate that a 3D scene can be represented by a 2D perspective picture in such a way that the retinal images produced by the scene and by the picture are identical (subsequently, Leonardo pointed out that this is true only when the observer's eye is placed at the center of perspectivity that was used to produce the picture). It follows that in the absence of depth cues, the percepts are identical as well. A question arises as to the effect on the percept of viewing the picture from a point different from the center of perspectivity. According to Pirenne's (1970) theory, the percept takes into account the cues to the orientation and position of the picture relative to the observer, in order to compensate for the incorrect viewing point; when these cues are available, the percept is accurate. We will demonstrate a new visual phenomenon, called the "cuboid illusion", which contradicts Pirenne's theory. Our experimental results show that the percept of a 3D object from its picture systematically depends on the orientation and position of the picture relative to the observer, even in the presence of many cues.

    Keywords: perspectivity, projectivity, pictures, perception, shape constancy, binocular vision, psychophysics

    1. INTRODUCTION

    Pictures (including motion pictures) constitute an important element of our everyday life. We see them in books, newspapers, magazines, family albums, museums, posters, on television, and in movie theaters. A natural question arises as to the efficiency of pictures as a medium. In other words, does the percept of a 3D scene depicted in a 2D picture agree with the percept of the actual scene? The omnipresence of pictures in our everyday life suggests that pictures are indeed an efficient medium; otherwise we would not use them so widely. The goal of the research presented in this paper was to study the perceptual mechanisms underlying veridical perception of 3D scenes from pictures.

    The first question that should be answered is whether the human visual system has two separate mechanisms, one for perceiving 3D scenes and the other for perceiving the scenes from pictures. Assuming the operation of two mechanisms may seem quite plausible. After all, pictures are flat (2D) images, and thus are geometrically very different from the real world, which is three-dimensional. Since the human visual system evolved without being exposed to pictures, it is quite possible that the efficient perception of pictures requires a learned mechanism which is different from the natural perceptual mechanism.

    This issue has been studied by Hochberg and Brooks in their classical experiment on the nature of perception of pictures. They tried to prevent a child from seeing pictures for the first 19 months of his life. At the same time, the child was exposed to, and taught the names of, a wide variety of toys and other solid objects. On the occasions when the child saw pictures, the objects presented in the pictures were not named and the child was given no instruction. At 19 months, the child was shown pictures of familiar objects and was asked to name them. The child was able to recognize the objects from pictures quite easily.

    The results of Hochberg and Brooks's experiment imply that if human beings had two different mechanisms, one for 3D scenes and the other for 2D pictures, the latter mechanism would have to be innate. The conclusion that perception of pictures involves a separate innate mechanism is, however, hard to accept. Innate mechanisms develop in the evolutionary process of adaptation to the environment. But the environment that was present during the evolutionary process did not contain pictures. It is, therefore, more likely that there is only one perceptual mechanism, which operates both in the case of solid objects and in the case of their flat pictures. This mechanism is either innate or learned very early in life (existing psychophysical results support the nativistic explanation).

    email: pizlo, [email protected]; WWW: http://bigbird.psych.purdue.edu

    Part of the IS&T/SPIE Conference on Human Vision and Electronic Imaging III, San Jose, California, January 1998. SPIE Vol. 3299, 0277-786X/98/$10.00

    The fact that perception of pictures and perception of 3D scenes involve the same mechanism suggests that the effectiveness of pictures may have some limitations, simply because pictures provide less information than 3D scenes. In this paper, we address the question of the extent to which a picture leads to the same percept of the scene depicted in it as does the actual scene itself. We begin with a brief overview of prior research on this issue. Then, we present a geometrical analysis of viewing scenes from pictures. This analysis leads to predictions that are illustrated by a new phenomenon, called the cuboid illusion. We conclude by presenting results of a psychophysical experiment testing this phenomenon, and implications for theories of monocular and binocular perception.

    2. PRIOR EXPERIMENTS AND THEORIES

    The first demonstration of the fact that a 2D perspective picture of a 3D scene may lead to the same percept as viewing the actual scene was made by Brunelleschi in 1413. Brunelleschi presented an observer with a perspective picture of a 3D scene. The observer's eye was placed at the point which coincided with the center of perspectivity that was used to produce this picture. Under these conditions, the retinal image in the observer's eye produced by this picture was identical to the retinal image that would have been produced by the scene itself. It follows that in the absence of depth cues, which could inform the observer that the visual rays emanate from a flat surface rather than from a solid scene, the percept of the scene produced by the picture agreed with the percept of the scene. Later, Leonardo da Vinci pointed out that if the observer's eye is placed at a point different from the center of perspectivity, the retinal image in the observer's eye is incorrect, i.e. it is different from any retinal image that could have been produced by this scene. Under such conditions, according to Leonardo, the percept must be different. However, informal observations seemed to contradict Leonardo's conjecture. Namely, it is known that perspective paintings can be viewed from a wide range of viewing positions, apparently without changing the percept. Furthermore, some paintings of 3D scenes are produced without following the rules of perspectivity, but despite this fact the percepts produced by these paintings seem to agree with the percepts that would have been produced by the actual scenes. Perhaps it was this kind of contradictory evidence, showing that a change of viewing position does not change the percept, which prevented Leonardo from inventing the stereoscope (the stereoscope was invented three centuries later by Wheatstone).

    At the beginning of the 17th century, Vredeman de Vries demonstrated in a series of compelling examples that viewing a perspective drawing from an incorrect vantage point leads to a percept which is quite different from the percept of the scene. Thus, the existing evidence obtained from phenomenological observations appeared inconsistent. On the one hand, we all know that a painting can be viewed from a number of different vantage points without affecting the percept substantially (if at all); yet on the other hand, the perspective drawings of Vredeman de Vries clearly suggest otherwise.

    Experimental analysis of this apparent inconsistency was not performed until the second half of this century. Pirenne, in a series of observations and experiments, demonstrated that if a painting is viewed from an incorrect vantage point and, at the same time, the painting does not provide any cues to its orientation and position relative to the observer, the percept is strongly affected by the change of the vantage point. Otherwise, if the painting provides cues to its position and orientation (e.g., through the rectangular frame of the painting), the percept is less affected by the change of the vantage point. To explain these results, Pirenne proposed a theory in which the percept takes into account the position and orientation of the painting relative to the observer. By doing this, the observer can correct the percept. This theory could seemingly account for prior observations where the percept sometimes was and sometimes was not affected by changing the vantage point when viewing pictures. It has to be pointed out, however, that Pirenne did not study these effects systematically, and his observations were based on a very small number of examples. Therefore, it is not clear to what extent the observer can use depth cues in perceptual correction for the wrong vantage point.

    One problem with Pirenne's theory is that even in the presence of cues to the position and orientation of the picture, the observer cannot compute (without some additional information) the position of the center of perspectivity that was used to produce a given picture. The reason is simple: for a given picture, the painter (or the photographer) could adopt any of an infinite number of vantage points. The position of the center of perspectivity can be computed from the picture if the picture contains familiar objects (or features) which provide perspective cues. This problem is known in photogrammetry as the external camera calibration problem. So, Pirenne's theory can be restated as follows: depth cues are needed to compute the Euclidean structure on the picture plane from the observer's retinal image. This Euclidean structure is then used to solve the external calibration problem (if the picture contains familiar objects). An obvious question arises as to whether the observer has the perceptual capability to perform such computations. This question was addressed in our psychophysical experiment, which is reported in Section 4.

    A different theory was proposed by Cutting. In a series of experiments, Cutting tested the observer's ability to detect non-rigidity of rotating rectangular solids shown on a slanted screen. The information about the slant of the projection surface was removed. According to Pirenne's theory, the rotating objects should appear non-rigid, due to the wrong vantage point and the unavailability of information about the slant of the projection screen. Cutting demonstrated that the observers could hardly perceive the solids as non-rigid. Based on these results, Cutting proposed a theory according to which the observer is a "fuzzy geometer" who does not notice the distortions produced by viewing pictures from a wrong vantage point, at least for small and moderate departures of the viewing position from the center of perspectivity.

    If the departure of the vantage point from the center of perspectivity is large, the percept is quite different compared to the case when the vantage point is correct. Cutting showed that at least some of the changes in the percept can be explained by projective changes in the observer's retinal image.

    To summarize, each of the three theories described above (i.e., the theory involving depth cues, the fuzzy geometer theory, and the projective theory) can account for the experimental results that led to that theory. However, none of these theories can account for all existing results. In fact, each of these theories is inconsistent with some set of observations and experimental results. To shed some light on possible sources of these apparent contradictions, we present a systematic analysis of the geometry of viewing pictures from a wrong vantage point. This analysis includes some well known facts from projective geometry, as well as some new theoretical results.

    3. GEOMETRY OF VIEWING PICTURES

    Viewing a 3D scene involves a perspective transformation of the scene to the observer's retina. Viewing a perspective picture of the scene involves a product of two perspectivities: one is a perspective projection of the 3D scene onto the picture plane, and the other is a perspective projection of the picture onto the observer's retina. It follows that differences between viewing an actual scene and viewing a picture of the scene may be related to differences between a single perspective projection and a product of two perspective projections. Therefore, our analysis will concentrate on the geometry of these differences.

    We begin by discussing pictures of planar figures. Fig. 1 provides an illustration for this discussion. The reader is expected to view this picture from a large distance, keeping the plane of the picture orthogonal to the line of sight. The reader's retinal image of (b) is a perspective transformation of (a), whereas the image of (c) is a perspective transformation of (a) followed by another perspective transformation. It is probably clear to the reader that both (a) and (b) (but not (a) and (c)) look like the same figure differently oriented relative to the observer.

    Consider first a trivial case of a figure parallel to the picture plane. Any perspective image of this figure on the picture is identical to the figure up to size scaling. As a result, the picture can be viewed from any vantage point, and the retinal image in the observer's eye is a perspective image of the original figure. Thus, this case is geometrically equivalent to the case of viewing the actual figure when it is slanted relative to the observer.

    Assume now that the figure is slanted relative to the picture plane and the observer's eye is at a point different from the center of perspectivity that was used to produce this picture. As a result, the observer's retinal image is a product of two perspective transformations, which is equivalent to a projective transformation of the original figure. This case is illustrated in Fig. 1c. We next briefly discuss the difference between perspective and projective transformations.

    Fig. 1. Examples of a perspective (b) and projective (c) image.

    Perspective transformations form a subset of projective transformations. Projectivity between two planes involves 8 independent parameters, whereas perspectivity involves only 6. The set of perspectivities is not a group: a product of two or more perspective transformations is not, in the general case, a perspective transformation. Instead, it is a projective transformation. A projective transformation is a perspective transformation only in a very special case, specified in the following proposition (this proposition is an extension of the well known theorem for the case of projectivity between two lines):

    Proposition 1: A projective transformation between two different planes that intersect at a line which is mapped onto itself is a perspectivity.

    Proof: Consider Fig. 2. Projectivity between two planes is uniquely determined by a transformation of four points, no three of which are collinear. Let the planes form an angle α whose value is arbitrary. Let the points A, B, C and D be projectively transformed into points A, B, C' and D', respectively. Call this projective mapping T. We assume here that the line AB, which is the intersection of the two planes, is mapped onto itself in T. Let P be the intersection of lines AB and CD. Since P is on the line AB, P maps onto itself. Next, since projectivity transforms straight lines into straight lines, points C', P and D' must be collinear. This implies that the lines CD and C'D' intersect (at P). Thus, they are coplanar. It follows that the lines CC' and DD' intersect, and the intersection point V is a center of perspectivity which transforms points A, B, C and D to points A, B, C' and D'. Call this perspective mapping T'. Clearly, the product (T')^{-1} T transforms points A, B, C and D onto themselves. Note that the identity mapping also maps A, B, C and D onto themselves. Since the projective mapping is uniquely determined, and the identity mapping is a projective mapping, (T')^{-1} T is the identity mapping. Hence, T = T', which is a perspectivity with the center of projection at V.
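    The first step of the proof, that a projectivity between two planes is fixed by four point correspondences with no three points collinear, also accounts for the 8 parameters counted above: each correspondence supplies two equations. The following Python sketch (our illustration, not the authors' procedure) fits such a 3×3 matrix by fixing its bottom-right entry to 1, which assumes that entry is nonzero in the true map. The names `homography_from_4`, `apply` and `compose` and the point coordinates are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences (three collinear points?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4(src, dst):
    """3x3 matrix H (h33 fixed to 1) mapping src[i] to dst[i]: 4 pairs x 2 equations = 8 unknowns."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1); likewise v with the second row.
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply(h, point):
    """Map a 2D point through H using homogeneous coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def compose(h2, h1):
    """Matrix of the product: apply h1 first, then h2 (defined only up to scale)."""
    return [[sum(h2[i][k] * h1[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
quad = [(0.1, 0.2), (2.0, 0.0), (1.5, 1.8), (0.0, 1.0)]
H = homography_from_4(square, quad)
H_back = homography_from_4(quad, square)
print(apply(compose(H_back, H), (0.3, 0.7)))  # (0.3, 0.7) up to rounding
```

    A product of projectivities is simply the matrix product, so the set of projectivities is closed under composition; the proposition concerns which of these products happen to be perspectivities.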

    Fig. 2. Perspectivity between two planes.

    Note that this proposition holds for any angle α between the two planes. It is easy to show that a change of α results in a change of the position of the center of projection. Specifically, this center will move along a circle in 3D space. We will first illustrate this fact for a 1D perspectivity. Consider a plane with a Cartesian coordinate system whose origin is O and whose horizontal axis is OX. Let a line z on this plane form an angle α with OX, and let the line z contain the origin O. Assume that there is a perspectivity which maps OX to z and that O is mapped onto itself in this perspectivity. To find the center of perspectivity V, we need a pair of corresponding points (x_1, z_1) and (x_2, z_2) (O is the third point, which is needed to obtain a unique 1D projectivity). It is easy to show that V has coordinates:

    It is seen from (1) that V is on a circle. This circle is symmetric with respect to OX. This is related to the fact that a perspectivity with angle −α is symmetrical with respect to OX as compared to a perspectivity with angle α. It is easy to verify that a line emanating from V and going through an arbitrary point x_i on OX intersects z at z_i, which satisfies the cross-ratio:

    Equation (2) is consistent with our assumption that the perspectivity with center at V maps OX to z. In the case of a perspective projection between two planes, π and π', the center of perspectivity is also located on a circle. This can be seen if one considers a plane π'' orthogonal to both planes π and π' such that it contains the center of perspectivity. Changing the angle α between π and π' will result in changing the position of the center of perspectivity, but this center will stay in π''. So, this case reduces to the 1D case.
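    The circle claim for the 1D case can be checked with a short Python sketch. As our own assumption (this is not the paper's equation (1), whose exact form is not reproduced here), we write a projectivity from OX onto z that fixes O as z = x / (p x + q), with x measured along OX and z along the line z. The point at infinity of OX then maps to z = 1/p, and the point x = −q/p maps to infinity; this puts the center at V = ((cos α − q)/p, sin α/p). As α varies, V therefore moves on a circle with center (−q/p, 0) and radius 1/|p|, symmetric with respect to OX. The sketch also checks that the ray from V through an arbitrary x_i preserves the cross-ratio of O, x_1, x_2 and x_i. The function names are hypothetical.

```python
import math


def center_1d(x1, z1, x2, z2, alpha):
    """Center V of the 1D perspectivity from line OX onto line z (through O at
    angle alpha) that fixes O and maps x1 -> z1, x2 -> z2.

    Assumed parametrization: a projectivity fixing O can be written
    z = x / (p x + q), which puts the center at ((cos a - q) / p, sin a / p).
    """
    p = (x1 / z1 - x2 / z2) / (x1 - x2)  # each pair gives p * x + q = x / z
    q = x1 / z1 - p * x1
    return ((math.cos(alpha) - q) / p, math.sin(alpha) / p), p, q


def image_on_z(v, x, alpha):
    """Coordinate z at which the ray from V through (x, 0) meets line z."""
    c, s = math.cos(alpha), math.sin(alpha)
    dx, dy = x - v[0], -v[1]
    # Solve v + t * (dx, dy) = z * (c, s) for t and z by Cramer's rule.
    det = -dx * s + c * dy
    return (-dx * v[1] + dy * v[0]) / det


def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points given by their coordinates on the line."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


# Same correspondences, several angles: V stays at distance 1/|p| from (-q/p, 0).
for deg in (30, 60, 120, -60):
    (vx, vy), p, q = center_1d(1.0, 2.0, 3.0, 4.0, math.radians(deg))
    print(deg, round(math.hypot(vx + q / p, vy), 9))  # radius 8.0 each time
```

    With the correspondences (1, 2) and (3, 4), the sketch gives p = 0.125 and q = 0.375, so the circle has center (−3, 0) and radius 8; replacing α by −α reflects V across OX.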

    It follows from Proposition 1 that a product of two perspectivities is a perspectivity if the three planes have a common intersection line and this line is mapped onto itself in each of the perspectivities. Consider the case where the centers of projection of the two perspectivities are different; this is the case relevant to the problem of viewing a perspective picture from a wrong vantage point. In such a case, the following is true:

    Proposition 2: Consider two perspectivities: T_1, mapping a plane π_1 to π_2, and T_2, mapping the plane π_2 to π_3. Let the product T_3 = T_2 ∘ T_1 also be a perspectivity, mapping the plane π_1 to π_3. Let O_1, O_2 and O_3 be the centers of projection of T_1, T_2 and T_3, respectively. Assume that O_1 ≠ O_2. Then O_1 ≠ O_3 ≠ O_2. In other words, the center of perspectivity of the product is different from each of the other centers.

    Proof (by contradiction): We prove the inequality O_2 ≠ O_3 (the proof of the inequality O_1 ≠ O_3 is analogous). Assume that O_2 = O_3. Consider a point A_1 on π_1. Let A_1 be mapped onto A_2 on π_2 by the perspectivity T_1. This means that the points O_1, A_1 and A_2 are collinear. Let A_2 be mapped onto A_3 on π_3 by the perspectivity T_2. This means that the points O_2, A_2 and A_3 are collinear. Since T_3 = T_2 ∘ T_1, the point A_3 is also the image of A_1 under T_3, so the points O_3, A_1 and A_3 are collinear. Since O_2 and O_3 coincide, it follows that A_1, A_2, A_3 and O_2 are collinear. This means that the perspectivity T_1, which maps A_1 onto A_2, involves projecting lines that go through O_2 (for every choice of A_1). This implies that O_1 = O_2, which contradicts our assumption that O_1 ≠ O_2 and ends the proof.

    The following corollary is a direct implication of Proposition 2:

    Corollary: If a product of two perspectivities is a perspectivity, then either the centers of the three perspectivities are all different or they all coincide.
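    Propositions 1 and 2 have a planar analog that is easy to check numerically: three lines l1, l2 and l3 through a common point O (playing the role of the common intersection line of the three planes), a perspectivity T1 from l1 to l2 with center O1, and a perspectivity T2 from l2 to l3 with a different center O2. Both perspectivities fix O, so their product is again a perspectivity, by the classical theorem for projectivities between two lines. The Python sketch below (our illustration; the angles, centers and function names are arbitrary and hypothetical) locates the center O3 of the product from two points, then checks that a third point is also projected through O3 and that O3 differs from both O1 and O2.

```python
import math


def on_line(theta, t):
    """Point at signed distance t along the line through O with direction angle theta."""
    return (t * math.cos(theta), t * math.sin(theta))


def meet(a1, a2, b1, b2):
    """Intersection of the 2D lines a1a2 and b1b2 (assumed not parallel)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = a1, a2, b1, b2
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    e, f = x1 * y2 - y1 * x2, x3 * y4 - y3 * x4
    return ((e * (x3 - x4) - (x1 - x2) * f) / den,
            (e * (y3 - y4) - (y1 - y2) * f) / den)


def perspect(center, point, theta):
    """Project `point` from `center` onto the line through O with direction theta."""
    return meet(center, point, (0.0, 0.0), on_line(theta, 1.0))


def collinear(a, b, c, tol=1e-9):
    """True when the three 2D points lie on one line (up to tol)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])) < tol


# Three lines concurrent at O and two distinct centers (arbitrary choices).
l1, l2, l3 = math.radians(0), math.radians(50), math.radians(110)
O1, O2 = (1.0, 3.0), (-2.0, 1.0)


def product(point):
    """T2 o T1: project l1 -> l2 from O1, then l2 -> l3 from O2."""
    return perspect(O2, perspect(O1, point, l2), l3)


P, Q, R = on_line(l1, 1.0), on_line(l1, 2.5), on_line(l1, 4.0)
O3 = meet(P, product(P), Q, product(Q))  # center candidate from two correspondences
print(collinear(R, product(R), O3), O3)
```

    The first printed value confirms that a third correspondence also passes through O3, so the product behaves as a single perspectivity whose center is distinct from O1 and O2, as the corollary requires when O1 ≠ O2.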

    Propositions 1 and 2 have important implications: when the observer is viewing a perspective picture of a planar figure from a wrong vantage point, the retinal image in her eye is either: (i) a perspective transformation of the figure that could only have been obtained from a center of perspectivity different from the point at which the observer's eye is placed (Proposition 2); (ii) a perspective transformation of the figure with the center of projection coinciding with the observer's eye, assuming that the figure had been at a different position and/or orientation relative to the observer (Proposition 1 and equation (1)); or (iii) a projective transformation of the figure. Clearly, only case (ii) represents the situation where the current retinal image, with the center of perspectivity coinciding with the position of the observer's eye, is a valid perspective image of the figure. From Proposition 1 and equation (1) it follows that there is always a set of such viewing positions and they are all located on a circle. More exactly, since a picture cannot be viewed from behind, the set of valid viewing positions forms a semi-circle.
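    The circle of valid centers can also be traced in 3D. In this Python sketch (our illustration; coordinates and names are hypothetical), plane π is z = 0 and plane π' is π rotated by α about the x-axis, so the two planes share the x-axis, which stays fixed pointwise. We fix one projective map from π to π' in in-plane coordinates (generated from an arbitrary center V0 at α = 50°), re-embed π' at other angles, and recover each center as the intersection of lines CC' and DD', as in the proof of Proposition 1. The recovered centers share one x-coordinate (they lie in a plane orthogonal to the common line) and lie on one circle whose center is in π.

```python
import math


def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def add(p, q):
    return tuple(a + b for a, b in zip(p, q))


def scale(p, s):
    return tuple(a * s for a in p)


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1], p[2] * q[0] - p[0] * q[2], p[0] * q[1] - p[1] * q[0])


def embed(u, v, alpha):
    """3D point with in-plane coordinates (u, v) on plane pi' at angle alpha to z = 0."""
    return (u, v * math.cos(alpha), v * math.sin(alpha))


def project_onto_pi2(center, point, alpha):
    """Where the line center->point meets pi', as in-plane coordinates (u, v)."""
    normal = (0.0, -math.sin(alpha), math.cos(alpha))
    direction = sub(point, center)
    hit = add(center, scale(direction, -dot(normal, center) / dot(normal, direction)))
    return hit[0], dot(hit, (0.0, math.cos(alpha), math.sin(alpha)))


def meet_lines(p1, p2, q1, q2):
    """Intersection of lines p1p2 and q1q2 (assumed coplanar and not parallel)."""
    u, w, r = sub(p2, p1), sub(q2, q1), sub(q1, p1)
    uw = cross(u, w)
    return add(p1, scale(u, dot(cross(r, w), uw) / dot(uw, uw)))


def circumcenter(a, b, c):
    """Center of the circle through three 2D points."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    return ((sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d,
            (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d)


# Fix one projective map pi -> pi' (in in-plane coordinates) via an arbitrary center.
V0, alpha0 = (0.3, -1.0, 2.0), math.radians(50)
C, D = (0.5, 1.0, 0.0), (-0.4, 0.6, 0.0)
C_img, D_img = project_onto_pi2(V0, C, alpha0), project_onto_pi2(V0, D, alpha0)


def center_for(alpha):
    """Center of the same map when pi' is re-embedded at angle alpha (Proposition 1)."""
    return meet_lines(C, embed(*C_img, alpha), D, embed(*D_img, alpha))


centers = [center_for(math.radians(a)) for a in (30, 70, 100, 130)]
```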

    Consider now the case when the range of the object in depth is small compared to the viewing distance of the person making the picture, and the range in depth of the picture is small compared to the viewing distance of the observer. This case happens when the size of the object (and the size of its picture) is small compared to the viewing distances. In such a case, a 2D perspective projection becomes approximately equivalent to a 2D parallel projection (parallel projection is a perspectivity with the center of projection at infinity; in such a case the projecting lines are parallel). It is known that parallel projection between planes can be adequately represented by a 2D affine transformation. Since affine transformations form a group, a product of two (or more) parallel projections between planes is a parallel projection. This fact implies that for a small region of the picture representing an image of a planar figure, an incorrect vantage point always leads to a retinal image which is a valid projection of the figure (note that in the case of parallel projections one should actually speak about a valid viewing direction, since the viewing distance becomes irrelevant). The only implication of the incorrect vantage point (or viewing direction) is that the perceived position and orientation of the figure will be different from its actual position and orientation (thus, this case is analogous to case (ii) described in the previous paragraph). Fig. 1b is a parallel projection of (a). If the reader keeps (b) at a large viewing distance, slanting the plane of the figure does not change the percept of (b) much; it is still perceived as (a) under some orientation in 3D space.
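    The group argument can be illustrated with a short Python sketch (our own, with hypothetical names). Only the 2×2 linear part of each affine map matters for shape. If that part M has det M > 0, its singular value decomposition writes it as s_1 · U · diag(1, s_2/s_1) · V^T with rotations U and V, i.e. a size scaling, in-plane rotations, and the foreshortening produced by orthographic viewing of a plane slanted by σ = arccos(s_2/s_1). So the product of the picture's projection and a wrong viewing direction is still explained by the original figure at a single slant.

```python
import math


def rotation(deg):
    """2x2 rotation by deg degrees (an in-plane rotation of the image)."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]]


def foreshorten(slant_deg):
    """Linear part of the orthographic image of a plane slanted by slant_deg about the x-axis."""
    return [[1.0, 0.0], [0.0, math.cos(math.radians(slant_deg))]]


def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def singular_values(m):
    """Singular values s1 >= s2 of a 2x2 matrix, from the eigenvalues of M^T M."""
    (a, b), (c, d) = m
    p, r, q = a * a + c * c, b * b + d * d, a * b + c * d
    disc = math.sqrt((p - r) ** 2 + 4 * q * q)
    return math.sqrt((p + r + disc) / 2), math.sqrt(max((p + r - disc) / 2, 0.0))


def implied_slant(m):
    """Slant in degrees that explains the linear map m as scaling, in-plane rotations
    and orthographic foreshortening of the original figure (assumes det(m) > 0)."""
    s1, s2 = singular_values(m)
    return math.degrees(math.acos(min(s2 / s1, 1.0)))


# Picture of a figure slanted by 60 deg, viewed from a wrong direction (40 deg about
# an axis rotated by 30 deg in the picture plane): the product is still one slant.
picture = foreshorten(60)
viewing = matmul2(matmul2(rotation(30), foreshorten(40)), rotation(-30))
composite = matmul2(viewing, picture)
slant = implied_slant(composite)
print(round(slant, 2))
```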

    To summarize the case of planar figures: we showed that viewing a picture of a figure from a wrong vantage point may lead to a retinal image that could have been produced by this figure itself, although in a different position and/or orientation (it seems that this fact has been overlooked in the past). This, in turn, implies that viewing pictures of scenes that do not contain conspicuous 3D features may lead to only small or moderate perceptual effects of changing the vantage point.

    3.2 3D scenes

    We begin by discussing the case of a parallel projection (see the previous section for an explanation of the conditions under which a perspective projection reduces to a parallel projection). In this case, viewing a picture of a 3D scene is represented by the product of a parallel projection from the scene to a 2D picture and a parallel projection from the picture to the observer's retina. We show next that viewing the picture of a scene from a wrong viewing direction leads to a retinal image that could never be produced by looking at the scene.

    Consider a point M in a 3D scene and its image m on a picture produced by a parallel projection. Before M is projected to the picture, it can undergo an arbitrary rigid motion in 3D space. This motion is represented by a 3D rotation and translation of the scene relative to the painter. Changing the distance between the scene and the painter does not change the image on the picture plane. The other two translations in 3D space lead to translations of the image on the picture plane without changing the image itself (assuming that the viewing direction, and thus the direction of the projecting lines, does not change). Therefore, the translations in 3D space can be ignored without restricting generality. The remaining rigid motions are rotations in 3D space. These rotations are represented by a 3×3 orthonormal matrix R with det(R) = 1. Parallel projection from the 3D space to the picture plane can be decomposed into an orthographic projection from 3D to 2D (orthographic projection is a parallel projection with rays orthogonal to the image plane) and a parallel projection from 2D to 2D:

    $$ m = A P_o R M \qquad (3) $$

    where A is a 2D affine transformation representing a parallel projection between planes, and P_o is a 3D to 2D orthographic projection.

    Since the scene is specified in the coordinate system of the painter, we can assume (without restricting generality) that the rotation matrix is the identity matrix: R = I_{3×3}. We can further assume that the observer is looking directly at the picture, so that the image of the picture is in the center of the observer's retina. This assumption does not restrict generality: if the image of the picture is in the periphery of the observer's retina, it is always possible to compute the retinal image for the case when the observer looks directly at the picture (this is known in photogrammetry as virtual camera rotation). If the observer is viewing the picture from the correct viewing direction (i.e. the viewing direction of the painter), the observer's retinal image is an orthographic projection of the scene (since the retinal image is produced by the same projecting lines that produced the picture). At the same time, the retinal image is a parallel projection of the image in the painting, and this projection is equivalent to A^{-1}:

    $$ A^{-1} m = A^{-1} A P_o I_{3\times 3} M = P_o I_{3\times 3} M \qquad (4) $$

    If the viewing direction is wrong, then the observer's retinal image m_1 is a parallel projection A_1 of the image in the picture, and this projection is different from A^{-1}:

    $$ m_1 = A_1 A P_o I_{3\times 3} M \qquad (5) $$

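    Let B = A_1 A. The observer's retinal image produced by viewing the picture of a scene is a valid image of the scene if and only if there exists a 3D rotation R' which satisfies the following for every point M:

    $$ B P_o I_{3\times 3} M = P_o R' M \qquad (6) $$

    It is seen that (b_11, b_12, 0) and (b_21, b_22, 0) are the first two rows of R'. Note that if R' is a rotation matrix (its columns and rows are vectors of length one), then its third row must be (0, 0, 1). It follows that if R' is a rotation matrix, then:

    $$
    \begin{aligned}
    b_{11}^2 + b_{12}^2 &= 1 && (7a)\\
    b_{21}^2 + b_{22}^2 &= 1 && (7b)\\
    b_{11} b_{21} + b_{12} b_{22} &= 0 && (7c)\\
    b_{11} b_{22} - b_{12} b_{21} &= 1 && (7d)
    \end{aligned}
    $$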

    It is clear that the requirements (7a-d) are equivalent to the fact that B is a rotation on a plane. Note that A_1 = B A^{-1}. This implies that the observer's retinal image produced by a picture of a scene is a valid image of the scene only when the viewing direction is correct. This is the case because the transformation A_1 is then equal to A^{-1} up to a rotation of the image on the observer's retina; this rotation represents a rotation of the observer around the visual axis, and such a rotation does not change the viewing direction. Thus, if the viewing direction is wrong, the observer's retinal image produced by a picture of a 3D scene could not have been obtained by viewing the scene itself. Instead, the retinal image is an image of a 3D affine transformation of the scene, rather than of the scene itself. Affine transformations correspond to rigid motions and size scaling, plus stretching (or compressing) the object along arbitrary directions. (We want to point out that the retinal image, like any 2D image of a 3D scene, could also be obtained by any of an infinite number of non-affine transformations of the scene. Specifically, one can always move each point of a scene to an arbitrary position on its projecting ray without affecting the image. Such a transformation of a scene is neither affine nor projective; it is not even a topological transformation. It is an open question whether viewing pictures from a wrong viewing position leads to the percept of an affine or a non-affine transformation of the scene.)
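    The validity test derived above can be written out directly. In this Python sketch (our illustration; the matrices and names are hypothetical), each parallel projection is represented by its 2×2 linear part, and B = A_1 A is tested for being a planar rotation: unit, mutually orthogonal rows and determinant +1. For a wrong viewing direction, the sketch builds the 3D linear map L = [[B, 0], [0, 0, 1]], which satisfies B P_o = P_o L, so the retinal image is the orthographic image of the scene transformed by L, a non-rigid affine transformation whenever B is not a rotation.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def is_planar_rotation(b, tol=1e-9):
    """True when B has unit, mutually orthogonal rows and det(B) = +1."""
    (b11, b12), (b21, b22) = b
    return (abs(b11 ** 2 + b12 ** 2 - 1) < tol and abs(b21 ** 2 + b22 ** 2 - 1) < tol
            and abs(b11 * b21 + b12 * b22) < tol and abs(b11 * b22 - b12 * b21 - 1) < tol)


def equivalent_scene_map(b):
    """3D linear map L = [[B, 0], [0, 0, 1]], which satisfies B P_o = P_o L."""
    (b11, b12), (b21, b22) = b
    return [[b11, b12, 0.0], [b21, b22, 0.0], [0.0, 0.0, 1.0]]


P_O = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # orthographic projection: drop depth
A = [[1.0, 0.0], [0.0, 0.5]]             # painter's projection (plane slanted 60 deg)
A_INV = [[1.0, 0.0], [0.0, 2.0]]         # correct viewing direction: A_1 = A^{-1}
A1_WRONG = [[0.8, 0.0], [0.0, 1.0]]      # hypothetical wrong viewing direction

B_correct = matmul(A_INV, A)             # identity matrix: a valid image
B_wrong = matmul(A1_WRONG, A)            # diag(0.8, 0.5): not a rotation
L = equivalent_scene_map(B_wrong)        # scene compressed along x and y
print(is_planar_rotation(B_correct), is_planar_rotation(B_wrong))
```

    Because L rescales x and y by 0.8 and 0.5 while leaving depth unchanged, the retinal image in this example matches a version of the scene stretched in depth relative to its width and height, not a rigid motion of the scene.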

    An affine transformation is illustrated in Fig. 3. Fig. 3a shows a picture of a cube obtained by an orthographic projection, and Fig. 3b shows a picture of an affine transformation of a cube. If the observer keeps the plane of the picture orthogonal to the line of sight (upright position) and the viewing distance is large, then the observer's retinal image produced by (a) is a valid image of a cube. And indeed, (a) looks like a cube. However, under similar viewing conditions, the retinal image of (b) could never be produced by a cube. And (b) does not look like a cube. If the observer now slants the plane of the picture, the retinal image produced by a slanted (a) will be approximately equivalent to the retinal image of (b) when (b) is in the upright position (zero slant). And indeed, a slanted (a) looks like an upright (b) rather than like a cube.

    Consider now the general case of a perspective transformation from a 3D scene to a 2D picture, followed by a perspective transformation from the picture to the observer's retina. Note that for each infinitesimally small part of the scene, the projection from the scene to the picture is a parallel projection. However, since the picture was obtained by a perspective projection, different parts of the scene are projected to the picture by rays having different directions. From the discussion of parallel projection in the first part of this section, it follows that the retinal image is a valid image of the scene only when the viewing direction is correct. Since different parts of the picture determine different (valid) viewing directions, there is only one valid viewing point: the point of intersection of these viewing directions. This point is the center of perspectivity that was used to produce the picture. If the observer's viewing point does not coincide with this center of perspectivity, the observer's retinal image could not have been obtained by viewing the actual scene. Instead, this image could have been produced by viewing a 3D projective transformation of the scene. This is explained in more detail next. (As in the case of parallel projection, the retinal image could also be obtained by any of an infinite number of non-projective transformations of the scene, by moving each point of the scene to an arbitrary position on its projecting ray. Such a transformation of a scene is not even a topological transformation.)

    Fig. 3. (a) Orthographic image of a cube. (b) Orthographic image of an affine transformation of a cube.

    Consider a point $M$ in a 3D scene and its perspective image $m$ on a picture produced with the center of perspectivity at $V$ ($M$ and $m$ are represented by homogeneous coordinates). If the observer puts the eye at point $V$, the retinal image produced by this picture will be a perspective image of the scene. We can write the following equation:

    $$m = P M \qquad (8)$$

    where the matrix $P_{3\times4}$ is characterized by 11 independent parameters of the perspective transformation of the 3D scene to the 2D picture. Six of them represent the rigid motion of $M$ in 3D (extrinsic parameters). The remaining 5 independent parameters are called intrinsic (in the terminology of a camera), and they represent the position of the center of perspectivity relative to the picture, the angle between the axes of the picture coordinate system, and the ratio of the units on the two axes. We assume that the last two parameters are not free and are equal to 90 deg and 1, respectively. If the observer's eye is at a point different from $V$, the transformation from the picture point $m$ to the image point $m_r$ on the observer's retina is represented by the following equation:

    $$m_r = P_r m \qquad (9)$$

    where $P_r$ represents a 2D to 2D perspective transformation between the picture and the observer's retina. The transformation from the point $M$ in the 3D scene to the observer's retina is obtained by combining (8) and (9):

    $$m_r = P_r P M \qquad (10)$$
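One way to see that a composite matrix $P_r P$ need not describe a camera like the eye is to extract its intrinsic parameters. The sketch below assumes a hypothetical picture camera `P` (unit focal length, zero skew, unit aspect ratio) and a hypothetical homography `P_r` standing in for an oblique view of the picture; the RQ decomposition is built from NumPy's QR, since NumPy has no RQ routine. The helper names `rq` and `intrinsics` are illustrative.

```python
import numpy as np

def rq(M):
    """RQ decomposition M = R @ Q (R upper triangular, Q orthogonal) via numpy's QR."""
    J = np.flipud(np.eye(3))               # row-reversal permutation
    Q0, R0 = np.linalg.qr((J @ M).T)
    R = J @ R0.T @ J                       # upper triangular
    Q = J @ Q0.T                           # orthogonal
    D = np.diag(np.sign(np.diag(R)))       # make the diagonal of R positive
    return R @ D, D @ Q

def intrinsics(P):
    """Intrinsic matrix of a 3x4 camera, normalized so that its (3,3) entry is 1."""
    R, _ = rq(P[:, :3])
    return R / R[2, 2]

# Hypothetical picture camera: unit focal length, zero skew, unit aspect ratio,
# scene 5 units in front of the center of perspectivity.
P = np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])

# Hypothetical picture-to-retina homography for an oblique view of the picture.
P_r = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.5, 0.0, 1.0]])

P_prime = P_r @ P                          # composite camera of equation (10)
K_int = intrinsics(P_prime)
aspect = K_int[1, 1] / K_int[0, 0]         # ratio of units on the two axes
print(round(aspect, 3))                    # 1.118
```

In this example the composite camera has an aspect ratio of $\sqrt{1.25} \approx 1.118$, so it violates the unit-aspect-ratio assumption that holds for the picture camera and is assumed for the eye.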

    Let $P' = P_r P$. Since $m_r = P'M$, $P'$ represents a perspective transformation which produces the image $m_r$ of the point $M$ of the 3D scene. However, as shown in the previous paragraph, $P'$ is not a perspective transformation that can map the actual scene point $M$ to the image point $m_r$ on the observer's retina. In other words, $P'$ represents a camera whose intrinsic parameters are different from those of the observer's eye. Let the perspective transformation from a 3D scene to the observer's eye be represented by $P_e$. There exists a $K_{4\times4}$, representing a 3D projective transformation, such that:

    $$m_r = P_e K M \qquad (11)$$

    Such a $K$ exists because it can be chosen as a solution of

    $$P' = P_e K \qquad (12)$$

    Equation (12) represents a set of 11 independent linear equations with 15 unknowns (the elements of $K$, which is defined up to scale). Clearly, there exists an infinite number of such matrices $K$.
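Equation (12) can be solved numerically. The construction below is one possible choice, not necessarily the authors': a particular solution from the pseudoinverse of $P_e$, plus a term built from the null vector of $P_e$ (the eye's center) and the null vector of $P'$ (the composite camera's center). The cameras `P_e`, `P` and `P_r` are hypothetical; the free scalar `t` shows that every non-zero value gives another valid, invertible $K$.

```python
import numpy as np

# Hypothetical eye camera P_e: unit focal length, zero skew, unit aspect ratio.
P_e = np.hstack([np.eye(3), np.zeros((3, 1))])

# Hypothetical composite camera P' = P_r P (oblique view of a picture).
P = np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])
P_r = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.5, 0.0, 1.0]])
P_prime = P_r @ P

def null_vector(A):
    """Right null vector of a rank-3, 3x4 matrix (the camera center)."""
    return np.linalg.svd(A)[2][-1]

n_e = null_vector(P_e)      # P_e @ n_e = 0
c_p = null_vector(P_prime)  # P_prime @ c_p = 0

def solve_K(t):
    """One solution of P_e K = P' for each non-zero t (illustrative construction)."""
    # pinv(P_e) @ P' solves (12) but has rank 3; the null-space term
    # t * n_e c_p^T leaves P_e K unchanged and makes K invertible.
    return np.linalg.pinv(P_e) @ P_prime + t * np.outer(n_e, c_p)

K1, K2 = solve_K(1.0), solve_K(-3.0)

M = np.array([1.0, 2.0, 3.0, 1.0])                # a scene point (homogeneous)
print(np.allclose(P_e @ (K1 @ M), P_prime @ M))   # True: equation (11)
print(np.allclose(P_e @ K1, P_e @ K2))            # True: different K, same images
```

Geometrically, `K1 @ M` is a point of the projectively distorted scene that, viewed by the eye $P_e$, produces exactly the retinal image produced by the picture.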

    So, we showed that viewing a perspective picture of a 3D scene from a wrong vantage point (i.e., from a point different from the center of perspectivity that was used to produce this picture) is equivalent to viewing a projective distortion of the scene (but see footnote 3). A 3D projective transformation affects the shape of the object with 15 degrees of freedom. As a result, a rectangular parallelepiped can be transformed into a non-rectangular polyhedron. Fig. 4 shows examples of such a transformation. Fig. 4a is a perspective image of a cube. If the reader keeps the picture in the frontal plane and the viewing distance is about seven times the size of the image, the percept corresponds to a cube (if the cube is reversed in depth, the percept does not correspond to a cube). Fig. 4b shows how the retinal image of (a) looks when (a) is viewed at a slant. Clearly, the polyhedron shown in (b) does not look like a cube. Instead, it looks like a compressed cube. The reader is encouraged to slant the plane of the figure and look at (a). The percept of a slanted (a) is similar to the percept of an upright (b). Another example is shown in Fig. 4c. If the reader looks at this figure while keeping the picture in the frontal plane, the reader's eye is at a wrong vantage point. As a result, the retinal image is an image of a 3D projective transformation of a cube. This example shows that a projective transformation can change the appearance of an object quite dramatically.
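To illustrate how a 3D projective transformation can turn a rectangular parallelepiped into a non-rectangular polyhedron, the sketch applies a hypothetical 4x4 matrix `K` (the identity except for one entry in the last row, which makes it projective rather than affine) to a unit cube and measures one corner angle before and after.

```python
import numpy as np

# Vertices of a unit cube in homogeneous coordinates (x, y, z, 1).
cube = np.array([[x, y, z, 1.0] for x in (0, 1) for y in (0, 1) for z in (0, 1)])

# Hypothetical 3D projective transformation: the 0.3 in the last row makes the
# homogeneous weight depend on x, so the map is not affine.
K = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.3, 0.0, 0.0, 1.0]])

def transform(K, X):
    """Apply a 4x4 projective map to homogeneous points; return Euclidean coordinates."""
    Y = X @ K.T
    return Y[:, :3] / Y[:, 3:4]

def corner_angle(points, a, b, c):
    """Angle in degrees at vertex a between the edges a->b and a->c."""
    u, v = points[b] - points[a], points[c] - points[a]
    cos_angle = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(cos_angle))

# Row index of each vertex, keyed by its integer coordinates.
index = {tuple(int(t) for t in v[:3]): i for i, v in enumerate(cube)}
a, b, c = index[(0, 1, 0)], index[(0, 0, 0)], index[(1, 1, 0)]

before = corner_angle(cube[:, :3], a, b, c)
after = corner_angle(transform(K, cube), a, b, c)
print(round(before, 1), round(after, 1))  # 90.0 73.3
```

The square face at z = 0 becomes a trapezoid: straight edges stay straight, but the right angle at that vertex shrinks to roughly 73 degrees.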

    To summarize, looking at a picture of a 3D scene from a wrong vantage point leads to a retinal image which could never be obtained by looking at the scene. These effects are less dramatic if the scene is approximately planar. It follows from these considerations that tests of the effect of viewing position on the percept should be performed with 3D scenes that provide strong 3D cues. Otherwise, the measured effects could be weak (if present at all), simply because in the case of planar scenes there are always an infinite number of valid viewing positions. We decided to use pictures of cubes, since it is known that human observers can easily recognize a cube from its image and this recognition is very reliable.

    Fig. 4. (a) A perspective image of a cube. (b) A perspective image of (a). (c) An image of a projective transformation of a cube.

    It is commonly accepted that the perceptual interpretation of the picture of a cube involves the operation of a minimum principle. Specifically, if the image of a trihedral angle could have been produced by a right trihedral angle, then this angle is perceived as such. This assumption is related to maximizing the compactness of a polyhedron, maximizing the simplicity of the 3D interpretation, or minimizing the variance of the angles and lengths of the edges of the polyhedron.

    Fig. 2a shows an orthographic image of a cube. If the plane of the figure is slanted, the percept is systematically affected by the slant. The change of the percept seems to correspond to the process of interpreting the retinal image according to Perkins's laws.¹⁷ Since for non-zero slants the vantage point is wrong, the observer's retinal image is not a valid image of a cube. Instead, it is an image of an affine transformation of the cube (more exactly, of a projective transformation). Apparently, there is little compensation for the wrong vantage point. This effect of the slant on the percept produced by a picture of a cube is called the "cuboid illusion". This phenomenological observation was tested in the psychophysical experiment presented next. In this experiment, we used as stimuli not only pictures of cuboids, but also planar figures (parallelograms). It is possible that viewing pictures at a slant leads to perceptual errors not because the vantage point is wrong, but because human observers lack the perceptual ability to reconstruct depth relations. Using planar figures as stimuli allows these two possibilities to be separated, because viewing a slanted figure involves only one perspective projection, rather than two, as in the case of viewing slanted pictures of 3D objects.

    Subjects. Three subjects were tested, including one of the authors (ZP). ZP and MSG were experienced as subjects in psychophysical experiments. ZP received extensive practice with the stimuli and the method used in the experiment, whereas MSG did not receive such practice. ZP and ZM were myopes and used their normal correcting glasses. MSG was an emmetrope. ZM was naive with respect to the hypotheses being tested.

    Stimuli. A set of 11 pictures of cuboids and a set of 11 parallelograms were used as standard shapes. The middle picture (#6) in the set of cuboids was an orthographic projection of a cube (see Figure 1a). The remaining pictures of cuboids were obtained by changing the height (aspect ratio) of the middle picture. The smallest aspect ratio (picture #1) was half that of the middle picture, and the largest aspect ratio (picture #11) was twice that of the middle picture. The aspect ratios of the stimuli formed equal steps on a logarithmic scale. Thus, the ratio of heights of stimuli with consecutive numbers was about 15%. The set of parallelograms was obtained by drawing the contours representing the top faces in the pictures of the cuboids. This ensured that the parallelograms did not provide more information than the cuboids; therefore, any superiority in performance with parallelograms as compared to pictures of cuboids could not be attributed to differences among the stimuli. Two sizes of stimuli were used: small (the base of the parallelogram was 1.67 cm) and large (the base of the parallelogram was 5 cm, i.e., three times as large as the small stimuli). The ratio of the range in depth to the viewing distance (50 cm) was from .01 to .09 for small stimuli, and from .03 to .28 for large stimuli. Thus, the small stimuli gave rise to retinal images that were approximately parallel projections of the stimuli. The large stimuli, on the other hand, gave rise to images that were perspective projections of the stimuli, and these projections were clearly different from parallel projections. The standard stimuli were mounted on a square base whose side was 25.4 cm. The comparison stimuli were displayed on a computer monitor. The distance of the standard stimulus from the subject's eye was 50 cm, and the distance of the comparison stimulus from the subject's eye was 200 cm.
The experiment was performed in a highly illuminated room (the luminance of light reflected from white paper was 120 cd/m²). The luminance of the contour of the comparison stimulus was 132 cd/m², and the luminance of the background of the comparison stimulus was 0.8 cd/m².
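The stimulus set can be reconstructed from the description above, assuming the shapes are numbered #1 to #11 with #6 as the cube and the aspect ratios spaced in equal logarithmic steps from one half to twice that of #6. Under this assumption, consecutive shapes differ by about 15%, and shapes two steps apart (as in one session's set) differ by a factor of about 1.32.

```python
import numpy as np

# Assumption: shapes #1..#11, aspect ratios relative to the cube picture (#6)
# spaced in equal logarithmic steps from 1/2 (#1) to 2 (#11).
numbers = np.arange(1, 12)
relative_aspect = 2.0 ** ((numbers - 6) / 5)

step = relative_aspect[1] / relative_aspect[0]          # neighboring shapes
session_step = relative_aspect[2] / relative_aspect[0]  # every other shape
print(round(step, 4), round(session_step, 4))           # 1.1487 1.3195
```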

    Procedure. The subject's task was to adjust the aspect ratio of the comparison shape so that it matched the aspect ratio of the standard shape. Three slants of the standard shape were used: 50, 60 and 70 deg (slant is the angle between the line of sight and the normal to the shape; thus, the frontal plane has slant zero). The standard shapes were always slanted with tilt 90 deg (i.e., they were rotated around their horizontal side). Note that the tilts, slants, and distances were not exactly the same for each eye; the values given here are for the left eye. The surface of the computer monitor was orthogonal to the line of sight of the subject's left eye. Viewing was monocular (left eye) or binocular. The head was stationary (supported by a chin-forehead rest). In each session, a set of five standard shapes drawn from the pool of 11 shapes was used. There were 3 possible sets, with the following numbers: 1, 3, 5, 7 and 9; 2, 4, 6, 8 and 10; 3, 5, 7, 9 and 11. The shapes used in a given session were easy to discriminate (the ratio of heights of two successive shapes was 1.32). The assignment of sets to sessions was random, and the subject did not know which set was used in a given session. Each trial began by randomly generating a comparison shape with a height in the range from 0.3 to 3.0 times the height of the middle shape (#6). Several practice trials were used at the beginning of each session. There were eight sessions: monocular vs. binocular, pictures of cuboids vs. parallelograms, small vs. large stimuli (MSG was tested only with large stimuli). Each subject was tested in a different order of sessions. The experiment began by testing ZP with small stimuli and MSG with large stimuli. In all of these sessions, each stimulus at a slant was shown 10 times, giving rise to 150 trials in a session (5 stimuli × 3 slants × 10 trials). The subjects' responses were found to be quite stable across replications. Therefore, in the rest of the experiment, each stimulus at a slant was shown only 5 times, giving rise to 75 trials in a session.
The order of stimuli and slants was randomized.

    Results. The aspect ratio of the comparison shape set by the subject was divided by the aspect ratio of the standard shape. This ratio of the two aspect ratios will henceforth be called the "adjusted aspect ratio" and will be used as the dependent variable representing the subject's percept. If the comparison shape is identical to the standard shape (perfect shape veridicality), the adjusted aspect ratio is equal to 1.00. If, on the other hand, the comparison shape is identical to the retinal image of the standard shape (zero shape constancy), the adjusted aspect ratio would be approximately equal to cos(slant). The results were plotted with the adjusted aspect ratio on the ordinate and slant on the abscissa. Thus, data points with an ordinate equal to 1.00 (horizontal dotted line) represent shape veridicality. Data points located on any horizontal line represent shape constancy. The dashed line represents cos(slant) and corresponds to zero shape constancy.
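The two reference lines in the plots can be computed directly. This sketch assumes the orthographic approximation used in the text (a standard shape slanted about its horizontal axis has its height compressed by cos(slant)); the function names are hypothetical.

```python
import math

def adjusted_aspect_ratio(comparison_aspect, standard_aspect):
    """Dependent variable: aspect ratio set by the subject divided by that of the standard."""
    return comparison_aspect / standard_aspect

def zero_constancy_prediction(slant_deg):
    """Adjusted aspect ratio expected if the subject matched the retinal image.

    Assumes the orthographic approximation: slanting about the horizontal axis
    compresses the shape's height by cos(slant).
    """
    return math.cos(math.radians(slant_deg))

VERIDICAL = 1.0  # perfect shape veridicality (horizontal dotted line)

for slant in (50, 60, 70):
    print(slant, round(zero_constancy_prediction(slant), 3))
# 50 0.643
# 60 0.5
# 70 0.342
```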

    Fig. 5 shows results from all three subjects. Each data point represents a mean adjusted aspect ratio from 25 (or 50) trials. The symbol size is equal to two standard errors of the mean. First, performance with pictures of cuboids was clearly poorer than with parallelograms. This was true in all conditions: small and large stimuli, for both monocular and binocular viewing. The percept of the pictures of cuboids was far from veridical, and it was strongly affected by slant (i.e., the percept was not constant). Second, large stimuli led to a more accurate percept. Third, binocular viewing led to a more veridical and more constant percept as compared to monocular viewing. These results cannot be fully explained by any of the traditional theories. Clearly, viewing a picture of a cuboid from an incorrect vantage point leads to a non-veridical percept, even in the presence of many cues to the position and orientation of the picture. This is inconsistent with Pirenne's theory.⁶ The percept is not consistent with projective theory, either. According to this theory, the subject's percept should be predicted from the projected image. In that case, the data points would be located on (or close to) the dashed line representing cos(slant). They are not. Finally, according to the third theory, the percept should be unaffected by slant, at least for small or moderate slants. In our experiment, the percept of a cuboid was affected by slant at all slant values used. Note, however, that the slant values used in our experiment were large. Therefore, it is possible that our experiment did not really test this particular theory. It is quite possible that for very small slants (e.g., up to 20 deg) the percept would be veridical.

    We now present a new explanation of our results, as well as of prior results, on viewing pictures from a wrong vantage point. First, note that a picture, like any 2D image of a 3D scene, determines an infinite number of possible interpretations. Therefore, the visual system has to use some constraints in order to choose one (possibly correct) interpretation. If the scene depicted in the image is familiar, then this familiarity may constrain the problem. This is the well-known phenomenon of shape constancy. This phenomenon was tested in our experiment: the unknown slanted standard shape was matched by a known (upright) comparison shape. Note that familiarity was an effective constraint only in the case of large stimuli. In the case of small stimuli, perspective projection reduced to parallel projection, and in such a case any of the comparison stimuli could produce a given retinal image; as a result, familiarity did not provide a useful constraint. In the absence of familiarity, the visual system uses a priori constraints such as Perkins's laws.¹⁷ In the case of planar figures (faces), these constraints usually do not operate (the figure is planar and does not have trihedral angles). In the case of pictures of solid objects, however, these constraints do operate. This was clearly seen in the case of small cuboids viewed monocularly. In such a case, these a priori constraints are probably the only ones that operate. As a result, the effect of slant is strongest. Specifically, the percept is very close to what would be predicted by applying Perkins's laws to the retinal image: the data points are close to the line representing cos(slant). If a priori constraints operate against the constraint of familiarity, as with the pictures of (large) cuboids, the percept is closer to veridical, although it is still far from it. Finally, one can provide additional constraints by using binocular viewing. If binocular viewing operates together with the familiarity constraint (large faces), the percept is close to veridical and constant.
If the binocular constraint operates together with familiarity and against a priori constraints (large cuboids), the percept is closer to veridical as compared to monocular viewing of large cuboids, but is poorer than in the case of binocular viewing of large faces. (Note that outside a laboratory, all three types of constraints usually operate together, which leads to accurate and constant percepts.)

    According to our explanation, binocular vision is not a completely different mode of viewing as compared to monocular vision. Instead, it is a mode which happens to provide more constraints for visual interpretation. Thus, it makes the percept closer to veridical and constant. Binocular viewing does not completely override monocular cues; if it did, the binocular percept of a slanted picture of a cuboid would not depend on slant. But it does. It is worth pointing out that our explanation, which assumes that visual perception consists in finding a unique percept in the presence of constraints, represents an approach to visual perception in which the percept is considered to be a solution to an inverse problem.²²,²³ Since inverse problems are often ill-posed, a priori (and any other) constraints are crucial in obtaining a unique and stable interpretation.
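As a generic illustration of why constraints make an inverse problem well-posed, here is a minimal sketch of Tikhonov regularization, the approach discussed in the cited work of Tikhonov and Goncharsky and of Poggio et al. The toy problem (one equation, two unknowns) is hypothetical and is not a model of the visual system: the data alone admit infinitely many solutions, and a penalty on the norm of the solution selects a unique, stable one.

```python
import numpy as np

# Hypothetical ill-posed problem: one measurement of two unknowns, x1 + x2 = 2.
# Every point on the line x1 + x2 = 2 fits the data exactly.
A = np.array([[1.0, 1.0]])
b = np.array([2.0])

def tikhonov(A, b, lam):
    """Minimize ||A x - b||^2 + lam * ||x||^2 via the regularized normal equations."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

for lam in (1.0, 1e-3, 1e-6):
    print(lam, tikhonov(A, b, lam))
```

As `lam` shrinks, the solution approaches (1, 1), the minimum-norm point on the solution line; the penalty term plays the role of an a priori constraint that picks one interpretation out of infinitely many.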

    J. Hochberg and V. Brooks, "Pictorial recognition as an unlearned ability: a study of one child's performance", Am. J. Psychol. 75, pp. 624-628, 1962.
    A. Slater and V. Morison, "Shape constancy and slant perception at birth", Perception 14, pp. 337-344, 1985.
    M. Kemp, The science of art, Yale Univ. Press, New Haven, 1990.
    M. Kubovy, The psychology of perspective and renaissance art, Cambridge Univ. Press, Cambridge, UK, 1986.
    C. Wheatstone, "Contributions to the physiology of vision - part 1", Philos. Trans. Royal Soc. (London) 128, pp. 371-394, 1838.
    M. H. Pirenne, Optics, painting and photography, Cambridge Univ. Press, New York, 1970.
    R. M. Haralick and L. G. Shapiro, Computer and robot vision, Addison-Wesley, New York, 1993.
    J. E. Cutting, "Rigidity in cinema seen from the front row, side aisle", J. Exp. Psychol.: Human Percept. Perform. 13, pp. 323-334, 1987.
    E. B. Goldstein, "Spatial layout, orientation relative to the observer, and perceived projection in pictures viewed at an angle", J. Exp. Psychol.: Human Percept. Perform. 13, pp. 256-264, 1987.
    J. E. Cutting, "Affine distortions of pictorial space: some predictions for Goldstein (1987) that La Gournerie (1859) might have made", J. Exp. Psychol.: Human Percept. Perform. 14, pp. 305-311, 1988.
    Z. Pizlo, "A theory of shape constancy based on perspective invariants", Vision Res. 34, pp. 1637-1658, 1994.
    Z. Pizlo, A. Rosenfeld, and I. Weiss, "The geometry of visual space: about the incompatibility between science and mathematics", Computer Vision and Image Understanding 65, pp. 425-433, 1997.
    Z. Pizlo, A. Rosenfeld, and I. Weiss, "Visual space: mathematics, engineering and science", Computer Vision and Image Understanding 65, pp. 450-454, 1997.
    K. Kanatani, "Constraints on length and angle", Comput. Vision Graph. Image Process. 41, pp. 28-42, 1988.
    O. Faugeras, Three-dimensional computer vision: a geometric viewpoint, MIT Press, Cambridge, 1993.
    Z. Pizlo and M. Salach-Golyska, "3D shape perception", Percept. Psychophys. 57, pp. 692-714, 1995.
    D. N. Perkins, "Visual discrimination between rectangular and nonrectangular parallelopipeds", Percept. Psychophys. 12, pp. 396-400, 1972.
    J. Hochberg and V. Brooks, "The psychophysics of form: reversible-perspective drawings of spatial objects", Am. J. Psychol. 73, pp. 337-354, 1960.
    F. Attneave and R. Frost, "The determination of perceived tridimensional orientation by minimum criteria", Percept. Psychophys. 6, pp. 391-396, 1969.
    Y. G. Leclerc and M. A. Fischler, "An optimization-based approach to the interpretation of single line drawings as 3D wire frames", Int. J. Computer Vision 9, pp. 113-136, 1992.
    Z. Pizlo and M. Salach-Golyska, "..., binocular and multiview perception", Invest. Ophthalmol. Vis. Sci. 35, p. 1916, 1994 (Abstract).
    A. N. Tikhonov and A. V. Goncharsky, Ill-posed problems in the natural sciences, MIR Publishers, Moscow, 1987.
    T. Poggio, V. Torre, and C. Koch, "Computational vision and regularization theory", Nature 317, pp. 314-319, 1985.
