Image Mosaicing for Tele-Reality Applications

Richard Szeliski

Digital Equipment Corporation
Cambridge Research Lab

CRL 94/2 May, 1994

Digital Equipment Corporation has four research facilities: the Systems Research Center and the Western Research Laboratory, both in Palo Alto, California; the Paris Research Laboratory, in Paris; and the Cambridge Research Laboratory, in Cambridge, Massachusetts.

The Cambridge laboratory became operational in 1988 and is located at One Kendall Square, near MIT. CRL engages in computing research to extend the state of the computing art in areas likely to be important to Digital and its customers in future years. CRL’s main focus is applications technology; that is, the creation of knowledge and tools useful for the preparation of important classes of applications.

CRL Technical Reports can be ordered by electronic mail. To receive instructions, send a message to one of the following addresses, with the word

�� in the Subject line:

On Digital’s EASYnet: CRL::TECHREPORTS
On the Internet: [email protected]

This work may not be copied or reproduced for any commercial purpose. Permission to copy without payment is granted for non-profit educational and research purposes provided all such copies include a notice that such copying is by permission of the Cambridge Research Lab of Digital Equipment Corporation, an acknowledgment of the authors to the work, and all applicable portions of the copyright notice.

The Digital logo is a trademark of Digital Equipment Corporation.

Cambridge Research Laboratory
One Kendall Square
Cambridge, Massachusetts 02139



Abstract

While a large number of virtual reality applications, such as fluid flow analysis and molecular modeling, deal with simulated data, many newer applications attempt to recreate true reality as convincingly as possible. Building detailed models for such applications, which we call tele-reality, is a major bottleneck holding back their deployment. In this paper, we present techniques for automatically deriving realistic 2-D scenes and 3-D texture-mapped models from video sequences, which can help overcome this bottleneck. The fundamental technique we use is image mosaicing, i.e., the automatic alignment of multiple images into larger aggregates which are then used to represent portions of a 3-D scene. We begin with the easiest problems, those of flat scene and panoramic scene mosaicing, and progress to more complicated scenes, culminating in full 3-D models. We also present a number of novel applications based on tele-reality technology.

Keywords: image mosaics, image registration, image compositing, planar scenes, panoramic scenes, 3-D scene recovery, motion estimation, structure from motion, virtual reality, tele-reality.

© Digital Equipment Corporation 1994. All rights reserved.

Contents

1 Introduction
2 Related work
3 Basic imaging equations
4 Planar image mosaicing
    4.1 Local image registration
    4.2 Global image registration
    4.3 Results
5 Panoramic image mosaicing
    5.1 Results
6 Scenes with arbitrary depth
    6.1 Formulation
    6.2 Results
7 Full 3-D model recovery
8 Applications
    8.1 Planar mosaicing applications
    8.2 3-D model building applications
    8.3 End-user applications
9 Discussion
10 Conclusions
A 2-D projective transformations

List of Figures

1 Rigid, affine, and projective transformations
2 Whiteboard image mosaic example
3 Panoramic image mosaic example (bookshelf and cluttered desk)
4 Depth recovery example: table with stacks of papers
5 Depth recovery example: outdoor scene with trees
6 3-D model recovery example


1 Introduction

Virtual reality is currently creating a lot of excitement and interest in the computer graphics community. Typical virtual reality systems use immersive technologies such as head-mounted or stereo displays and data gloves. The intent is to convince users that they are interacting with an alternate physical world, and also often to give insight into computer simulations, e.g., for fluid flow analysis or molecular modeling [Earnshaw et al., 1993]. Realism and speed of rendering are therefore important issues.

Another class of virtual reality applications attempts to recreate true reality as convincingly as possible. Examples of such applications include flight simulators (which were among the earliest uses of virtual reality), various interactive multi-player games, medical simulators, and visualization tools. Since the reality being recreated is usually distant, we use the term tele-reality in this paper to refer to such virtual reality scenarios based on real imagery.¹ Tele-reality applications which could be built using the techniques described in this paper include scanning a whiteboard in your office to obtain a high-resolution image, walkthroughs of existing buildings for re-modeling or selling, participating in a virtual classroom, and browsing through your local supermarket aisles from home (see Section 8).

While most focus in virtual reality today is on input/output devices and the quality and speed of rendering, perhaps the biggest bottleneck standing in the way of widespread tele-reality applications is the slow and tedious model-building process [Adam, 1993]. This also affects many other areas of computer graphics, such as computer animation, special effects, and CAD. For small objects, say 0.2–2 m, laser-based scanners are a good solution. These scanners provide registered depth and colored texture maps, usually in either raster or cylindrical coordinates [Cyberware Laboratory Inc, 1990; Rioux and Bird, 1993]. They have been used extensively in the entertainment industry for special effects and computer animation. Unfortunately, laser-based scanners are fairly expensive, limited in resolution (typically 512 × 256 pixels), and most importantly, limited in range (e.g., they cannot be used to scan in a building).

The image-based ranging techniques which we develop in this paper have the potential to overcome these limitations. Imagine walking through an environment such as a building interior and filming a video sequence of what you see. By registering and compositing the images in the video together into large mosaics of the scene, image-based ranging can achieve an essentially unlimited resolution.² Since the images can be acquired using any optical technology (from microscopy, through hand-held videocams, to satellite photography), the range or scale of the scene being reconstructed is not an issue. Finally, as desktop video becomes ubiquitous in our computing environments (initially for videoconferencing, and later as an advanced user interface tool [Gold, 1993; Rehg and Kanade, 1994; Blake and Isard, 1994]), image-based scene and model acquisition will become accessible to all users.

¹ The term telepresence is also often used, especially for applications such as tele-medicine where two-way interactivity (remote operation) is important [Earnshaw et al., 1993].

In this paper, we develop novel techniques for extracting large 2-D textures and 3-D models from image sequences based on image registration and compositing techniques and also present some potential applications. After a review of related work (Section 2) and of the basic image formation equations (Section 3), we present our technique for registering pieces of a flat (planar) scene, which is the simplest interesting image mosaicing problem (Section 4). We then show how the same technique can be used to mosaic panoramic scenes obtained by rotating the camera around its center of projection (Section 5). Section 6 discusses how to recover depth in scenes. Section 7 discusses the most general and difficult problem, that of building full 3-D models from multiple images. Section 8 presents some novel applications of the tele-reality technology developed in this paper. Finally, we close with a comparison of our work with previous approaches, and a discussion of the significance of our results.

2 Related work

While 3-D models acquired with laser-based scanners are now commonly used in computer animations and for special effects, 3-D models derived directly from still or video images are still a rarity. One example of physically-based models derived from images is the dancing vegetables in the “Cooking with Kurt” video [Terzopoulos and Witkin, 1988]. Another, more recent example is the 3-D model of a building extracted from a video sequence in [Azarbayejani et al., 1993], which was then composited with a 3-D teapot. Somewhat related are graphics techniques which extract camera position information from images, such as [Gleicher and Witkin, 1992].

² The traditional use of the term image compositing [Porter and Duff, 1984; Foley et al., 1990] is for blending images which are already registered. We use the term image mosaicing in this paper for our work to avoid confusion with this previous work, although compositing (as in composite sketches) also seems appropriate.

The extraction of geometric information from multiple images has long been one of the central problems in computer vision [Ballard and Brown, 1982; Horn, 1986] and photogrammetry [Moffitt and Mikhail, 1980]. However, many of the techniques used are based on feature extraction, produce sparse descriptions of shape, and often require manual assistance. Certain techniques, such as stereo, produce depth or elevation maps which are inadequate to model true 3-D objects. In computer vision, the recovered geometry is normally used either for object recognition or for grasping and navigation (robotics). Relatively little attention has been paid to building 3-D models registered with intensity (texture maps), which are necessary for computer graphics and virtual reality applications (but see [Mann, 1993] for recent work which addresses some of the same problems as this paper).

In computer graphics, compositing multiple image streams together to create larger format (Omnimax) images is discussed in [Greene and Heckbert, 1986]. However, in this application, the relative position of the cameras was known in advance. The registration techniques developed in this paper are related to image warping [Wolberg, 1990] since once the images are registered, they can be warped into a common reference frame before being composited.³ While most current techniques require the manual specification of feature correspondences, some recent techniques [Beymer et al., 1993] as well as the techniques developed in this paper can be used to automate this process. Combinations of local image warping and compositing are now commonly used for special effects under the general rubric of morphing [Beier and Neely, 1992]. Recently, morphing based on z-buffer depths and camera motion has been applied to view interpolation as a quick alternative to full 3-D rendering [Chen and Williams, 1993]. The techniques we develop in Section 6 can be used to compute the depth maps necessary for this approach directly from real-world image sequences.

3 Basic imaging equations

Many of the ideas in this paper have their simplest expression using projective geometry [Semple and Kneebone, 1952]. However, rather than relying on these results, we will use more standard methods and notations from computer graphics [Foley et al., 1990], and prove whatever simple results we require in Appendix A. Throughout, we will use homogeneous coordinates to represent points, i.e., we denote 2-D points in the image plane as $(x, y, w)$, with $(x/w, y/w)$ being the corresponding Cartesian coordinates [Foley et al., 1990]. Similarly, 3-D points with homogeneous coordinates $(x, y, z, w)$ have Cartesian coordinates $(x/w, y/w, z/w)$.

³ One can view the image registration task as an inverse warping problem, since we are given two images and asked to recover the unknown warping function.

Figure 1: Rigid, affine, and projective transformations

Using homogeneous coordinates, we can describe the class of 2-D planar transformations using matrix multiplication

$$
\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} =
\begin{bmatrix} m_{00} & m_{01} & m_{02} \\ m_{10} & m_{11} & m_{12} \\ m_{20} & m_{21} & m_{22} \end{bmatrix}
\begin{bmatrix} x \\ y \\ w \end{bmatrix}
\quad\text{or}\quad \mathbf{u}' = \mathbf{M}_{2D}\,\mathbf{u}. \tag{1}
$$

The simplest transformations in this general class are pure translations, followed by translations and rotations (rigid transforms), plus scaling (similarity transforms), affine transforms, and full projective transforms. Figure 1 shows a square and possible rigid, affine, and projective transformations. Rigid and affine transformations have the following forms for the $\mathbf{M}_{2D}$ matrix,

$$
\mathbf{M}_{\text{rigid-2D}} =
\begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix},
\qquad
\mathbf{M}_{\text{affine-2D}} =
\begin{bmatrix} m_{00} & m_{01} & m_{02} \\ m_{10} & m_{11} & m_{12} \\ 0 & 0 & 1 \end{bmatrix},
$$

with 3 and 6 degrees of freedom, respectively, while projective transformations have a general $\mathbf{M}_{2D}$ matrix with 8 degrees of freedom.⁴
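The hierarchy of 2-D transformations above can be illustrated with a short sketch. The helper names (`apply_2d`, `rigid_2d`) and the particular matrix entries are assumptions chosen for illustration only; they do not come from the report. The sketch maps the corners of a unit square through one rigid, one affine, and one projective matrix, dividing by $w'$ to return to Cartesian coordinates.

```python
import math

def apply_2d(M, point):
    """Map a Cartesian 2-D point through a 3x3 matrix using homogeneous coordinates."""
    x, y = point
    u = (x, y, 1.0)                      # homogeneous (x, y, w) with w = 1
    xp, yp, wp = (sum(M[r][c] * u[c] for c in range(3)) for r in range(3))
    return (xp / wp, yp / wp)            # Cartesian (x'/w', y'/w')

def rigid_2d(theta, tx, ty):
    """Rotation by theta plus translation (tx, ty): 3 degrees of freedom."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

# Illustrative matrices (entries chosen arbitrarily for this example).
RIGID = rigid_2d(math.pi / 2, 2.0, 0.0)
AFFINE = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]      # shear; last row fixed
PROJECTIVE = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.0, 1.0]]  # m22 = 1, 8 free entries

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
for name, M in (("rigid", RIGID), ("affine", AFFINE), ("projective", PROJECTIVE)):
    print(name, [apply_2d(M, p) for p in square])
```

Only the projective matrix has a non-trivial last row, so only there does the division by $w'$ change the result; this is what lets parallel sides of the square converge.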

⁴ Two $\mathbf{M}_{2D}$ matrices are equivalent if they are scalar multiples of each other. We remove this redundancy by setting $m_{22} = 1$.

In 3-D, we have the same hierarchy of transformations, with rigid (6 degrees of freedom), similarity (7 dof), affine (12 dof), and full projective (15 dof) transforms. The $\mathbf{M}_{3D}$ matrices in this case are 4 × 4. Of particular interest are the rigid (Euclidean) transformation,

$$
\mathbf{E} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}, \tag{2}
$$

where $\mathbf{R}$ is a 3 × 3 orthonormal rotation matrix and $\mathbf{t}$ is a 3-D translation vector, and the 3 × 4 viewing matrix

$$
\mathbf{V} = \begin{bmatrix} \hat{\mathbf{V}} & \mathbf{0} \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1/f & 0 \end{bmatrix}, \tag{3}
$$

which projects 3-D points through the origin onto a 2-D projection plane a distance $f$ along the $z$ axis [Foley et al., 1990]. The 3 × 3 $\hat{\mathbf{V}}$ matrix can be a general matrix in the case where the internal camera parameters are unknown. The last column of $\mathbf{V}$ is always zero for central projection.

The combined equations projecting a 3-D world coordinate $\mathbf{p} = (x, y, z, w)$ onto a 2-D screen location $\mathbf{u} = (x', y', w')$ can thus be written as

$$
\mathbf{u} = \mathbf{V}\,\mathbf{E}\,\mathbf{p} = \mathbf{M}_{\text{cam}}\,\mathbf{p}, \tag{4}
$$

where $\mathbf{M}_{\text{cam}}$ is a 3 × 4 camera matrix. This equation is valid even if the camera calibration parameters and/or the camera orientation are unknown.
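As a small worked instance of the camera equation (4), the sketch below builds the rigid transformation of (2) and the viewing matrix of (3) and projects one 3-D point. The function names and the numeric values (focal length 2, a point at depth 4) are assumptions made for this example.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]

def rigid_3d(R, t):
    """4x4 Euclidean transform E = [R t; 0 1], as in (2)."""
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]

def viewing_matrix(f):
    """3x4 viewing matrix V = [V_hat | 0], as in (3)."""
    return [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0 / f, 0.0]]

def project(V, E, p):
    """u = V E p, returned as Cartesian screen coordinates (x'/w', y'/w')."""
    xp, yp, wp = mat_vec(V, mat_vec(E, p))
    return (xp / wp, yp / wp)

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
V = viewing_matrix(2.0)
p = [1.0, 2.0, 4.0, 1.0]                      # homogeneous world point (x, y, z, w)

print(project(V, rigid_3d(IDENTITY, [0.0, 0.0, 0.0]), p))   # camera at the origin
print(project(V, rigid_3d(IDENTITY, [0.0, 0.0, 4.0]), p))   # point pushed 4 units deeper
```

With the camera at the origin the result is $(f x / z, f y / z)$, the usual perspective division; doubling the depth halves both screen coordinates.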

4 Planar image mosaicing

The simplest possible set of images to mosaic are pieces of a planar scene such as a document, whiteboard, or flat desktop. Imagine that we have a camera fixed directly over our desk. As we slide a document under the camera, different portions of the document are visible. Any two such pieces are related to each other by a translation and a rotation (2-D rigid transformation).

Now imagine that we are scanning a whiteboard with a hand-held video camera. The class of transformations relating two pieces of the board is more general in this case. It is easy to see that this class is the family of 2-D projective transformations (just imagine how a square or grid in one image can appear in another). These transformations can be computed without any knowledge of the internal camera calibration parameters (such as focal length or optical center) or of the relative camera motion between frames. The fact that 2-D projective transformations capture all such possible mappings is a basic result of projective geometry (see Appendix A).

Given this knowledge, how do we go about computing the transformations relating the various scene pieces so that we can paste them together? A variety of techniques are possible, some more automated than others. For example, we could manually identify four or more corresponding points between the two views, which is enough information to solve for the eight unknowns in the 2-D projective transformation. Or we could iteratively adjust the relative positions of input images using either a blink comparator or transparency [Carlbom et al., 1991]. These kinds of manual approaches are too tedious to be useful for large tele-reality applications.
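To make the four-point argument concrete, here is a minimal sketch of the manual approach, not the automatic method used in this paper. Each correspondence $(x, y) \rightarrow (x', y')$ yields two linear equations in the eight unknowns $m_0, \ldots, m_7$ (with the ninth matrix entry fixed to 1), so four points in general position give an 8 × 8 linear system. The function names, the plain Gaussian elimination, and the test values are assumptions made for this example.

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for an n x n system."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def warp(m, x, y):
    """Apply the 8-parameter projective transformation to (x, y)."""
    d = m[6] * x + m[7] * y + 1.0
    return ((m[0] * x + m[1] * y + m[2]) / d, (m[3] * x + m[4] * y + m[5]) / d)

def homography_from_4_points(src, dst):
    """Recover m0..m7 from exactly four point correspondences."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # x' * (m6 x + m7 y + 1) = m0 x + m1 y + m2, and likewise for y'
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * xp, -y * xp]); b.append(xp)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -x * yp, -y * yp]); b.append(yp)
    return solve_linear(A, b)

# Synthetic check: pick a transformation, map four corners, and recover it.
true_m = [1.1, 0.1, 5.0, -0.05, 0.9, -3.0, 0.001, 0.002]
src = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
dst = [warp(true_m, x, y) for x, y in src]
est_m = homography_from_4_points(src, dst)
print([round(v, 6) for v in est_m])
```

With more than four points the system becomes overdetermined and would need a least-squares solve instead; the sketch does not cover that case.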

4.1 Local image registration

The approach we use in this paper is to directly minimize the discrepancy in intensities between pairs of images after applying the transformation we are recovering. This has the advantage of not requiring any easily identifiable feature points, and of being statistically optimal once we are in the vicinity of the true solution [Szeliski and Coughlan, 1994]. More concretely, suppose we describe our 2-D transformations as

$$
x'_i = \frac{m_0 x_i + m_1 y_i + m_2}{m_6 x_i + m_7 y_i + 1}, \qquad
y'_i = \frac{m_3 x_i + m_4 y_i + m_5}{m_6 x_i + m_7 y_i + 1}. \tag{5}
$$

Our technique minimizes the sum of the squared intensity errors

$$
E = \sum_i \left[ I'(x'_i, y'_i) - I(x_i, y_i) \right]^2 = \sum_i e_i^2 \tag{6}
$$

over all corresponding pairs of pixels $i$ which are inside both images $I(x, y)$ and $I'(x', y')$ (pixels which are mapped outside image boundaries do not contribute). Once we have found the best transformation $\mathbf{M}_{2D}$, we can warp image $I'$ into the reference frame of $I$ using $\mathbf{M}_{2D}^{-1}$ [Wolberg, 1990] and then blend the two images together. To reduce visible artifacts, we weight images being blended together more heavily towards the center, using a bilinear weighting function.
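The text does not spell out the exact form of the bilinear weighting function, so the following is only one plausible reading: a separable "tent" weight that equals 1 at the image center and falls linearly to 0 at the borders, used to form a weighted average wherever warped images overlap. The names `tent_weight` and `blend` are hypothetical.

```python
def tent_weight(x, y, width, height):
    """Separable weight: 1 at the image centre, decreasing linearly to 0 at the borders."""
    wx = 1.0 - abs(2.0 * x / (width - 1) - 1.0)
    wy = 1.0 - abs(2.0 * y / (height - 1) - 1.0)
    return wx * wy

def blend(samples):
    """Weighted average of (intensity, weight) pairs that land on one mosaic pixel."""
    total = sum(w for _, w in samples)
    if total == 0.0:
        return 0.0                       # no image covers this pixel with non-zero weight
    return sum(v * w for v, w in samples) / total

# A mosaic pixel seen near the centre of one 5x5 image and near the edge of another.
w_a = tent_weight(2, 2, 5, 5)            # centre pixel
w_b = tent_weight(1, 2, 5, 5)            # one pixel in from the left border
print(w_a, w_b, blend([(100.0, w_a), (200.0, w_b)]))
```

Because the weight vanishes at the border, an image contributes nothing exactly at its own edge, which is what suppresses visible seams.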

To perform the minimization, we use the Levenberg-Marquardt iterative non-linear minimization algorithm [Press et al., 1992]. This algorithm requires the computation of the partial derivatives of $e_i$ with respect to the unknown motion parameters $\{m_0 \ldots m_7\}$. These are straightforward to compute, i.e.,

$$
\frac{\partial e_i}{\partial m_0} = \frac{x_i}{D_i} \frac{\partial I'}{\partial x'}, \quad \ldots, \quad
\frac{\partial e_i}{\partial m_7} = -\frac{y_i}{D_i} \left( x'_i \frac{\partial I'}{\partial x'} + y'_i \frac{\partial I'}{\partial y'} \right), \tag{7}
$$

where $D_i$ is the denominator in (5), and $(\partial I'/\partial x', \partial I'/\partial y')$ is the image intensity gradient of $I'$ at $(x'_i, y'_i)$. From these partials, the Levenberg-Marquardt algorithm computes an approximate Hessian matrix $\mathbf{A}$ and the weighted gradient vector $\mathbf{b}$ with components

$$
a_{kl} = \sum_i \frac{\partial e_i}{\partial m_k} \frac{\partial e_i}{\partial m_l}, \qquad
b_k = -2 \sum_i e_i \frac{\partial e_i}{\partial m_k}, \tag{8}
$$

and then updates the motion parameter estimate $\mathbf{m}$ by an amount $\Delta\mathbf{m} = (\mathbf{A} + \lambda \mathbf{I})^{-1} \mathbf{b}$, where $\lambda$ is a time-varying stabilization parameter [Press et al., 1992]. The advantage of using Levenberg-Marquardt over straightforward gradient descent is that it converges in fewer iterations. The complete registration algorithm thus consists of the following steps:

1. For each pixel $i$ at location $(x_i, y_i)$,
    (a) compute its corresponding position in the other image $(x'_i, y'_i)$ using (5);
    (b) compute the error in intensity between the corresponding pixels $e_i = I'(x'_i, y'_i) - I(x_i, y_i)$ and the intensity gradient $(\partial I'/\partial x', \partial I'/\partial y')$;
    (c) compute the partial derivative of $e_i$ w.r.t. the $m_k$ using $\frac{\partial e_i}{\partial m_k} = \frac{\partial I'}{\partial x'} \frac{\partial x'}{\partial m_k} + \frac{\partial I'}{\partial y'} \frac{\partial y'}{\partial m_k}$ as in (7);
    (d) add the pixel's contribution to $\mathbf{A}$ and $\mathbf{b}$ as in (8).

2. Solve the system of equations $(\mathbf{A} + \lambda \mathbf{I}) \Delta\mathbf{m} = \mathbf{b}$ and update the motion estimate $\mathbf{m}^{(t+1)} = \mathbf{m}^{(t)} + \Delta\mathbf{m}$.

3. Check that the error in (6) has decreased; if not, increment $\lambda$ (as described in [Press et al., 1992]) and compute a new $\Delta\mathbf{m}$.

4. Continue iterating until the error is below a threshold or a fixed number of steps has been completed.

The steps in this algorithm are similar to the operations performed when warping images [Wolberg, 1990; Beier and Neely, 1992], with additional operations for correcting the current warping parameters based on the local intensity error and its gradients (similar to the process in [Gleicher and Witkin, 1992]). For more details on the exact implementation, see [Szeliski and Coughlan, 1994].
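The four steps of the registration loop can be sketched on a deliberately reduced problem. This is not the implementation of [Szeliski and Coughlan, 1994]. The assumptions are: the motion is restricted to a pure translation (two parameters instead of eight), the "images" are analytic functions so that no resampling is needed, gradients are taken by central differences, and $\mathbf{b}$ uses the scaling $b_k = -\sum_i e_i\, \partial e_i / \partial m_k$, which differs from (8) by a constant factor. All names are hypothetical.

```python
import math

TRUE_SHIFT = (3.0, -2.0)

def scene(x, y):
    """Synthetic smooth intensity pattern standing in for a real image."""
    return math.exp(-((x - 16.0) ** 2 + (y - 16.0) ** 2) / 50.0)

def image_ref(x, y):                     # I(x, y)
    return scene(x, y)

def image_other(x, y):                   # I'(x', y'): the same scene displaced by TRUE_SHIFT
    return scene(x - TRUE_SHIFT[0], y - TRUE_SHIFT[1])

def accumulate(m, pixels, h=0.5):
    """Step 1: total error E, 2x2 approximate Hessian A, and gradient vector b."""
    E, A, b = 0.0, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
    for x, y in pixels:
        xp, yp = x + m[0], y + m[1]                       # (a) translation-only warp
        e = image_other(xp, yp) - image_ref(x, y)         # (b) intensity error
        g = ((image_other(xp + h, yp) - image_other(xp - h, yp)) / (2 * h),
             (image_other(xp, yp + h) - image_other(xp, yp - h)) / (2 * h))
        E += e * e                                        # (c) for a translation, de/dm = g
        for k in range(2):                                # (d) accumulate A and b
            b[k] -= e * g[k]
            for j in range(2):
                A[k][j] += g[k] * g[j]
    return E, A, b

def register(m, pixels, lam=1e-3, max_steps=50, tol=1e-12):
    E, A, b = accumulate(m, pixels)
    for _ in range(max_steps):                            # step 4: bounded iteration
        if E < tol:
            break
        a00, a01, a11 = A[0][0] + lam, A[0][1], A[1][1] + lam
        det = a00 * a11 - a01 * a01                       # step 2: solve (A + lam I) dm = b
        dm = ((a11 * b[0] - a01 * b[1]) / det, (a00 * b[1] - a01 * b[0]) / det)
        trial = (m[0] + dm[0], m[1] + dm[1])
        E_new, A_new, b_new = accumulate(trial, pixels)
        if E_new < E:                                     # step 3: accept and relax lam
            m, E, A, b = trial, E_new, A_new, b_new
            lam *= 0.1
        else:                                             # otherwise reject and raise lam
            lam *= 10.0
    return m, E

pixels = [(x, y) for x in range(32) for y in range(32)]
m_est, err = register((0.0, 0.0), pixels)
print(m_est, err)
```

The starting displacement here is smaller than the width of the intensity pattern, which keeps the search inside the basin of convergence; for a full projective model the same loop would carry eight parameters and the partial derivatives of (7).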


4.2 Global image registration

Unfortunately, both gradient descent and Levenberg-Marquardt only find locally optimal solutions. If the motion between successive frames is large, we must use a different strategy to find the best registration. We have implemented two different techniques for dealing with this problem. The first technique, which is commonly used in computer vision, is hierarchical matching, which first registers smaller, subsampled versions of the images where the apparent motion is smaller [Quam, 1984; Witkin et al., 1987; Bergen et al., 1992]. Motion estimates from these smaller coarser levels are then used to initialize motion estimates at finer levels, thereby avoiding the local minimum problem (see [Szeliski and Coughlan, 1994] for details). While this technique is not guaranteed to find the correct registration, we have observed empirically that it works well when the initial misregistration is only a few pixels (the exact domain of convergence depends on the intensity pattern in the image).

For larger displacements, we use phase correlation [Kuglin and Hines, 1975; Brown, 1992]. This technique estimates the 2-D translation between a pair of images by taking 2-D Fourier Transforms of each image, computing the phase difference at each frequency, performing an inverse Fourier Transform, and searching for a peak in the magnitude image. In our experiments, this technique has proven to work remarkably well, providing good initial guesses for image pairs which overlap by as little as 50%. The technique will not work, however, if the inter-frame motion is not mostly translational (large rotations or zooms), but this does not occur in practice with our current image sequences.
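The phase correlation recipe described above (transform, phase difference, inverse transform, peak search) can be sketched on a toy example. The assumptions here are a tiny 8 × 8 random image, a circular (wrap-around) shift so that the Fourier shift theorem holds exactly, and a naive discrete Fourier transform written out by hand; real frames overlap only partially and would use a fast transform. All names are hypothetical.

```python
import cmath
import math
import random

def dft2(img, sign=-1):
    """Naive separable 2-D DFT; sign=-1 is the forward transform, sign=+1 the
    unnormalised inverse. Adequate for tiny images only."""
    n, m = len(img), len(img[0])
    rows = [[sum(img[r][c] * cmath.exp(sign * 2j * math.pi * k * c / m) for c in range(m))
             for k in range(m)] for r in range(n)]
    return [[sum(rows[r][k] * cmath.exp(sign * 2j * math.pi * l * r / n) for r in range(n))
             for k in range(m)] for l in range(n)]

def phase_correlate(a, b):
    """Estimate (dx, dy) such that b(r, c) = a(r - dy, c - dx) with wrap-around."""
    n, m = len(a), len(a[0])
    Fa, Fb = dft2(a), dft2(b)
    cross = [[0j] * m for _ in range(n)]
    for l in range(n):
        for k in range(m):
            p = Fb[l][k] * Fa[l][k].conjugate()
            if abs(p) > 1e-12:
                cross[l][k] = p / abs(p)          # keep only the phase difference
    corr = dft2(cross, sign=+1)                   # inverse transform of the phase image
    _, dy, dx = max((abs(corr[r][c]), r, c) for r in range(n) for c in range(m))
    if dy > n // 2:                               # map wrapped indices to signed shifts
        dy -= n
    if dx > m // 2:
        dx -= m
    return dx, dy

rng = random.Random(0)
N = 8
a = [[rng.random() for _ in range(N)] for _ in range(N)]
TRUE_DX, TRUE_DY = 3, -2
b = [[a[(r - TRUE_DY) % N][(c - TRUE_DX) % N] for c in range(N)] for r in range(N)]
print(phase_correlate(a, b))
```

Normalising each frequency to unit magnitude is what turns the correlation surface into a sharp peak at the displacement, independent of image contrast.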

4.3 Results

To demonstrate the performance of our algorithm, we digitized an image sequence with a camera panning over a whiteboard. Figure 2a shows the final mosaic of the whiteboard, with the locations of the constituent images in the mosaic shown as black outlines. Figure 2b is a portion of the final mosaic without these outlines. This mosaic is 1300 × 2046 pixels, based on compositing 39 NTSC (640 × 480) resolution images.

To compute this mosaic, we developed an interactive image mosaicing tool which lets the user coarsely position successive frames relative to each other. The tool also has an automatic mosaicing option which computes the initial rough placement of each image with respect to the previous one using phase correlation. Our algorithm then refines the location of each image by minimizing (6) using the current mosaic thus far as $I(x, y)$ and the input frame being adjusted as $I'(x', y')$. The images in Figure 2 were automatically composited without user intervention using the middle frame (center of the image) as the anchor image (no deformations). As we can see, our technique works well on this example. Section 9 discusses some imaging artifacts which can cause difficulties.

5 Panoramic image mosaicing

Another image mosaicing problem which turns out to be equally simple to solve is panoramic image mosaicing. In this scenario, we rotate a camera around its optical center in order to create a full “viewing sphere” of images (in computer graphics, this representation is sometimes called an environment map [Greene, 1986]). This is similar to the action of panoramic still photographic cameras where the rotation of a camera on top of a tripod is mechanically coordinated with the film transport, the anamorphic lens used in [Lippman, 1980], and to “cinema in the round” movie theaters [Greene and Heckbert, 1986]. In our case, however, we can mosaic multiple 2-D images of arbitrary detail and resolution, and we need not know the camera motion. Examples of applications include constructing true scenic panoramas (say of the view at the rim of the Grand Canyon), or limited virtual environments (recreating a meeting room or office as seen from one location).

It turns out that images taken from the same viewpoint (stationary optical center) are related by 2-D projective transformations, just as in the planar scene case (see Appendix A for a quick proof). Intuitively, we cannot tell the relative depth of points in the scene as we rotate (there is no motion parallax), so they may as well be located on any plane (in projective geometry, we could say they lie on the plane at infinity, $w = 0$). For some applications, this property is usually viewed as a deficiency, i.e., we cannot recover scene depth from rotational camera motion. For image mosaicing, this is actually an advantage since we just want to composite a large scene and be able to “look around” from a static viewpoint.

Figure 2: Whiteboard image mosaic example. (a) mosaic with component locations shown as black outlines, (b) portion of mosaic in greater detail.

More formally, the 2-D transformation denoted by $\mathbf{M}_{2D}$ is related to the viewing matrices $\hat{\mathbf{V}}$ and $\hat{\mathbf{V}}'$ and the inter-view rotation $\mathbf{R}$ by

$$
\mathbf{M}_{2D} = \hat{\mathbf{V}}'\, \mathbf{R}\, \hat{\mathbf{V}}^{-1} \tag{9}
$$

(see Appendix A). In the case of a calibrated camera (known center of projection), this has a particularly simple form,

$$
\mathbf{M}_{2D} =
\begin{bmatrix} r_{00} & r_{01} & r_{02} f \\ r_{10} & r_{11} & r_{12} f \\ r_{20}/f' & r_{21}/f' & r_{22} f/f' \end{bmatrix}, \tag{10}
$$

where $f$ and $f'$ are the scaled focal lengths in the two views, and $r_{kl}$ are the entries in the rotation matrix $\mathbf{R}$.⁵ In this case, we only have to recover five independent parameters (or three if the $f$ values are known) instead of the usual eight.
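As a small numeric illustration of (10), the sketch below builds $\mathbf{M}_{2D}$ for a pure pan and applies it to the image center. The choice of rotation axis and sign convention, the function names, and the values (a 0.01 rad pan with $f = f' = 1000$) are assumptions made for this example.

```python
import math

def rotation_about_y(angle):
    """Rotation matrix for a pan about the vertical axis (sign convention assumed)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rotation_homography(R, f, f_prime):
    """M_2D = V_hat' R V_hat^-1 with V_hat = diag(1, 1, 1/f), written out as in (10)."""
    return [
        [R[0][0], R[0][1], R[0][2] * f],
        [R[1][0], R[1][1], R[1][2] * f],
        [R[2][0] / f_prime, R[2][1] / f_prime, R[2][2] * f / f_prime],
    ]

def apply_2d(M, x, y):
    """Map a Cartesian image point through M and divide by w'."""
    xp, yp, wp = (M[r][0] * x + M[r][1] * y + M[r][2] for r in range(3))
    return (xp / wp, yp / wp)

M = rotation_homography(rotation_about_y(0.01), 1000.0, 1000.0)
for row in M:
    print([round(v, 6) for v in row])
print(apply_2d(M, 0.0, 0.0))
```

For this small rotation and large focal length the matrix is close to the identity with a shift of about 10 pixels in its last column, and the image center moves by $f' \tan\theta$, which is the near-translation behavior that footnote 5 describes.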

How do we represent a panoramic scene composited using our techniques? One approach is to divide the viewing sphere into several large, potentially overlapping regions, and to represent each region with a plane onto which we paste the images [Greene, 1986]. Another approach is to compute the relative position of each frame relative to some base frame, and to then re-compute an arbitrary view on the fly from all visible pieces, given a particular view direction $\mathbf{R}$ and zoom factor $f$. In our current implementation, we adopt the former strategy since it is a simple extension of the planar case.

5.1 Results

Figure 3 shows a mosaic of a bookshelf and cluttered desk, which is obviously a non-planar scene. The images were obtained by tilting and panning a video camera mounted on a tripod, without any special steps taken to ensure that the rotation was around the true center of projection. The complete scene is registered quite well.

6 Scenes with arbitrary depth

Mosaicing flat scenes or panoramic scenes may be interesting for certain tele-reality applications, e.g., transmitting whiteboard or document contents, or viewing an outdoor panorama. However, for most applications, we must recover the depth associated with the scene to give the illusion of a 3-D environment, e.g., through nearby view synthesis [Chen and Williams, 1993]. Two possible approaches are to model the scene as piecewise-planar or to recover dense 3-D depth maps.

⁵ At large focal lengths and small rotations, $\mathbf{M}_{2D}$ resembles a pure translation matrix.

Figure 3: Panoramic image mosaic example (bookshelf and cluttered desk). (a) mosaic with component locations shown as black outlines, (b) portion of mosaic in greater detail.

The first approach is to assume that the scene is piecewise-planar, as is the case with many man-made environments such as building exteriors and office interiors. The image mosaicing technique developed in Section 4 can then be applied to each of the planar regions in the image. The segmentation of each image into its planar components can either be done interactively (e.g., by drawing the polygonal outline of each region to be registered), or automatically by associating each pixel with one of several global motion hypotheses [Jepson and Black, 1993; Wang and Adelson, 1993]. Once the independent planar pieces have been composited, we could, in principle, recover the relative geometry of the various planes and the camera motion (see Appendix A). However, rather than pursuing this approach in this paper, we will pursue the second, more general solution, which is to recover a full depth map, i.e., to infer the missing $z$ component associated with each pixel in a given image sequence.

When the camera motion is known, the problem of depth map recovery is called stereo reconstruction (or multi-frame stereo if more than two views are used [Matthies et al., 1989; Okutomi and Kanade, 1993]). This problem has been extensively studied in photogrammetry and computer vision [Moffitt and Mikhail, 1980; Barnard and Fischler, 1982; Horn, 1986; Dhond and Aggarwal, 1989]. When the camera motion is unknown, we have the more difficult structure from motion problem [Horn, 1986; Weng et al., 1993; Azarbayejani et al., 1993; Szeliski and Kang, 1994]. In this section, we present our solution to this latter problem based on recovering projective depth, which is particularly simple and robust and fits in well with the material already developed in this paper. Readers interested in alternative techniques should consult the above references.

6.1 Formulation

To formulate the projective structure from motion recovery problem, we note that the coordinates corresponding to a pixel $\mathbf{u}$ with projective depth $w$ in some other frame can be written as

$$
\mathbf{u}' = \mathbf{V}' \mathbf{E}\, \mathbf{p} = \hat{\mathbf{V}}' \mathbf{R}\, \hat{\mathbf{V}}^{-1} \mathbf{u} + w\, \hat{\mathbf{V}}' \mathbf{t} = \mathbf{M}_{2D}\, \mathbf{u} + w\, \hat{\mathbf{t}}, \tag{11}
$$

where $\mathbf{V}$, $\mathbf{E}$, $\mathbf{R}$, and $\mathbf{t}$ are defined in (2–3), and $\mathbf{M}_{2D}$ and $\hat{\mathbf{t}}$ are the computed planar projection matrix and epipole direction (see (25) in Appendix A). To recover the parameters in $\mathbf{M}_{2D}$ and $\hat{\mathbf{t}}$ for each frame along with the depth values $w$ (which are the same for all frames), we use the same Levenberg-Marquardt algorithm as before.

In more detail, we write the projection equation as

$$
x'_i = \frac{m_0 x_i + m_1 y_i + t_0 w_i + m_2}{m_6 x_i + m_7 y_i + t_2 w_i + 1}, \qquad
y'_i = \frac{m_3 x_i + m_4 y_i + t_1 w_i + m_5}{m_6 x_i + m_7 y_i + t_2 w_i + 1}. \tag{12}
$$

We compute the partial derivatives of $x'_i$ and $y'_i$ w.r.t. the $m_k$ and $t_k$ (which we concatenate into the motion vector $\mathbf{m}$) as before in (7). We similarly compute the partials of $x'_i$ and $y'_i$ with respect to $w_i$, i.e.,

$$
\frac{\partial x'_i}{\partial w_i} = \frac{t_0 - t_2\, x'_i}{D_i}, \qquad
\frac{\partial y'_i}{\partial w_i} = \frac{t_1 - t_2\, y'_i}{D_i}, \tag{13}
$$

where $D_i$ is the denominator in (12).

To estimate the unknown parameters, we alternate iterations of the Levenberg-Marquardt algorithm over the motion parameters $\{m_0, \ldots, t_2\}$ in $\mathbf{m}$ and the depth parameters $\{w_i\}$, using the partial derivatives defined above to compute the approximate Hessian matrices $\mathbf{A}$ and the weighted error vectors $\mathbf{b}$ as in (8). In our current implementation, in order to reduce the total number of parameters being estimated, we represent the depth map using a tensor-product spline, and only recover the depth estimates at the spline control vertices (the complete depth map is available by interpolation) [Szeliski and Coughlan, 1994].
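The projection with projective depth and its derivative with respect to depth can be checked numerically. The sketch below evaluates the mapping of (12), computes the depth partials in closed form by differentiating that quotient, and compares them with central finite differences. The function names and the parameter values are assumptions made for this example.

```python
def project_with_depth(m, t, x, y, w):
    """Equation (12): m = (m0..m7), t = (t0, t1, t2), w = projective depth of the pixel."""
    d = m[6] * x + m[7] * y + t[2] * w + 1.0
    xp = (m[0] * x + m[1] * y + t[0] * w + m[2]) / d
    yp = (m[3] * x + m[4] * y + t[1] * w + m[5]) / d
    return xp, yp, d

def depth_partials(m, t, x, y, w):
    """Closed-form d x'/d w and d y'/d w obtained by differentiating (12)."""
    xp, yp, d = project_with_depth(m, t, x, y, w)
    return (t[0] - t[2] * xp) / d, (t[1] - t[2] * yp) / d

# Arbitrary illustrative values.
m = (1.0, 0.02, 3.0, -0.01, 1.0, -2.0, 1e-4, 2e-4)
t = (5.0, -1.0, 0.1)
x, y, w, h = 40.0, 25.0, 0.3, 1e-6

analytic = depth_partials(m, t, x, y, w)
x_hi, y_hi, _ = project_with_depth(m, t, x, y, w + h)
x_lo, y_lo, _ = project_with_depth(m, t, x, y, w - h)
numeric = ((x_hi - x_lo) / (2 * h), (y_hi - y_lo) / (2 * h))
print(analytic, numeric)
```

Setting `t = (0, 0, 0)` makes both partials vanish, which matches the earlier observation that a purely rotating camera produces no motion parallax and therefore carries no depth information.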

6.2 Results

Figure 4 shows an example of using our projective depth recovery algorithm. The image sequence was taken by moving the camera up and over the scene of a table with stacks of papers (Figure 4a). The resulting depth map is shown in Figure 4b as intensity-coded range values. Figures 4c–d show the original intensity image texture mapped onto the surface (an example of view extrapolation), and Figures 4e–f show a set of grid lines overlaid on the recovered surface. The shape is recovered reasonably well in areas where there is sufficient texture (uniform intensity areas give no depth cues), and the extrapolated view looks reasonable. Figure 5a shows results from another sequence, this time from an outdoor scene of some trees. Again, the geometry of the scene is recovered reasonably well, as indicated in the depth map in Figure 5b.

7 Full 3-D model recovery

Recovering depth maps may be adequate for many tele-reality applications, e.g., scanning library bookshelves or supermarket aisles, but for other applications, e.g., when we need to see the back side of an object, full 3-D models may be required. This is the most difficult image mosaicing problem, since not only do we have to recover depth, but we also have to merge (register and composite) multiple depth maps, and represent objects given no a priori knowledge about their rough shape or topology.

Figure 4: Depth recovery example: table with stacks of papers. (a) input image, (b) intensity-coded depth map (dark is farther back), (c–d) texture-mapped surface seen from novel viewpoints, (e–f) gridded surface.

Figure 5: Depth recovery example: outdoor scene with trees. (a) input image, (b) intensity-coded depth map (dark is farther back).

For simple topologies and shapes, deformable physically-based models can do a good job of recovering the unknown geometry [Terzopoulos et al., 1987; Pentland and Sclaroff, 1991]. For general topologies, the problem is more difficult. Since many current shape recovery techniques produce incomplete or sparse geometric descriptions, a general 3-D surface interpolation technique may be required to generate a smooth continuous surface [Hoppe et al., 1992; Szeliski and Tonnesen, 1992].

There exist many techniques for extracting 3-D shape from multiple views. For example, we can recover a volumetric description from the binary silhouettes of an object against its background [Szeliski, 1993], compute local optic flow (pixel motion) estimates and convert these into sparse 3-D point estimates [Szeliski, 1991], or track the occluding contours of an object to generate 3-D space curves [Szeliski and Weiss, 1993]. A complete survey of 3-D shape extraction techniques is beyond the scope of this paper. Instead, we present the results of two of our previously developed algorithms applied to an image sequence of a cup rotating on a calibrated turntable (Figure 6a).

Figure 6: 3-D model recovery example. (a) input image, (b) octree recovered from silhouettes, (c) 2-D edges, (d) 3-D extremal contours.

Our first technique converts each image into a binary silhouette by differencing the image with an empty background image. Each cube in the octree volumetric 3-D model is then projected (using the known camera position) into the silhouette, and cubes that fall outside the silhouette are culled. Cubes that fall partially into the silhouette are marked for later subdivision, and the process is repeated after each complete revolution at successively finer resolutions [Szeliski, 1993]. The resulting octree is shown in Figure 6b.
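The cull-and-subdivide loop described above can be illustrated with a minimal sketch. It is not the implementation of [Szeliski, 1993]: it assumes an orthographic camera, a turntable axis through the image center, synthetic disc-shaped silhouettes, and a conservative test that compares the bounding box of each projected cube with the silhouette mask. All function names (`make_disc_mask`, `classify`, `carve`) are hypothetical.

```python
import math

def make_disc_mask(n, radius):
    """Synthetic binary silhouette: a disc centred in an n x n image."""
    c = n / 2.0
    return [[(col + 0.5 - c) ** 2 + (row + 0.5 - c) ** 2 <= radius ** 2
             for col in range(n)] for row in range(n)]

def classify(cube, views, n):
    """Label cube = (cx, cy, cz, half_size) as 'outside', 'inside' or 'partial'."""
    cx, cy, cz, h = cube
    c = n / 2.0
    corners = [(cx + i * h, cy + j * h, cz + k * h)
               for i in (-1, 1) for j in (-1, 1) for k in (-1, 1)]
    inside_all = True
    for angle, mask in views:
        ca, sa = math.cos(angle), math.sin(angle)
        # Orthographic projection after rotating about the vertical axis.
        us = [c + (x - c) * ca + (z - c) * sa for x, _, z in corners]
        vs = [y for _, y, _ in corners]
        hits = total = 0
        for r in range(math.floor(min(vs)), math.ceil(max(vs))):
            for q in range(math.floor(min(us)), math.ceil(max(us))):
                total += 1
                if 0 <= r < n and 0 <= q < n and mask[r][q]:
                    hits += 1
        if hits == 0:
            return "outside"      # bounding box misses this silhouette: cull
        if hits < total:
            inside_all = False    # straddles the silhouette boundary
    return "inside" if inside_all else "partial"

def carve(views, n, levels):
    """Refine the octree level by level; returns (inside, boundary) cube lists."""
    c = n / 2.0
    partial, inside = [(c, c, c, c)], []
    for _ in range(levels):
        next_partial = []
        for cx, cy, cz, h in partial:
            for i in (-1, 1):
                for j in (-1, 1):
                    for k in (-1, 1):
                        child = (cx + i * h / 2, cy + j * h / 2,
                                 cz + k * h / 2, h / 2)
                        label = classify(child, views, n)
                        if label == "inside":
                            inside.append(child)
                        elif label == "partial":
                            next_partial.append(child)
        partial = next_partial
    return inside, partial

# Eight synthetic views of a sphere-like object (every silhouette is a disc).
mask = make_disc_mask(32, 10.0)
views = [(k * math.pi / 8, mask) for k in range(8)]
inside, boundary = carve(views, 32, 4)
```

Because the bounding box of a projected cube contains the true projection, a cube is only culled when it certainly misses a silhouette; the price is that some cubes are kept as "partial" longer than strictly necessary.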

Our second technique extracts silhouette (extremal) edges from the image sequence, and uses a multi-frame stereo algorithm to reconstruct their 3-D position (internal albedo edges are also extracted and reconstructed) [Szeliski and Weiss, 1993]. Figure 6c shows one of the edge images used by the algorithm, and Figure 6d shows the collection of 3-D curves estimated by the algorithm.

These examples demonstrate some of our algorithms for reconstructing an isolated object undergoing known motion. Similar techniques can be used to solve the more general 3-D scene recovery problem where the camera motion is unknown. Methods for determining the motion include the projective motion algorithm presented in the previous section, as well as techniques described in [Horn, 1986; Weng et al., 1993; Azarbayejani et al., 1993]. Compared to the 2-D and depth-map models developed in Sections 4–6, the recovery of 3-D models is more difficult and less robust, but holds the promise of more realistic and more general object and scene models.

8 Applications

Given automated techniques for building 2-D and 3-D scenes from video sequences, what can we do with these models? In this section, we describe a number of potential applications, including whiteboard and document scanning, 3-D model acquisition for inverse CAD, model acquisition for computer animation and special effects, supermarket shopping at home, interactive walkthroughs of historical buildings, and live tele-reality (telepresence) applications.

8.1 Planar mosaicing applications

The most straightforward application is scanning whiteboards or blackboards as an aid to videoconferencing or as an easy way to capture ideas. Scanning can produce images of much greater resolution than single wide-angle lens shots. This facility could be combined with a video projection system to create a virtual whiteboard similar to the DigitalDesk proposed by Wellner [Wellner, 1993] (see [Gold, 1993] for other examples of ubiquitous computing and augmented reality). The techniques developed in this paper enable any video camera attached to a computer to be used. In specialized situations, e.g., in classrooms, a computer-controlled camera could be used to perform the scanning, removing the need for automatic registration of the images.

Another obvious application of this technology is for document scanning. Hand-held scanners currently perform this function quite well. However, since they are based on linear CCDs, they are subject to “skewing” problems in each strip of the scanned image which must be corrected manually. Using 2-D video images as input removes some of these problems. A final global skew may still be needed to make the document square, but this can be performed automatically by detecting document edges or internal horizontal and vertical edges (e.g., column borders).

8.2 3-D model building applications

The 3-D model building technology, while more difficult to automate reliably, has even greater potential. For example, we could construct a 3-D fax which would scan 3-D objects at one end, using a video camera and either mechanical or user-controlled motion, and present them on a 3-D graphics display at the other end [Carlbom et al., 1992]. This fax could be used to transmit 3-D models to a remote site for viewing and design revision, or to input rough CAD models for reverse engineering or clay mock-up applications. Of course, this technology could also be combined with stereolithography or numerically controlled milling to make this a true solid 3-D fax [Bresenham, 1993].

Rapid 3-D model building is also critical in the computer animation and special effects industries. Currently, it can take days or weeks to build by hand each computer model used in a feature-length film. The ability to build such models rapidly from real objects or physical mock-ups would be of great utility, especially when freed from the range and resolution limitations of active rangefinding systems. The techniques we use for computing 3-D geometry could also be modified to automatically track the motions of objects or actors in a scene (this is called motion capture, and is currently done using specially placed markers).

8.3 End-user applications

A more mass-market application is home shopping applied not to single objects (either demonstrated by a model or available in a catalog), but to complete stores, such as your local supermarket. This has the advantage of having a familiar look and organization, and enables the pre-planning of your next shopping trip. The images of the aisles (with their current contents) can be digitized by rolling a video camera through the store. More detailed geometric models of individual items can be acquired either by a 3-D model building process, or, in the future, directly from the manufacturer. The shopper can then stroll down the aisles, pick out individual items, and look at their ingredients and prices.

A similar scenario is possible with other buildings. Architectural walkthroughs have long been a part of computer graphics [Greenberg, 1974], but only for buildings currently under design. Walkthroughs of existing buildings can have a number of applications. For example, interactive 3-D walkthroughs of your home, built by walking a video camera through the rooms and mosaicing the image sequences, could be used for selling your house (an extension of existing still-image based systems), or for re-modeling or renovating its design. Walkthroughs of historic buildings (e.g., palaces or museums) can be used for educational and entertainment purposes. A museum scenario might include the ability to look at individual 3-D objects such as sculptures, and to bring up related information in a hypertext system [Miller et al., 1991].

Walkthroughs (or fly-throughs) can also be done in general (e.g., outdoor) 3-D scenes. An early example of this was the Movie-Maps project [Lippman, 1980], which was based on videodisc technology. Since this system was based on choosing from a number of pre-stored motion sequences, the range of possible viewpoints (and detail) was limited. The basic Movie-Maps idea can be extended by building true 3-D scenes from the input motion sequences. Such 3-D tele-reality models have the potential of being of extremely high complexity and will require solving problems in representation, partial 3-D models [Chen and Williams, 1993], and switching between different levels of resolution [Funkhouser and Sequin, 1993].

The ultimate in tele-reality systems is dynamic tele-reality (sometimes called telepresence), which composites video from multiple sources in real-time to create the illusion of being in a dynamic (and perhaps reactive) 3-D environment. An example of such an application might be to view a 3-D version of a concert with control over the camera shots, even being able to see the concert from the musicians’ point of view. Other examples might be to participate or consult in a surgery from a remote location (tele-medicine), or to remotely participate in a virtual classroom. Building such dynamic 3-D models at frame rates is beyond the processing power of today’s high-performance superscalar workstations, but it could be achieved using a collection of such machines (or special-purpose stereo hardware). In such applications, we need to use multiple cameras, since in moving scenes we cannot rely on the motion parallax from a single camera to give us reliable shape information.

9 Discussion

Image mosaicing provides a powerful new way of creating the detailed 3-D models and scenes needed for tele-reality applications. By registering multiple images together, we can create scenes of extremely high resolution and at the same time recover partial or full 3-D geometric information. While we use techniques from computer vision to perform the registration, our focus is different: traditional vision techniques are designed for inspection, recognition, and robot control, while our techniques are designed to produce realistic 3-D models for computer graphics and virtual reality applications.

The approach we use, namely direct minimization of intensity differences between warped images, has a number of advantages over more traditional vision techniques, which are based on tracking features from frame to frame [Tomasi and Kanade, 1992; Azarbayejani et al., 1993; Szeliski and Kang, 1994]. Our techniques produce dense estimates of shape, work in highly textured areas where features may not be reliably observed, and make statistically optimal use of all the information [Szeliski and Coughlan, 1994]. Our approach is similar to [Bergen et al., 1992], who also use intensity differences. However, we use full 2-D projective models of motion instead of instantaneous quadratic flow fields, and we also use a projective formulation of structure from motion, which eliminates the need for calibrated cameras. It is also very similar to [Mann, 1993], who also uses planar projective motion estimates. Furthermore, [Mann and Picard, 1994] have also demonstrated superresolution results, i.e., the ability to obtain higher resolution images by simply jittering the camera rather than panning [Irani and Peleg, 1991].
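To make the idea of registering images by minimizing intensity differences concrete, the following sketch restricts the motion model to integer translations and finds the minimum by exhaustive search. This is a deliberate simplification, not the method of the paper, which estimates full projective motion; the function names and the synthetic test pattern are assumptions made for illustration.

```python
def ssd(img0, img1, dx, dy):
    """Mean squared intensity difference between img0 and img1 displaced by
    (dx, dy), evaluated over the region where the two images overlap."""
    h, w = len(img0), len(img0[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            xs, ys = x + dx, y + dy
            if 0 <= xs < w and 0 <= ys < h:
                e = img1[ys][xs] - img0[y][x]
                total += e * e
                count += 1
    return total / count if count else float("inf")

def register_translation(img0, img1, max_shift):
    """Return the integer shift (dx, dy) with the smallest intensity error."""
    candidates = ((ssd(img0, img1, dx, dy), dx, dy)
                  for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1))
    _, dx, dy = min(candidates)
    return dx, dy

def pattern(x, y):
    """Hypothetical textured test pattern (not from the paper)."""
    return (7 * x * x + 13 * y * y + 5 * x * y) % 251

size = 16
img0 = [[pattern(x, y) for x in range(size)] for y in range(size)]
# img1 shows the same pattern displaced by 3 pixels in x and 2 pixels in y.
img1 = [[pattern(x - 3, y - 2) for x in range(size)] for y in range(size)]
shift = register_translation(img0, img1, 4)
```

Every pixel in the overlap contributes to the error, which is what allows this family of methods to work without extracting or tracking discrete features.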

While our techniques have worked well in the scenes in which we have tried them, we must be cautious about their general applicability. The intensity-based techniques we use are sensitive to image intensity variations, such as those caused by video camera gain control and vignetting (darkening of the corners at wide lens apertures); working with band-pass filtered images can remove most of these problems. All vision techniques are also sensitive to geometric distortions (deviations from the pinhole model) in the optics, so careful calibration is necessary for optimal accuracy (the results in this paper were obtained with uncalibrated cameras).

The depth extraction techniques we use rely on the presence of texture in the image. Even in areas of sufficient texture, it is still possible for the registration/matching algorithm to compute an erroneous depth estimate (such gross errors are less common in active rangefinders). Where texture is absent, interpolation must be used, and this can lead to erroneous (hallucinated) depth estimates. Other visual cues, such as occluding contours or shading [Horn, 1986], can be used to mitigate these problems in some cases, but a general-purpose reliable vision-based ranging system remains very much an open research problem.

The size and structure of the models produced for tele-reality applications have interesting implications for specialized graphics hardware and software interfaces. Current graphics hardware stores the intensity images (texture maps) separately from the geometry, which is stored as polygons usually spanning many pixels. A more appropriate architecture might be registered multi-resolution color images and depth maps. View interpolation, rather than 3-D rendering, could then be used to synthesize novel 3-D views [Chen and Williams, 1993]. These kinds of architectures and algorithms point towards a tighter coupling between computer graphics and image processing [Pickering, 1993].

10 Conclusions

In this paper, we have presented a hierarchy of scene models, ranging from 2-D planar and panoramic mosaics through full 3-D object models, and developed a collection of associated scene recovery algorithms. The creation of realistic high-resolution 3-D scenes from video imagery opens up many new applications for tele-reality technology. These include office applications such as whiteboard, document, and bookshelf scanning, simulated meeting spaces, engineering applications such as reverse engineering and collaborative design, and computer graphics applications such as 3-D model building. More importantly, this technology enables mass market applications such as home shopping, education (the virtual classroom), and entertainment (virtual travel). Ultimately, as processing speeds and reconstruction algorithms improve further, we will see dynamic real-time 3-D scene and model recovery being used to provide an even more exciting range of telepresence and tele-reality applications.


Acknowledgements

Bob Thomas introduced me to the term tele-reality, which he used to describe the live concert telepresence scenario. James Coughlan developed the image registration code. David Tonnesen, Demetri Terzopoulos, Sing Bing Kang, and Richard Weiss were collaborators on some of the 3-D reconstruction algorithms described in this paper. William Hsu helped me produce the results in this paper. Ingrid Carlbom and Gudrun Klinker provided useful comments on the manuscript.

References

[Cyberware Laboratory Inc, 1990] Cyberware Laboratory Inc. 4020/RGB 3D Scanner with color digitizer. Monterey, 1990.

[Adam, 1993] J. A. Adam. Virtual reality is for real. Spectrum, :22–29, October 1993.

[Azarbayejani et al., 1993] A. Azarbayejani, B. Horowitz, and A. Pentland. Recursive estimation of structure and motion using relative orientation constraints. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’93), pages 294–299, New York, New York, June 1993.

[Ballard and Brown, 1982] D. H. Ballard and C. M. Brown. Computer Vision. Prentice-Hall, Englewood Cliffs, New Jersey, 1982.

[Barnard and Fischler, 1982] S. T. Barnard and M. A. Fischler. Computational stereo. Computing Surveys, 14(4):553–572, December 1982.

[Beier and Neely, 1992] T. Beier and S. Neely. Feature-based image metamorphosis. Computer Graphics (SIGGRAPH’92), 26(2):35–42, July 1992.

[Bergen et al., 1992] J. R. Bergen, P. Anandan, K. J. Hanna, and R. Hingorani. Hierarchical model-based motion estimation. In Second European Conference on Computer Vision (ECCV’92), pages 237–252, Springer-Verlag, Santa Margherita Liguere, Italy, May 1992.

[Beymer et al., 1993] D. Beymer, A. Shashua, and T. Poggio. Example Based Image Analysis and Synthesis. A. I. Memo 1431, Massachusetts Institute of Technology, November 1993.

[Blake and Isard, 1994] A. Blake and M. Isard. 3D position, attitude, and shape input using video tracking of hands and lips. Computer Graphics (SIGGRAPH’94), July 1994.


[Bresenham, 1993] J. Bresenham. Real virtuality: Stereolithography – rapid prototyping in 3-D. Computer Graphics (SIGGRAPH’93), :377–378, August 1993.

[Brown, 1992] L. G. Brown. A survey of image registration techniques. Computing Surveys, 24(4):325–376, December 1992.

[Carlbom et al., 1992] I. Carlbom and others. Modeling and analysis of empirical data in collaborative environments. Communications of the ACM, 35(6):74–84, April 1992.

[Carlbom et al., 1991] I. Carlbom, D. Terzopoulos, and K. M. Harris. Reconstructing and visualizing models of neuronal dendrites. In N. M. Patrikalakis, editor, Scientific Visualization of Physical Phenomena, pages 623–638, Springer-Verlag, New York, 1991.

[Chen and Williams, 1993] S. Chen and L. Williams. View interpolation for image synthesis.

[Dhond and Aggarwal, 1989] U. R. Dhond and J. K. Aggarwal. Structure from stereo - a review. IEEE Transactions on Systems, Man, and Cybernetics, 19(6):1489–1510, November/December 1989.

[Earnshaw et al., 1993] R. A. Earnshaw, M. A. Gigante, and H. Jones, editors. Virtual Reality Systems. Academic Press, London, 1993.

[Faugeras, 1992] O. D. Faugeras. What can be seen in three dimensions with an uncalibrated stereo rig? In Second European Conference on Computer Vision (ECCV’92), pages 563–578, Springer-Verlag, Santa Margherita Liguere, Italy, May 1992.

[Foley et al., 1990] J. D. Foley, A. van Dam, S. K. Feiner, and J. F. Hughes. Computer Graphics: Principles and Practice. Addison-Wesley, Reading, MA, 2nd edition, 1990.

[Funkhouser and Sequin, 1993] T. A. Funkhouser and C. H. Sequin. Adaptive display algorithm for interactive frame rates during visualization of complex virtual environments. Computer Graphics (SIGGRAPH’93), :247–254, August 1993.

[Gleicher and Witkin, 1992] M. Gleicher and A. Witkin. Through-the-lens camera control. Computer Graphics (SIGGRAPH’92), 26(2):331–340, July 1992.

[Gold, 1993] R. Gold. Ubiquitous computing and augmented reality. Computer Graphics (SIGGRAPH’93), :393–394, August 1993.

[Greenberg, 1974] D. P. Greenberg. Computer graphics in architecture. Scientific American, :98–106, May 1974.


[Greene, 1986] N. Greene. Environment mapping and other applications of world projections. IEEE Computer Graphics and Applications, 6(11):21–29, November 1986.

[Greene and Heckbert, 1986] N. Greene and P. Heckbert. Creating raster Omnimax images from multiple perspective views using the elliptical weighted average filter. IEEE Computer Graphics and Applications, 6(6):21–27, June 1986.

[Hartley, 1994] R. I. Hartley. Euclidean reconstruction from uncalibrated views. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’94), Seattle, Washington, June 1994.

[Hoppe et al., 1992] W. Hoppe and others. Surface reconstruction from unorganized points. Computer Graphics (SIGGRAPH’92), 26(2):71–78, July 1992.

[Horn, 1986] B. K. P. Horn. Robot Vision. MIT Press, Cambridge, Massachusetts, 1986.

[Irani and Peleg, 1991] M. Irani and S. Peleg. Improving resolution by image registration. Graphical Models and Image Processing, (3), May 1991.

[Jepson and Black, 1993] A. Jepson and M. J. Black. Mixture models for optical flow computation. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’93), pages 760–761, New York, New York, June 1993.

[Kuglin and Hines, 1975] C. D. Kuglin and D. C. Hines. The phase correlation image alignment method. In IEEE 1975 Conference on Cybernetics and Society, pages 163–165, New York, September 1975.

[Lippman, 1980] A. Lippman. Movie maps: An application of the optical videodisc to computer graphics. Computer Graphics (SIGGRAPH’80), 14(3):32–43, July 1980.

[Mann, 1993] S. Mann. Compositing multiple pictures of the same scene: Generalized large-displacement 8-parameter motion. In Proceedings of the 46th Annual IS&T Conference, The Society of Imaging Science and Technology, Cambridge, Massachusetts, May 1993.

[Mann and Picard, 1994] S. Mann and R. W. Picard. The projective group model for multi-frame resolution enhancement. In First IEEE International Conference on Image Processing (ICIP’94), (submitted) 1994.

[Matthies et al., 1989] L. H. Matthies, R. Szeliski, and T. Kanade. Kalman filter-based algorithms for estimating depth from image sequences. International Journal of Computer Vision, 3:209–236, 1989.


[Miller et al., 1991] G. Miller and others. The virtual museum: Interactive 3D navigation of a multimedia database. Journal of Visualization and Computer Animation, 3(3):183–198, 1991.

[Moffitt and Mikhail, 1980] F. H. Moffitt and E. M. Mikhail. Photogrammetry. Harper & Row, New York, 3rd edition, 1980.

[Okutomi and Kanade, 1993] M. Okutomi and T. Kanade. A multiple baseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(4):353–363, April 1993.

[Pentland and Sclaroff, 1991] A. Pentland and S. Sclaroff. Closed-form solutions for physically-based shape modeling and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(7):715–729, July 1991.

[Pickering, 1993] W. R. Pickering. Merging 3-D graphics and imaging – applications and issues.

[Porter and Duff, 1984] T. Porter and T. Duff. Compositing digital images. Computer Graphics (SIGGRAPH’84), 18(3):253–259, July 1984.

[Press et al., 1992] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, England, second edition, 1992.

[Quam, 1984] L. H. Quam. Hierarchical warp stereo. In Image Understanding Workshop, pages 149–155, Science Applications International Corporation, New Orleans, Louisiana, December 1984.

[Rehg and Kanade, 1994] J. Rehg and T. Kanade. Visual tracking of high dof articulated structures: an application to human hand tracking. In Third European Conference on Computer Vision (ECCV’94), pages 35–46, Springer-Verlag, Stockholm, Sweden, May 1994.

[Rioux and Bird, 1993] M. Rioux and T. Bird. White laser, synced scan. IEEE Computer Graphics and Applications, 13(3):15–17, May 1993.

[Semple and Kneebone, 1952] J. G. Semple and G. T. Kneebone. Algebraic Projective Geometry. Clarendon Press, Oxford, England, 1952.

[Szeliski, 1991] R. Szeliski. Shape from rotation. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’91), pages 625–630, IEEE Computer Society Press, Maui, Hawaii, June 1991.


[Szeliski, 1993] R. Szeliski. Rapid octree construction from image sequences. CVGIP: Image Understanding, 58(1):23–32, July 1993.

[Szeliski, 1994] R. Szeliski. A least squares approach to affine and projective structure and motion recovery. 1994. In preparation.

[Szeliski and Coughlan, 1994] R. Szeliski and J. Coughlan. Hierarchical spline-based image registration. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’94), Seattle, Washington, June 1994.

[Szeliski and Kang, 1994] R. Szeliski and S. B. Kang. Recovering 3D shape and motion from image streams using nonlinear least squares. Journal of Visual Communication and Image Representation, 5(1):10–28, March 1994.

[Szeliski and Tonnesen, 1992] R. Szeliski and D. Tonnesen. Surface modeling with oriented particle systems. Computer Graphics (SIGGRAPH’92), 26(2):185–194, July 1992.

[Szeliski and Weiss, 1993] R. Szeliski and R. Weiss. Robust shape recovery from occluding contours using a linear smoother. In Image Understanding Workshop, pages 939–948, Morgan Kaufmann Publishers, Washington, D. C., April 1993. A longer version is available as CRL TR 93/7.

[Terzopoulos and Witkin, 1988] D. Terzopoulos and A. Witkin. Physically-based models with rigid and deformable components. IEEE Computer Graphics and Applications, 8(6):41–51, 1988.

[Terzopoulos et al., 1987] D. Terzopoulos, A. Witkin, and M. Kass. Symmetry-seeking models and 3D object reconstruction. International Journal of Computer Vision, 1(3):211–221, October 1987.

[Tomasi and Kanade, 1992] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. International Journal of Computer Vision, 9(2):137–154, November 1992.

[Wang and Adelson, 1993] J. Y. A. Wang and E. H. Adelson. Layered representation for motion analysis. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’93), pages 361–366, New York, New York, June 1993.

[Wellner, 1993] P. Wellner. Interacting with paper on the DigitalDesk. Communications of the ACM, 36(7):86–96, July 1993.


[Weng et al., 1993] J. Weng, T. S. Huang, and N. Ahuja. Motion and Structure from Image Sequences. Springer-Verlag, Berlin, 1993.

[Witkin et al., 1987] A. Witkin, D. Terzopoulos, and M. Kass. Signal matching through scale space. International Journal of Computer Vision, 1:133–144, 1987.

[Wolberg, 1990] G. Wolberg. Digital Image Warping. IEEE Computer Society Press, Los Alamitos, California, 1990.

A 2-D projective transformations

Planar scene views are related by projective transformations. This is true in the most general case, i.e., for any 3-D to 2-D mapping

$$u = M_{cam}\, p, \tag{14}$$

where $p$ is a 3-D world coordinate, $u$ is the 2-D screen location, and $M_{cam}$ is a $3 \times 4$ camera matrix defined in (4). Since $M_{cam}$ is of rank 3, we have

$$p = M^{*} u + s\, m, \tag{15}$$

where $M^{*}$ is a right inverse of $M_{cam}$ (so that $M_{cam} M^{*} = I$) and $m$ is in the null space of $M_{cam}$, i.e., $M_{cam}\, m = 0$. The equation of a plane in world coordinates can be written as

$$A x + B y + C z = D \quad \text{or} \quad n \cdot p = 0, \tag{16}$$

from which we can conclude that

$$n^T M^{*} u + s\, n \cdot m = 0 \quad \text{or} \quad s = -\frac{n^T M^{*} u}{n \cdot m}, \tag{17}$$

and hence

$$p = \left( I - (n \cdot m)^{-1}\, m\, n^T \right) M^{*} u = \tilde{M} u. \tag{18}$$

From any other viewpoint, we have

$$u' = M'_{cam} \tilde{M} u = M_{2D}\, u, \tag{19}$$

which is a 2-D planar projective transformation.
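As a small illustration of how such a transformation acts on screen coordinates, the sketch below applies a $3 \times 3$ matrix to a point in homogeneous form and divides out the third coordinate. It also shows that chaining two views amounts to multiplying their matrices. The matrices `M_01` and `M_12` are hypothetical values chosen for the example, not results from the paper.

```python
def apply_homography(M, u):
    """Map screen point u = (x, y) through the 3x3 matrix M and divide by
    the third homogeneous coordinate."""
    x, y = u
    xh, yh, wh = (row[0] * x + row[1] * y + row[2] for row in M)
    return (xh / wh, yh / wh)

def compose(M_b, M_a):
    """Matrix of 'apply M_a, then M_b', i.e. the ordinary product M_b M_a."""
    return [[sum(M_b[i][k] * M_a[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Hypothetical transformations from view 0 to view 1 and from view 1 to view 2.
M_01 = [[1.0, 0.1, 5.0], [0.0, 1.2, -3.0], [0.001, 0.002, 1.0]]
M_12 = [[0.9, 0.0, 2.0], [0.05, 1.0, 1.0], [0.0, 0.001, 1.0]]

u0 = (10.0, 20.0)
u2_chained = apply_homography(M_12, apply_homography(M_01, u0))
u2_direct = apply_homography(compose(M_12, M_01), u0)

# Multiplying a matrix by a non-zero scalar leaves the mapping unchanged,
# so only eight of its nine entries are independent.
M_scaled = [[3.0 * v for v in row] for row in M_01]
u1_scaled = apply_homography(M_scaled, u0)
```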


For the remainder of this appendix, it is convenient to assume that the world coordinate system coincides with the first camera position, i.e., $E = I$ and $M_{cam} = P = [\, V \mid 0 \,]$, and therefore $M^{*} = V^{-1}$ (with a zero fourth row appended) and $m = (0, 0, 0, 1)^T$ in (15–18). An image point $u$ then corresponds to a world coordinate $p$,

$$p = \begin{bmatrix} V^{-1} u \\ w \end{bmatrix}, \tag{20}$$

where $w$ is the unknown fourth coordinate of $p$ (called projective depth), which controls how far the point is from the origin.

For panoramic image mosaicing, we rotate the point $p$ around the camera optical center to obtain

$$p' = \begin{bmatrix} R & 0 \\ 0^T & 1 \end{bmatrix} p = \begin{bmatrix} R V^{-1} u \\ w \end{bmatrix}, \tag{21}$$

with a corresponding screen coordinate

$$u' = V' R V^{-1} u = M_{2D}\, u. \tag{22}$$

Thus, the mapping between the two screen coordinate systems can be described by a 2-D projective transformation.
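The fact that a pure rotation induces a depth-independent mapping can be checked numerically. The sketch below assumes a simple viewing matrix $V = \mathrm{diag}(f, f, 1)$ (focal length $f$, principal point at the origin), which is an assumption made for the example rather than the paper's general $V$, and compares the transformation of (22) with explicit back-projection, rotation, and re-projection at several depths. All names and numbers are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply_homography(M, u):
    """Map screen point u through M and divide by the third coordinate."""
    h = [row[0] * u[0] + row[1] * u[1] + row[2] for row in M]
    return (h[0] / h[2], h[1] / h[2])

def rotation_y(theta):
    """Rotation by theta radians about the vertical axis (a camera pan)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def panoramic_homography(f0, f1, R):
    """M_2D = V' R V^{-1}, assuming V = diag(f0, f0, 1), V' = diag(f1, f1, 1)."""
    V1 = [[f1, 0.0, 0.0], [0.0, f1, 0.0], [0.0, 0.0, 1.0]]
    V0_inv = [[1.0 / f0, 0.0, 0.0], [0.0, 1.0 / f0, 0.0], [0.0, 0.0, 1.0]]
    return matmul(V1, matmul(R, V0_inv))

def project_rotated(u, depth, f0, f1, R):
    """Reference path: back-project u to the given depth, rotate, re-project."""
    X = [depth * u[0] / f0, depth * u[1] / f0, depth]
    Xr = [sum(R[i][k] * X[k] for k in range(3)) for i in range(3)]
    return (f1 * Xr[0] / Xr[2], f1 * Xr[1] / Xr[2])

R = rotation_y(0.2)                       # hypothetical pan of 0.2 radians
M_2D = panoramic_homography(500.0, 520.0, R)
u = (120.0, -40.0)
u_prime = apply_homography(M_2D, u)
```

Calling `project_rotated(u, depth, 500.0, 520.0, R)` for any positive `depth` should agree with `u_prime` up to rounding, which is why panoramic mosaics can be built without recovering scene depth.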

For piecewise-planar scenes, the point $p$ will lie on some plane $k$ whose equation we can write as $n_k \cdot p = (\hat{n}_k, -1) \cdot p = 0$ (see footnote 6). Then, (18) reduces to

$$p_k = \begin{bmatrix} I \\ \hat{n}_k^T \end{bmatrix} V^{-1} u \tag{23}$$

as the 3-D position on plane $k$ corresponding to the screen position $u$. In a second camera position, the world coordinates have undergone a Euclidean transformation $p'_k = E\, p_k$ with rotation $R$ and translation $t$. The new screen coordinates are therefore

$$u'_k = P' E\, p_k = V' \left( R + t\, \hat{n}_k^T \right) V^{-1} u = M_k\, u. \tag{24}$$

Thus, we see that the 2-D projective motion matrices for the various planar pieces are interrelated, and are in fact determined by the plane equations ($n_k$), the rigid motion ($R$ and $t$), and the viewing matrices ($V$ and $V'$). For methods for recovering these unknown parameters from the $M_k$, see [Faugeras, 1992; Hartley, 1994].

Footnote 6: We exclude planes passing through the first camera center in this formulation.
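The plane-induced transformation of (24) can be verified against an explicit 3-D computation. The sketch below again assumes diagonal viewing matrices $V = \mathrm{diag}(f_0, f_0, 1)$ and $V' = \mathrm{diag}(f_1, f_1, 1)$, and writes the plane in the first camera frame as $\hat{n} \cdot X = 1$ for a Euclidean point $X$, which matches $(\hat{n}, -1) \cdot p = 0$. The numerical values and function names are hypothetical.

```python
import math

def rotation_y(theta):
    """Rotation by theta radians about the vertical axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def plane_homography(f0, f1, R, t, n_hat):
    """M_k = V' (R + t n_hat^T) V^{-1} for diagonal V and V'."""
    A = [[R[i][j] + t[i] * n_hat[j] for j in range(3)] for i in range(3)]
    # V' scales rows 0 and 1 by f1; V^{-1} scales columns 0 and 1 by 1/f0.
    return [[(f1 if i < 2 else 1.0) * A[i][j] / (f0 if j < 2 else 1.0)
             for j in range(3)] for i in range(3)]

def apply_homography(M, u):
    """Map screen point u through M and divide by the third coordinate."""
    h = [row[0] * u[0] + row[1] * u[1] + row[2] for row in M]
    return (h[0] / h[2], h[1] / h[2])

def transfer_via_3d(u, f0, f1, R, t, n_hat):
    """Reference path: intersect the viewing ray with the plane, move the
    point rigidly, and project it into the second camera."""
    d = [u[0] / f0, u[1] / f0, 1.0]                       # ray V^{-1} u
    scale = 1.0 / sum(n * c for n, c in zip(n_hat, d))    # n_hat . X = 1
    X = [scale * c for c in d]
    Xp = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    return (f1 * Xp[0] / Xp[2], f1 * Xp[1] / Xp[2])

R = rotation_y(0.1)
t = (0.3, 0.0, 0.1)
n_hat = (0.05, 0.0, 0.25)          # a slanted plane roughly 4 units away
M_k = plane_homography(500.0, 520.0, R, t, n_hat)
u = (120.0, -40.0)
u_via_matrix = apply_homography(M_k, u)
u_via_3d = transfer_via_3d(u, 500.0, 520.0, R, t, n_hat)
```

Two planes with different $\hat{n}_k$ share the same $R$, $t$, $V$, and $V'$, which is the interrelation between the matrices $M_k$ noted above.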


For general depth recovery, each image point $u$ corresponds to a 3-D point $p$ with an unknown projective depth $w$ as given in (20). In some other frame, whose relative orientation to the first frame is denoted by $E$, we have

$$u' = P' E\, p = V' R V^{-1} u + w\, V' t = M_{2D}\, u + w\, \hat{t}. \tag{25}$$

Thus, the motion between the two frames can be described by our familiar 2-D planar projective motion, plus an amount of motion proportional to the projective depth along the direction $\hat{t}$ (which is called the epipole) [Faugeras, 1992]. Note that the values of $M_{2D}$, $\hat{t}$, and $w$ are not unique: there is a scale ambiguity between $w$ and $\hat{t}$, and multiples of $\hat{t}$ can be added to $M_{2D}$ with the appropriate adjustments to $w$. To remove these ambiguities, we must enforce the rigidity constraints $M_{2D} = V' R V^{-1}$ [Szeliski, 1994].
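Equation (25) can be exercised with a short numerical sketch. As before, it assumes diagonal viewing matrices $V = \mathrm{diag}(f_0, f_0, 1)$ and $V' = \mathrm{diag}(f_1, f_1, 1)$, so that a Euclidean point in the first camera frame is $X = V^{-1} u / w$. The function names and all numerical values are hypothetical.

```python
import math

def rotation_y(theta):
    """Rotation by theta radians about the vertical axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rigid_homography(f0, f1, R):
    """M_2D = V' R V^{-1} for diagonal V and V' (the rigidity constraint)."""
    return [[(f1 if i < 2 else 1.0) * R[i][j] / (f0 if j < 2 else 1.0)
             for j in range(3)] for i in range(3)]

def parallax_transfer(u, w, M_2D, t_hat):
    """u' ~ M_2D u + w t_hat, followed by division by the third coordinate."""
    h = [row[0] * u[0] + row[1] * u[1] + row[2] + w * th
         for row, th in zip(M_2D, t_hat)]
    return (h[0] / h[2], h[1] / h[2])

def transfer_via_3d(u, w, f0, f1, R, t):
    """Reference path: place the point at X = V^{-1} u / w, move it rigidly,
    and project it into the second camera."""
    X = [u[0] / (f0 * w), u[1] / (f0 * w), 1.0 / w]
    Xp = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    return (f1 * Xp[0] / Xp[2], f1 * Xp[1] / Xp[2])

f0, f1 = 500.0, 520.0
R = rotation_y(0.1)
t = (0.3, 0.0, 0.1)
M_2D = rigid_homography(f0, f1, R)
t_hat = (f1 * t[0], f1 * t[1], t[2])      # epipole direction V' t
u = (120.0, -40.0)
near = parallax_transfer(u, 1.0, M_2D, t_hat)    # large projective depth w
far = parallax_transfer(u, 0.01, M_2D, t_hat)    # small w: distant point
at_infinity = parallax_transfer(u, 0.0, M_2D, t_hat)
```

With $w = 0$ the parallax term vanishes and the mapping reduces to the planar projective motion $M_{2D}$ alone, so distant points move as if the camera had only rotated.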
