
Interactive 3D Scene Reconstruction from Images

Aaron Hertzmann
Media Research Laboratory
Department of Computer Science, New York University
719 Broadway, 12th Floor, New York, NY 10003
[email protected]
http://www.mrl.nyu.edu/hertzman

NYU Computer Science Technical Report
April 22, 1999

Abstract

We propose an interactive framework for reconstructing an arbitrary 3D scene consistent with a set of images, for use in example-based image synthesis. Previous research has used human input to specify feature matches, which are then processed off-line; however, it is very difficult to correctly match images without feedback. The central idea of this paper is to perform and display 3D reconstruction during user modification. By allowing the user to interactively manipulate the image correspondence and the resulting 3D reconstruction, we can exploit both the user's intuitive image understanding and the computer's processing power.

1 Introduction and Related Work

The traditional goal of computer vision research is to computationally understand 3D scenes from sensor data. This research primarily targets real-time applications (such as robot guidance) and understanding the human visual system; hence, there is ideally no human intervention in the image analysis process. Although many impressive systems have been demonstrated, it is difficult to speculate if and when we will see effective algorithms for reconstructing arbitrary scenes from images.

A more recent research goal is the creation of 3D models for visualization and computer animation. The requirements for this problem differ from the requirements of real-time robot vision. Human assistance in the 3D reconstruction is often acceptable, because modeling is done in advance of rendering and viewing. However, artifacts and reconstruction errors are unacceptable, because the resulting models will ultimately be viewed by humans, in situations such as in movies or video games.

Many of the existing example-based image synthesis systems have used human input at some point in the correspondence/reconstruction process [3, 6, 7, 8, 9, 12, 14], particularly those that extrapolate views


in addition to interpolating. User interaction, as reported in the literature, has been limited to marking feature correspondences, from which a 3D model is then automatically extracted. (Many of the systems cited represent the model implicitly, with structures such as dense point correspondences and a fundamental matrix. Up to a projective transform, the implicit representations are equivalent to reconstructing a 3D model that is consistent with the example images.) Work that has focused on specialized domains, such as architecture [3, 12] and human faces [7], has been particularly successful. However, it is exceptionally difficult to correctly mark image correspondences for arbitrary surfaces, or even to know which features to mark. Image correspondences that appear reasonable often result in grossly incorrect 3D models (Figure 2), requiring the user to repeat the tedious process of matching several times until an acceptable model has been created.

Our approach is to tightly couple user interaction with 3D reconstruction. The idea is to always display the 3D model during interaction. As the user selects and modifies point and line matches in the source images, the resulting 3D model is continually updated in another window. The user can correct a match to subpixel accuracy by dragging a corresponding point in the image plane, or by directly modifying the 3D geometry. This method gives the user immediate feedback as to the quality of the correspondence and the reconstruction.

We believe that including the human in the reconstruction loop will produce a quantum leap in our ability to reconstruct the geometry of arbitrary scenes. This will be achieved by properly combining the user's intuitive image understanding and qualitative reasoning with the computer's processing power.

2 Camera and Scene Model

We first describe the general mathematical structure of our system. The input is a set of images (e.g. photographs) of a scene, along with camera calibration information. Each camera calibration consists of:

- A focal center $\mathbf{c}_i$.
- A projection function $P_i : \mathbb{R}^3 \rightarrow \mathbb{R}^2$ that maps world points to image points.
- An inverse projection function $Q_i : \mathbb{R}^2 \rightarrow \mathbb{R}^3$ that maps image points to unit-length world vectors.

These quantities are related by $Q_i(P_i(\mathbf{x})) = (\mathbf{x} - \mathbf{c}_i) / \|\mathbf{x} - \mathbf{c}_i\|$, where $\mathbf{x}$ is any world-space point visible to camera $i$. In our system, we have assumed a pair of pinhole projection cameras in the parallel configuration¹ (Figure 1).

A scene reconstruction is represented as a scattered set of world points $\mathbf{x}_i$. These points specify a set of sparse correspondences: $P_0(\mathbf{x}_i)$ in image 0 matches $P_1(\mathbf{x}_i)$ in image 1. We denote the image-space points as $\mathbf{p}_{ij} = P_j(\mathbf{x}_i)$. A line correspondence is represented as a pair $(i, k)$ that indicates an edge between $\mathbf{x}_i$ and $\mathbf{x}_k$.

In order to render the surface in our system, we triangulate the world points using the topology of the constrained Delaunay triangulation [10] of the points $\mathbf{p}_{i0}$, with line correspondences used as triangulation constraints. The triangles are rendered in 3D, and texture-mapped with a source image, using the image coordinates as texture coordinates [2, 4]. These operations are fast enough to perform in real time.

¹This configuration is also known as the stereo configuration.
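The relation $Q_i(P_i(\mathbf{x})) = (\mathbf{x} - \mathbf{c}_i)/\|\mathbf{x} - \mathbf{c}_i\|$ can be checked numerically. The sketch below is an illustration, not the paper's code; it assumes simple pinhole cameras with focal length $f = 1$ looking down the $+z$ axis, consistent with the parallel configuration.

```python
import numpy as np

def make_camera(c, f=1.0):
    """A pinhole camera at focal center c, looking down +z (assumed model)."""
    c = np.asarray(c, dtype=float)

    def P(x):
        # Project a world point onto the image plane at distance f.
        d = np.asarray(x, dtype=float) - c
        return f * d[:2] / d[2]

    def Q(p):
        # Map an image point back to a unit-length world direction.
        v = np.array([p[0], p[1], f])
        return v / np.linalg.norm(v)

    return c, P, Q

# Two cameras in the parallel (stereo) configuration, offset along x.
c0, P0, Q0 = make_camera([0.0, 0.0, 0.0])
c1, P1, Q1 = make_camera([1.0, 0.0, 0.0])

# Check the stated identity: Q_i(P_i(x)) == (x - c_i) / ||x - c_i||
x = np.array([0.3, -0.2, 2.0])
assert np.allclose(Q0(P0(x)), (x - c0) / np.linalg.norm(x - c0))
assert np.allclose(Q1(P1(x)), (x - c1) / np.linalg.norm(x - c1))
```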


Figure 1: The parallel camera configuration, showing the focal centers $\mathbf{c}_0$ and $\mathbf{c}_1$, a world point $\mathbf{x}_i$, and its projections $P_0(\mathbf{x}_i)$ and $P_1(\mathbf{x}_i)$.

For the goal of example-based image synthesis, it is sufficient to create a 3D model that is consistent with the source images. We could also use a representation such as point correspondences and a fundamental matrix that would make the calibration ambiguities explicit. We find the 3D formulation to be simpler to implement and understand, and easier to generalize. This use of 3D is inspired by [5].

3 The Interactive System

The 3D reconstruction begins with an initial point matching. We begin with an empty matching (no vertices or triangles); an automatically-generated disparity map could also be used.

Given the initial model, the user may add point matches and modify point matches. To add a point match, the user simply clicks the point's initial location $\mathbf{p}_{i0}$ in one of the images. The matching point in the other image is computed using a default disparity $d$: $\mathbf{p}_{i1} = \mathbf{p}_{i0} - d\,\hat{\mathbf{x}}$ or $\mathbf{p}_{i0} = \mathbf{p}_{i1} + d\,\hat{\mathbf{x}}$, depending on which image the user clicked in. $\mathbf{x}_i$ is computed as the intersection of the rays from the camera centers to the image points (i.e. $\mathbf{x}_i = \mathrm{ray}(\mathbf{c}_0, Q_0(\mathbf{p}_{i0})) \cap \mathrm{ray}(\mathbf{c}_1, Q_1(\mathbf{p}_{i1}))$, where $\mathrm{ray}(\mathbf{o}, \mathbf{d})$ denotes the ray from point $\mathbf{o}$ in direction $\mathbf{d}$). The new mesh is then retriangulated and rendered.

The majority of the user time with our method is spent modifying point matches. Point matches are displayed on each image as green crosses; when the mouse is moved over a match, the corresponding point in the other image is highlighted. The user first identifies a point match that does not appear correct, usually by looking for errors in the 3D mesh or by visually comparing the images under the match. Once a match is selected, the user simply drags the match in one of the images to a new location. While the match is being modified, the 3D model is continually updated and rendered, allowing the user to interactively refine the mesh shape. This "human-in-the-loop" reconstruction idea is simple, but very effective.
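Triangulating a new world point from a click and a default disparity, as described above, can be sketched as follows. This is a minimal illustration rather than the system's actual code: the camera model (pinhole, parallel configuration, focal length 1), the disparity sign convention, and the midpoint fallback for skew rays are all assumptions.

```python
import numpy as np

def Q(p, f=1.0):
    """Inverse projection for a pinhole camera looking down +z (assumed model)."""
    v = np.array([p[0], p[1], f])
    return v / np.linalg.norm(v)

def ray_params_nearest(o0, d0, o1, d1):
    """Parameters (t, s) of the mutually nearest points on rays o0 + t*d0 and o1 + s*d1."""
    w = o0 - o1
    a, b, c = d0 @ d0, d0 @ d1, d1 @ d1
    p, q = d0 @ w, d1 @ w
    denom = a * c - b * b  # zero iff the rays are parallel
    return (b * q - c * p) / denom, (a * q - b * p) / denom

def add_match(p0, c0, c1, disparity):
    """Create a match for a click p0 in image 0, offsetting by a default
    disparity along x, then triangulate x_i from the two viewing rays."""
    p1 = np.array([p0[0] - disparity, p0[1]])  # assumed sign convention
    d0, d1 = Q(p0), Q(p1)
    t, s = ray_params_nearest(c0, d0, c1, d1)
    # Midpoint of the mutually nearest points; this equals the exact
    # intersection whenever the match satisfies the epipolar constraint.
    x_i = 0.5 * ((c0 + t * d0) + (c1 + s * d1))
    return x_i, p1
```

For example, with cameras at the origin and at $(1, 0, 0)$, clicking $(0.15, -0.1)$ with disparity $0.5$ triangulates the world point $(0.3, -0.2, 2.0)$.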

The new world point location $\mathbf{x}_i$ for the match is computed by intersecting the ray through the new match point, $r_0 = \mathrm{ray}(\mathbf{c}_0, Q_0(\mathbf{p}_{i0}))$, and the ray through the corresponding point, $r_1 = \mathrm{ray}(\mathbf{c}_1, Q_1(\mathbf{p}_{i1}))$. If the rays do not intersect, then the point on $r_1$ nearest to $r_0$ is chosen, and new values for the image points are computed by projection ($\mathbf{p}_{ij} = P_j(\mathbf{x}_i)$). This is equivalent to constraining the match to lie on the epipolar line of the corresponding point. If the user drags with the right mouse button instead of the left button, then the point on $r_0$ nearest to $r_1$ is chosen. In this case, the user may move a point freely, and the corresponding point is constrained to lie on the new epipolar line.
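The epipolar-constrained drag update can be sketched as follows. Assumptions (not the paper's code): pinhole cameras with focal length 1 looking down $+z$, and the left-button case, in which the point on $r_1$ nearest to the new ray $r_0$ is chosen and both image points are recomputed by projection.

```python
import numpy as np

def project(x, c, f=1.0):
    """Pinhole projection P (assumed model, camera looking down +z)."""
    d = x - c
    return f * d[:2] / d[2]

def unproject(p, f=1.0):
    """Inverse projection Q: image point -> unit-length world direction."""
    v = np.array([p[0], p[1], f])
    return v / np.linalg.norm(v)

def drag_update(p0_new, p1_old, c0, c1):
    """The user dragged the image-0 point to p0_new; pick the point on the
    corresponding ray r1 nearest to the new ray r0, then reproject."""
    d0, d1 = unproject(p0_new), unproject(p1_old)
    # Least-squares parameter of the point on r1 = c1 + s*d1 nearest to r0.
    w = c0 - c1
    a, b, c = d0 @ d0, d0 @ d1, d1 @ d1
    s = (a * (d1 @ w) - b * (d0 @ w)) / (a * c - b * b)
    x_i = c1 + s * d1
    # Reprojection snaps the dragged point onto the epipolar line of p1.
    return x_i, project(x_i, c0), project(x_i, c1)
```

Since $\mathbf{x}_i$ lies on $r_1$ by construction, reprojecting into image 1 returns $\mathbf{p}_{i1}$ unchanged, while the dragged point in image 0 is snapped onto the (horizontal, in the parallel configuration) epipolar line.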

The user creates a line correspondence simply by specifying two existing points as the endpoints. Line constraints are useful for restricting mesh topology.

In addition to the textured rendering mode, we display a difference image (Figure 4). The difference image is the absolute value of the difference between the left source image rendered onto the mesh and the right image rendered onto the mesh. Poorly-matched pixels are light and good matches are dark. With this image, the user can try to manipulate matching points in order to "black out" the difference image. This method is very useful in textured areas without many discontinuities. However, the difference image is often an insufficient measure of a match, especially in areas with little texture; difference-image editing must be used in conjunction with the standard editing modes.
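The difference image itself is a one-line computation once both renderings are in hand. The sketch below assumes two already-rendered 8-bit grayscale images of the same size (the rendering-onto-the-mesh step is not shown); widening to a signed type avoids uint8 wraparound.

```python
import numpy as np

def difference_image(left_rendered, right_rendered):
    """Absolute per-pixel difference; well-matched pixels come out dark (near 0)."""
    a = left_rendered.astype(np.int16)
    b = right_rendered.astype(np.int16)
    return np.abs(a - b).astype(np.uint8)

# Hypothetical example: identical renderings except one mismatched pixel.
left = np.full((2, 2), 100, dtype=np.uint8)
right = left.copy()
right[0, 0] = 130
diff = difference_image(left, right)  # diff[0, 0] == 30, all others 0
```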

4 Discussion and Future Work

We have presented an approach to reconstructing arbitrary 3D scenes from images, by exploiting human image understanding together with computational power. We believe that this framework can potentially make 3D reconstruction cheap and effective for a wide variety of applications.

There are several important extensions:

- Support many example images, to allow reconstruction of large environments.

- Provide a standard 3D modeling interface for direct modification of the geometry and topology of the 3D model. For instance, the ability to mark discontinuities in the mesh is critical.

- Provide interactive modification of the camera calibration.

- Include local stereo computations during interactive updates. For example, surfaces could "stiffen" in proportion to the level-set evolution term of [5].

For each of these extensions, the underlying mathematics are straightforward, but the user interface is not.

Acknowledgments

Delaunay triangulations were computed using [11]. The head image pair was obtained from [13] and is copyright © INRIA-Syntim. (We did not make use of the calibration data provided for the source images.)

The author is supported by NSF grant DGE-9454173.

References

[1] Shai Avidan, Amnon Shashua. Novel View Synthesis by Cascading Trilinear Tensors. IEEE Transactions on Visualization and Computer Graphics, Vol. 4, No. 4, October/December 1998.

[2] Lucia Darsa, Bruno Costa, and Amitabh Varshney. Navigating Static Environments Using Image-Space Simplification and Morphing. ACM Symposium on Interactive 3D Graphics, Providence, RI, 1997, pp. 25–34.

[3] Paul E. Debevec, Camillo J. Taylor, Jitendra Malik. Modeling and Rendering Architecture from Photographs: A Hybrid Geometry- and Image-Based Approach. SIGGRAPH 96 Conference Proceedings, August 1996, pp. 11–20.

[4] Kenneth E. Hoff. Understanding Projective Textures: Correcting the Double-Perspective Distortion. http://www.cs.unc.edu/~hoff/techrep/index.html

[5] Olivier Faugeras, Renaud Keriven. Complete Dense Stereovision using Level Set Methods. Proc. ECCV '98.

[6] Leonard McMillan, Gary Bishop. Plenoptic Modeling: An Image-Based Rendering System. SIGGRAPH 95, August 1995, pp. 39–46.

[7] Frederic Pighin, Jamie Hecker, Dani Lischinski, Richard Szeliski, and David H. Salesin. Synthesizing Realistic Facial Expressions from Photographs. SIGGRAPH 98 Conference Proceedings, July 1998, pp. 75–84.

[8] Luc Robert, Stephane Laveau. Online 3-D Modeling with Images. http://www-sop.inria.fr/robotvis/projects/Realise/java

[9] Steve Seitz and Charles Dyer. View Morphing. Proc. SIGGRAPH 96, pp. 21–30, 1996.

[10] Jonathan Shewchuk. Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator. First Workshop on Applied Computational Geometry, pp. 124–133, 1996.

[11] Jonathan Shewchuk. Triangle: A Two-Dimensional Quality Mesh Generator. http://www.cs.cmu.edu/~quake/triangle.html

[12] Heung-Yeung Shum, Mei Han, Richard Szeliski. Interactive Construction of 3D Models from Panoramic Mosaics. IEEE CVPR, pp. 427–433, Santa Barbara, June 1998.

[13] Jean-Phillippe Tarel. Stereograms of the Syntim Project. http://www-syntim.inria.fr/syntim/analyse/paires-eng.html

[14] Randall Warniers. Every Picture Tells A Story. Computer Graphics World, October 1998.


Figure 2: (a), (b) Hand-marked feature correspondences, without interactive reprojection. (c) Reprojection of feature matches. Feature correspondences that look reasonable on the image plane often produce grossly incorrect 3D models. (d) Interactive adjustments to the matching can vastly improve the 3D reconstruction.


Figure 3: (a), (b) Fruit stereo pair. (c), (d) Model generated from the fruit stereo pair.


Figure 4: Difference between source images textured on the same mesh. Most errors appear near discontinuities, which are not yet handled by our system.
