
Deep Compression for Streaming Texture Intensive Animations

Daniel Cohen-Or, Tel Aviv University
Yair Mann, Webglide Ltd.
Shachar Fleishman, Tel Aviv University

Abstract

This paper presents a streaming technique for synthetic texture intensive 3D animation sequences. There is a short latency time while downloading the animation, until an initial fraction of the compressed data is read by the client. As the animation is played, the remainder of the data is streamed online seamlessly to the client. The technique exploits frame-to-frame coherence for transmitting geometric and texture streams. Instead of using the original textures of the model, the texture stream consists of view-dependent textures which are generated by rendering nearby views offline. These textures have a strong temporal coherency and can thus be well compressed. As a consequence, the bandwidth of the stream of view-dependent textures is narrow enough to be transmitted together with the geometry stream over a low bandwidth network. These two streams maintain a small online cache of geometry and view-dependent textures from which the client renders the walkthrough sequence in real-time. The overall data transmitted over the network is an order of magnitude smaller than an MPEG post-rendered sequence of equivalent image quality.

Keywords: compression, MPEG, streaming, virtual environment, image-based rendering

1 Introduction

With the increasing popularity of network-based applications, compression of synthetic animation image sequences for efficient transmission is more important than ever. For long sequences, even if the overall compression ratio is high, the latency time of downloading the compressed file might be prohibitive. A better network-based compression scheme is to partition the compressed sequence into two parts. The first part, or header, is small enough to be downloaded within an acceptable initialization time, while the second part is transmitted as a stream. The compressed data is broken up into a stream of data which is processed along the network pipeline; that is, the compressed data is transmitted from one end, and received, decoded and displayed at the other end. Streaming necessarily requires that all the pipeline stages operate in real-time. Clearly, the network bandwidth is the most constrained resource along the pipeline, and thus the main challenge is to reduce the stream bandwidth enough to accommodate the network bandwidth constraint.

Standard video compression techniques that take advantage of frame-to-frame coherency are insufficient for streaming. Given a moderate frame resolution of 240 x 180, an average MPEG frame is typically about 2-6K bytes. Assuming a network with a sustained transfer rate of 2K bytes per second, a reasonable quality of a few frames per second cannot be achieved. A significant improvement in compression ratio is still necessary for streaming video in real-time.
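To make the bandwidth gap concrete, here is a back-of-the-envelope check using only the figures quoted above (2-6K bytes per MPEG frame, a 2K bytes per second link); the script itself is merely illustrative.

```python
# Achievable frame rate when streaming MPEG frames of 2-6 KB
# over a link with a sustained rate of 2 KB per second.
link_bytes_per_sec = 2 * 1024
for frame_bytes in (2 * 1024, 6 * 1024):
    fps = link_bytes_per_sec / frame_bytes
    print(f"{frame_bytes // 1024} KB/frame -> {fps:.2f} frames/sec")
# Even the smallest frames yield about 1 frame per second,
# far below playback rate, hence the need for a narrower stream.
```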

Synthetic animations have a higher potential to be compressed than general video since more information is readily available to the compression algorithms [2, 12, 22]. Assuming, for example, that the geometric model and the animation script are relatively small, one can consider them as the compressed sequence, transmit them, and decode them by simply rendering the animation sequence. By simultaneously transmitting the geometry and the script streams on demand, one can display very long animations over the network (see [19]). However, if the model consists of a large number of geometric primitives and textures, simple streaming is not enough. The texture stream is especially problematic since the texture data can be significantly larger than the geometric data. For some applications, replicated (tiled) textures can be used to avoid the burden of large textures. However, realistic textures typically require a large space. Moreover, detailed and complex environments can effectively be represented by a relatively small number of textured polygons [3, 13, 17, 20, 6, 21, 11]. For such texture intensive models the geometry and the animation script streams are relatively small, while the texture stream is prohibitively expensive. Indeed, the performance of current VRML browsers is quite limited when handling texture intensive environments [8].

(Affiliations: Computer Science Department, Tel-Aviv University 69978, Israel, e-mail: {daniel,shacharf}@math.tau.ac.il; Webglide Ltd., e-mail: [email protected].)

In this paper we show that instead of streaming the textures, it is possible to stream view-dependent textures which are created by rendering nearby views. These textures have a strong temporal coherency and can thus be well compressed. As a consequence, the bandwidth of the stream of the view-dependent textures is narrow enough to be transmitted together with the geometry and the script streams over a low bandwidth network. Our results show that a walkthrough in a virtual environment consisting of tens of megabytes of textures can be streamed from a server to a client over a network with a sustained transfer rate of 2K bytes per second. In terms of compression ratios, our technique is an order of magnitude better than an MPEG sequence with an equivalent image quality.

2 Background

Standard video compression techniques are based on image-based motion estimation [9]. An MPEG video sequence consists of intra frames (I), predictive frames (P) and interpolated frames (B). The I frames are coded independently of any other frames in the sequence. The P and B frames are coded using motion estimation and interpolations, and thus, they are substantially smaller than the I frames. The compression ratio is directly dependent on the success of the motion estimation for coding the P frames. Using image-based techniques the optical flow field between successive frames can be approximated. It defines the motion vectors, or the correspondence between the pixels of successive frames. To reduce the overhead of encoding the motion information, common techniques compute a motion vector for a block of pixels rather than for individual pixels. For a synthetic scene the available model can assist in computing the optical flow faster or more accurately [2, 22]. These model-based motion estimation techniques improve the compression ratio, but the overhead of the block-based motion approximation is still too high for streaming at acceptable image quality.
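As a concrete illustration of block-based motion estimation, the following minimal sketch finds a motion vector for a single block by an exhaustive sum-of-absolute-differences (SAD) search. It illustrates the general scheme described above, not MPEG's actual encoder; the function name and parameters are our own.

```python
import numpy as np

def motion_vector(prev, cur, bx, by, block=8, radius=4):
    """Find the offset (dx, dy) into the previous frame that best
    predicts the block of `cur` at (bx, by), by minimizing the sum
    of absolute differences. Frames are 2-D grayscale arrays."""
    target = cur[by:by + block, bx:bx + block].astype(np.int32)
    best, best_sad = (0, 0), np.inf
    h, w = prev.shape
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = by + dy, bx + dx
            if 0 <= y and y + block <= h and 0 <= x and x + block <= w:
                cand = prev[y:y + block, x:x + block].astype(np.int32)
                sad = np.abs(target - cand).sum()
                if sad < best_sad:
                    best_sad, best = sad, (dx, dy)
    return best, best_sad
```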

A different approach was introduced by Levoy [12]. Instead of compressing post-rendered images, it is possible to render the animation on-the-fly at both ends of the communication. The rendering task is partitioned between the server (sender) and client (receiver). Assuming the server is a high-end graphics workstation, it can render high and low quality images, compute the residual error between them, and transmit the residual image compressed. The client needs to render only the low quality images and to add the transmitted residual images to restore the full quality images. It was shown that the overall compression ratio is better than conventional techniques.

If this technique is to be adapted for streaming of the residual images, it would require pre-computing the residual images and downloading the entire model and animation sequence before streaming can take place. It would be possible to transmit only key-frame images and blend (or extrapolate) the in-between frames [4, 14, 15]. However, it is not clear how to treat texture intensive animations, since downloading the textures is still necessary to keep the residual images small enough for streaming. Other related techniques use imposters [17, 20] and sprites [18, 11] as image-based primitives to render nearby views. When the quality of a primitive drops below some threshold, it is recomputed without exploiting its temporal coherence.

3 View-dependent Texture Streaming

Assume an environment consisting of a polygonal geometric model, textures, and a number of light sources. Furthermore, assume that the environment is texture-intensive, that is, the size of the environment database is dominated by the textures, while the geometry-space is significantly smaller than the texture-space. Nevertheless, the size of the environment is too large to be downloaded from the server to the client in an acceptable time.

Streaming the environment requires the server to transmit the animation script and, according to the camera viewing parameters, to transmit the visible parts of the model. However, the size of the textures necessary for a single view is too large to be streamed in real-time. Instead of using these original textures, we show that it is better to use nearby views as textures. These views can be regarded as view-dependent textures which are effectively valid only for texturing the geometry viewed from nearby viewing directions.

Given the current camera position, the client generates a new frame based on the data streamed so far, which includes at least the visible polygons from the current camera position and a number of nearby views. The new frame is a perspective view generated by rendering the geometry where the nearby perspective views serve as texture maps. Since the correspondence between two perspective views is a projective map, the model is rendered by applying a projective mapping rather than a perspective mapping. This projective map can be expressed by a linear transformation in homogeneous space, and can be implemented by rational linear interpolation which requires divisions at each pixel [10, 16].
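A minimal sketch of such a backward projective mapping follows, assuming a known 3x3 homography H from target pixels to source-texture pixels. The per-pixel division by the homogeneous coordinate is the rational linear interpolation mentioned above; nearest-neighbour sampling is used for brevity and the helper name is ours.

```python
import numpy as np

def warp_projective(src, H, out_shape):
    """Backward-map each target pixel through the 3x3 projective
    matrix H into the source texture (nearest-neighbour sampling)."""
    h, w = out_shape
    out = np.zeros((h, w) + src.shape[2:], dtype=src.dtype)
    for yt in range(h):
        for xt in range(w):
            xs, ys, ws = H @ (xt, yt, 1.0)
            u, v = int(round(xs / ws)), int(round(ys / ws))  # divide per pixel
            if 0 <= v < src.shape[0] and 0 <= u < src.shape[1]:
                out[yt, xt] = src[v, u]
    return out
```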

Nearby views are advantageous as textures because (i) they have an "almost" one-to-one correspondence with the current view and therefore a bi-linear filter is sufficient to produce quality images, and (ii) they are post-rendered and may include various global illumination effects. Figure 1 shows a simple example of a textured cube. In these views there are three visible polygonal faces. The left image was rendered with a standard perspective texture mapping, while the right image was rendered by applying a backward projective texture mapping from the left image. The projection of each of the cube faces on the left image serves as a view-dependent texture for its corresponding projection on the right image. The size and shape of the source texture is close to its target. This reduces many aliasing problems and hence simplifies the filtering. In this example, for expository reasons, the two views are not "nearby", otherwise they would appear almost indistinguishable. Later we will quantify the meaning of "nearby". Since the view-dependent textures are generated off-line, they can be involved and can account for many global illumination effects, including shadows cast from remote objects. This simplifies the generation of a new frame, since the rendering is independent of the complexity of the illumination model.

Figure 1: (a) A source image of a textured cube. (b) A view of the same cube obtained by back-mapping into the source image. Note that the texture on the leftmost face is severely blurred, while the other two faces do not exhibit significant artifacts.

View-dependent textures, like view-independent textures, are discrete; therefore, when resampled, artifacts are introduced. If the target area is larger than the corresponding source area, the result appears blurry, as can be seen on the leftmost face of the cube in Figure 1b. On the other hand, if the target area is smaller than the source area (the rightmost face), by appropriately filtering the source samples, the target image has no noticeable artifacts. Thus, an appropriate view-dependent texture should be (i) created as close as possible to the new view and (ii) created from a direction in which the area is expanded relative to the target. In Section 4 we elaborate on the factor which controls the quality of the view-dependent textures.

One should also consider the visibility, namely that at least part of a polygon can be hidden in the nearby view, while being visible in the current view. This is a well-known problem in image-based rendering [4, 5]. However, in a streaming application where the compressed sequence is precomputed, visibility gaps are known a priori, and for each gap there is at least one nearby view that can close that gap. It should be emphasized that the nearby views are not restricted to be from the "past", but can be located near a "future" location of the camera.

As the animation proceeds, the quality and validity of the view-dependent textures decrease as the camera viewing parameters deviate. Thus, new closer view-dependent textures need to be streamed to guarantee that each polygon can be mapped to an appropriate texture. The stream of new view-dependent textures is temporally coherent, and thus instead of transmitting the compressed textures, it is possible to exploit this texture-to-texture coherence to get a better compression. The old texture is warped towards the new view and the residual error between the new view and the warped image is computed. The residual image can then be better compressed than the raw texture.
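Composing the two sketches given earlier (warp_projective from this section and encode_residual from Section 2), the texture-update step just described might look as follows; H_old_to_new denotes the assumed projective map between the old texture's view and the new one.

```python
def texture_update(old_texture, H_old_to_new, new_view):
    """Exploit texture-to-texture coherence: warp the cached texture
    toward the new view, then encode only the (well-compressible)
    residual. Reuses the illustrative helpers sketched above."""
    warped = warp_projective(old_texture, H_old_to_new, new_view.shape[:2])
    residual = encode_residual(new_view, warped)  # small, low-entropy image
    # Client side reconstructs: new_view ~= decode_residual(warped, residual)
    return warped, residual
```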

A view-dependent texture is not necessarily a full size nearby view, but may include only the view of part of the environment. Based on a per-polygon amortization factor (see Section 4), a subset of the view-dependent textures, defined by the polygon mesh, is updated. This further reduces the size of the texture stream to be optimal with respect to the amortization factor. Thus, a key point is to use a good texture quality estimate.


4 The Texture Quality Factor

Given a view and a set of view-dependent textures, it is necessary to select the best quality texture for each visible polygon. As can be seen in Figure 1, for some faces of the cube (in (b)) the view-dependent texture (in (a)) is more appropriate than for others. The quality of the source texture is related to the area covered by the projection of each polygon, or cube face in Figure 1, in the two views. Thus, a per-polygon local scaling factor is required to estimate the areas in the source texture that locally shrink or expand when mapped onto the target image. When the scaling factor is at most 1, we can consider the source image appropriate for generating the target image. As the factor increases above 1, more and more blur appears in the target image, and we should consider the source texture less and less appropriate. This per-polygon texture quality factor is used to select the best source texture out of the available set of view-dependent textures. If the best source is above some predetermined threshold, a new texture is required. However, a successful streaming of the textures guarantees that there is always one available texture whose quality factor is satisfactory.

Figure 2 illustrates the per-polygon source selection process. Two examples of nearby views are shown in (a) and (b). The scale factor for the in-between images (f) is visualized using the gray level images (c) and (d), where white is used for a low scale factor and black is used for a high scale factor. The (c) and (d) columns visualize the scale factor when mapping the textures from (a) and (b) to (f). For each polygon, we select the texture from the image that has the lowest scale factor; this is visualized in (e), where the red levels are the scale factors from (a) and the blue levels are the scale factors from (b).
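In code, the per-polygon selection rule is simply an arg-min over the cached views. A minimal sketch, where scale_factor(view, polygon) is assumed to implement the estimate developed in the rest of this section and the threshold value is illustrative:

```python
def select_best_texture(polygon, views, scale_factor, threshold=1.4):
    """Pick the cached nearby view with the smallest scale factor for
    this polygon; flag that a fresh texture is needed when even the
    best candidate exceeds the threshold."""
    best = min(views, key=lambda v: scale_factor(v, polygon))
    return best, scale_factor(best, polygon) > threshold
```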

We are now left with the problem of estimating the maximal value of the scaling factor in a particular polygon in the source image, for a given target image (see also [11]). Note that the factor is defined independently of the visibility of the polygon. We first discuss the case where the polygon is mapped from the source to the target by a linear transformation, and then discuss the more general non-linear case.

Let $A$ be a square matrix corresponding to some linear transformation. The scaling factor of a linear transformation is the 2-norm of the matrix $A$, i.e. the maximum 2-norm of $Ax$ over all unit vectors $x$:
$$\|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2 .$$
It can be shown [7] that the 2-norm of $A$ equals the square root of $\lambda_{max}$, the largest eigenvalue of $A^T A$. In the case of two-dimensional linear transformations, where $A$ is a 2 by 2 matrix, we can write down a closed-form expression for $\lambda_{max}$. Let $a_{ij}$ denote the elements of $A$ and $b_{ij}$ the elements of $A^T A$:
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad A^T A = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}. \tag{1}$$

The eigenvalues of the matrix $A^T A$ are the roots of the polynomial $\det(A^T A - \lambda I)$, where $I$ is the identity matrix. In the two-dimensional case, $\lambda_{max}$ is the largest root of the quadratic equation
$$(b_{11} - \lambda)(b_{22} - \lambda) - b_{12} b_{21} = 0. \tag{2}$$

Thus,
$$\lambda_{max} = \frac{b_{11} + b_{22} + \sqrt{(b_{11} + b_{22})^2 - 4(b_{11} b_{22} - b_{12} b_{21})}}{2}. \tag{3}$$

Expressing the elements $b_{ij}$ in terms of the elements $a_{ij}$ yields
$$A^T A = \begin{pmatrix} a_{11}^2 + a_{21}^2 & a_{11} a_{12} + a_{21} a_{22} \\ a_{11} a_{12} + a_{21} a_{22} & a_{12}^2 + a_{22}^2 \end{pmatrix}, \tag{4}$$

and finally, defining $m = \frac{1}{2}(a_{11}^2 + a_{12}^2 + a_{21}^2 + a_{22}^2)$, we get
$$\lambda_{max} = m + \sqrt{m^2 - (a_{11} a_{22} - a_{12} a_{21})^2}. \tag{5}$$
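Eq. (5) translates directly into code. A short sketch (the function name is ours), with a sanity check against NumPy's spectral norm:

```python
import math
import numpy as np

def scale_factor_2x2(a11, a12, a21, a22):
    """Closed-form 2-norm of a 2x2 matrix, following Eqs. (1)-(5):
    sqrt(lambda_max), with lambda_max = m + sqrt(m^2 - det^2)."""
    m = 0.5 * (a11**2 + a12**2 + a21**2 + a22**2)
    det = a11 * a22 - a12 * a21
    return math.sqrt(m + math.sqrt(max(m * m - det * det, 0.0)))

# Sanity check: agrees with the largest singular value.
A = np.array([[1.0, 0.3], [-0.2, 2.0]])
assert abs(scale_factor_2x2(*A.ravel()) - np.linalg.norm(A, 2)) < 1e-6
```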

Dealing with non-linear transformations, such as projective transformations, requires measuring the scaling factor locally, at a specific point in the image, by using the partial derivatives of the transformation at the given point. The partial derivatives are used as the coefficients of a linear transformation.

Let us denote by $x_s, y_s$ and $x_t, y_t$ the source and target image coordinates of a point, respectively, and by $x, y, z$ the three-dimensional location of that point in target camera coordinates. We have:
$$(x, y, z)^T = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} (x_s, y_s, 1)^T \tag{6}$$

$$(x_t, y_t)^T = \frac{1}{z}\,(x, y)^T \tag{7}$$

or explicitly,
$$x_t = \frac{a x_s + b y_s + c}{g x_s + h y_s + i}, \qquad y_t = \frac{d x_s + e y_s + f}{g x_s + h y_s + i}. \tag{8}$$

The partial derivatives of the above mapping at $(x_t, y_t)$ define its gradient, which is a linear transformation:
$$A = \begin{pmatrix} \partial x_t/\partial x_s & \partial x_t/\partial y_s \\ \partial y_t/\partial x_s & \partial y_t/\partial y_s \end{pmatrix} = \frac{1}{z^2} \begin{pmatrix} za - xg & zb - xh \\ zd - yg & ze - yh \end{pmatrix}. \tag{9}$$

In cases where the field of view is small, and there is only a little rotation of the plane relative to the source and target views, the following approximation can be used. The transformation of the plane points from source to target image can be approximated by:
$$x_t = a + b x_s + c y_s + g x_s^2 + h x_s y_s, \qquad y_t = d + e x_s + f y_s + g x_s y_s + h y_s^2. \tag{10}$$
This is called a pseudo 2D projective transformation and is further explained in [1]. In this case we get:
$$A = \begin{pmatrix} b + 2g x_s + h y_s & c + h x_s \\ e + g y_s & f + g x_s + 2h y_s \end{pmatrix}. \tag{11}$$

To estimate the maximal scaling factor, the gradient can be computed at the three vertices of a triangle. In cases where the triangle is small enough, even one sample (at the center of the triangle) can yield a good approximation. Our experiments have shown that a scale factor threshold of about 1.4 yields high quality images while maintaining a high compression ratio.
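Putting Eqs. (6)-(9) together with the closed form of Eq. (5), a per-triangle estimate might be sketched as follows (reusing scale_factor_2x2 from above; the function names are ours):

```python
def projective_jacobian(M, xs, ys):
    """Gradient (Eq. 9) of the projective map defined by the 3x3
    matrix M of Eq. (6), evaluated at source point (xs, ys)."""
    (a, b, c), (d, e, f), (g, h, i) = M
    x = a * xs + b * ys + c          # target-camera coordinates, Eq. (6)
    y = d * xs + e * ys + f
    z = g * xs + h * ys + i
    return ((z * a - x * g) / z**2, (z * b - x * h) / z**2,
            (z * d - y * g) / z**2, (z * e - y * h) / z**2)

def triangle_scale_factor(M, triangle):
    """Maximal scale factor over a triangle, sampled at its three
    source-space vertices as suggested above."""
    return max(scale_factor_2x2(*projective_jacobian(M, xs, ys))
               for xs, ys in triangle)
```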

5 Geometry Streaming

We have discussed the strategy for selecting the best view-dependent texture from those available. The system guarantees that at least one appropriate view-dependent texture has already been streamed to the client. If for some reason an appropriate texture is not available on time, some blur is likely to be noticed. However, the error is not critical. On the other hand, a missing polygon might cause a hole in the scene, and the visual effect would be unacceptable. Thus, the geometry stream gets a higher priority so as to avoid any geometric miss.

Detecting and streaming the visible polygons from a given neighborhood is not an easy problem. Many architectural models are inherently partitioned into cells and portals by which the cell-to-cell visibility can be precomputed. However, since the geometry stream is computed offline, the visible polygons can be detected by rendering the entire sequence. The set of polygons visible in each frame can be detected by rendering each polygon with an ID index color and then scanning the frame buffer, collecting all the IDs of the visible polygons. The definition of a polygon that is first visible is transmitted together with its ID. During the online rendering the client maintains a list of all the polygons which have been visible so far, and a cache of the "hot" IDs, i.e., the polygons which are either visible in the current frame or in a nearby frame. The temporal sequence of the hot IDs is precomputed and is a part of the geometric stream. The cache of hot polygons serves as a visibility culling mechanism, since the client needs to render only a small conservative superset of the visible polygons, which is substantially smaller than the entire model.

Table 1: A comparison of the streaming technique (VDS) with an MPEG encoding. The frame resolution is 240 x 180.

animation (# frames)    size (header)   RMS (0-255)   bits/pix   bits/sec
Arts (435)
  VDS     80K (23K)     18.6    0.034    29511
  MPEG    1600K         19.2    1.61     619168
  MPEG    200K          31      0.085    73766
Gallery (1226)
  VDS     111K (30K)    19.15   0.0168   14563
  MPEG    1830K         20.03   0.277    239436
  MPEG    333K          27.41   0.05     43523
Office (1101)
  VDS     177K (72K)    22.47   0.03     25722
  MPEG    2000K         22.60   0.336    290644
  MPEG    337K          34.71   0.056    48937

6 Results and Conclusions

In our implementation the precomputed sequence of view-dependent textures is streamed to the client asynchronously. The stream updates the set of view-dependent textures from which the client renders the animation. The texture update frequency is defined to guarantee some reasonable quality. The choice of frequency directly affects the compression ratio. Reasonable quality is a subjective judgment; however, we will analyze it quantitatively. We use the root mean square error (RMS) as an objective measurement to compare MPEG compression and our compression technique (denoted in the following by VDS, for View Dependent texture Stream) with the uncompressed images.
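For reference, the RMS measure used below is simply the following (a sketch; the helper name is ours):

```python
import numpy as np

def rms_error(reference, test):
    """Root mean square error between an uncompressed frame and its
    encoded counterpart, on the 0-255 scale used in Table 1."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    return float(np.sqrt(np.mean(diff * diff)))
```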

Figure 3 shows five samples of the comparison taken from three sequences that were encoded both in MPEG and VDS. The first two examples, "Arts" and "Gallery", are from two virtual museum walkthroughs, and the third example, "Office", is of an animation. The quantitative results are shown in Table 1. Each animation was compressed by MPEG and VDS; we see that when they have about the same RMS, the size of the MPEG is about 10-20 times larger than the VDS. When MPEG is forced to compress the sequence down to a smaller size, it is still 2-3 times larger than the VDS, and its quality is significantly worse. This is evident from the corresponding RMS values in the table and visually noticeable in Figure 3. The VDS is not free of artifacts; some are apparent in the images in Figure 3. Most of them are due to imperfect implementation of the rendering algorithm, rather than being intrinsic to the algorithm. Note that there are slight popping effects when a texture update changes the resolution, which could have been avoided by using a blending technique.

The texture intensive models were generated with 3D Studio Max. The virtual museum "Arts" consists of 3471 polygons and 19.6 Megabytes of texture, and the "Office" animation consists of 5309 polygons and 16.9 Megabytes of 3D Studio Max's textures, which were not optimized for size. Part of the compressed data is the header, which includes the initial data necessary to start the local rendering, that is, the geometry, the animation script, and the first nearby view in JPEG format. Thus, the cost effectiveness of streaming is higher for longer animations. The size of the header appears in parentheses near the size values in Table 1.

We implemented both the client and the server on a Pentium machine, integrated in a browser with ActiveX. The client and the server were connected by a dial-up connection of 14400 bits per second. The download time of the initial setup data, namely the header, is equivalent to the download time of a JPEG image. This means a few seconds on a network with a sustained transfer rate of less than 2K bytes per second. We attached hot-spots (hyperlinks) to the 3D objects visible in the animation, by which the user chooses either high resolution images or precomputed sequences, which are streamed online according to the selection.

The offline process that generates the VDS sequence takes a few minutes for animations of several hundred frames such as the above examples. Currently, the client's local rendering is the slowest part of the real-time loop. It should be emphasized that the client uses no graphics hardware to accelerate the rendering, which requires a perspective texture mapping. We believe that optimizing the client rendering mechanism will permit streaming sequences of higher resolutions. We are planning to extend our system by allowing some degree of freedom in the navigation, for example, by supporting branching from one movie sequence to others. Currently, we are working on applying computer-vision techniques to reconstruct the geometry of real-world movies and compress them using our geometry-based compression. We believe that geometry-based compression is likely to play an increasing role in 3D network-based applications.

References

[1] G. Adiv. Determining three-dimensional motion and structure from optical flow generated by several moving objects. IEEE Transactions on PAMI, 7(4):384-401, July 1985.

[2] M. Agrawala, A. Beers and N. Chaddha. Model-based motion estimation for synthetic animations. Proc. ACM Multimedia '95.

[3] D. G. Aliaga. Visualization of complex models using dynamic texture-based simplification. Proceedings of Visualization 96, 101-106, October 1996.

[4] S. E. Chen and L. Williams. View interpolation for image synthesis. Computer Graphics (SIGGRAPH '93 Proceedings), 279-288, August 1993.

[5] L. Darsa, B. Costa and A. Varshney. Navigating static environments using image-space simplification and morphing. 1997 Symposium on Interactive 3D Graphics, Providence, Rhode Island, April 28-30, 1997.

[6] P. E. Debevec, C. J. Taylor and J. Malik. Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. Computer Graphics (SIGGRAPH '96 Proceedings), 11-20, August 1996.

[7] G. H. Golub and C. F. Van Loan. Matrix Computations. Second edition, pp. 56-60.

[8] J. Hardenberg, G. Bell and M. Pesce. VRML: Using 3D to surf the web. SIGGRAPH '95 course, No. 12, 1995.

[9] D. LeGall. MPEG: A video compression standard for multimedia applications. Communications of the ACM, 34(4):46-58, April 1991.

[10] P. S. Heckbert and H. P. Moreton. Interpolation for polygon texture mapping and shading. State of the Art in Computer Graphics: Visualization and Modeling, pp. 101-111, 1991.

[11] J. Lengyel and J. Snyder. Rendering with coherent layers. Computer Graphics (SIGGRAPH '97 Proceedings), 233-242, August 1997.

[12] M. Levoy. Polygon-assisted JPEG and MPEG compression of synthetic images. Computer Graphics (SIGGRAPH '95 Proceedings), 21-28, August 1995.

[13] P. W. C. Maciel and P. Shirley. Visual navigation of large environments using textured clusters. 1995 Symposium on Interactive 3D Graphics, 95-102, April 1995.

[14] Y. Mann and D. Cohen-Or. Selective pixel transmission for navigating in remote virtual environments. Computer Graphics Forum (Proceedings of Eurographics '97), 16(3):201-206, September 1997.

[15] W. R. Mark, L. McMillan and G. Bishop. Post-rendering 3D warping. Proceedings of the 1997 Symposium on Interactive 3D Graphics, Providence, Rhode Island, April 28-30, 1997.

[16] M. Segal, C. Korobkin, R. van Widenfelt, J. Foran and P. Haeberli. Fast shadows and lighting effects using texture mapping. Computer Graphics (SIGGRAPH '92 Proceedings), 249-252, 1992.

[17] G. Schaufler and W. Sturzlinger. A three dimensional image cache for virtual reality. Computer Graphics Forum (Proceedings of Eurographics '96), 15(3):227-235, 1996.

[18] G. Schaufler. Per-object image warping with layered impostors. Proceedings of the 9th Eurographics Workshop on Rendering '98, Vienna, 145-156, June 1998.

[19] D. Schmalstieg and M. Gervautz. Demand-driven geometry transmission for distributed virtual environments. Computer Graphics Forum (Eurographics '96), 15(3):421-432, 1996.

[20] J. Shade, D. Lischinski, D. H. Salesin, J. Snyder and T. DeRose. Hierarchical image caching for accelerated walkthroughs of complex environments. Computer Graphics (SIGGRAPH '96 Proceedings).

[21] F. Sillion, G. Drettakis and B. Bodelet. Efficient impostor manipulation for real-time visualization of urban scenery. Computer Graphics Forum (Proceedings of Eurographics '97), 16(3):207-218, September 1997.

[22] D. S. Wallach, S. Kunapalli and M. F. Cohen. Accelerated MPEG compression of dynamic polygonal scenes. Computer Graphics (SIGGRAPH '94 Proceedings), 193-197, July 1994.


Figure 2: The scale factor. (a) and (b) are the two nearby views; (c) and (d) show the per-polygon scale factor in gray level from (a) and (b), respectively; (e) the red-level and blue-level polygons are mapped from (a) and (b), respectively; (f) three in-between images.


Figure 3: A comparison between VDS and MPEG. The images in the left column are MPEG frames and those in the right column are VDS frames. The size of the MPEG sequence is about three times larger than the VDS sequence.