
Shadow Matting and Compositing

Yung-Yu Chuang¹  Dan B Goldman¹,³  Brian Curless¹  David H. Salesin¹,²  Richard Szeliski²

¹University of Washington  ²Microsoft Research  ³Industrial Light and Magic

(a) Foreground scene (b) Background scene (c) Blue screen composite (d) Our method (e) Reference photograph

Figure 1 Sample result from our matting and compositing algorithm for shadows. Given a foreground element photographed against a natural background (a), we seek to matte the element and its shadow and then composite it over another background (b). Using a blue screen (not shown) to extract the shadow, followed by conventional matting and compositing, we obtain a result (c) with double darkening of the existing shadow and without proper warping of the cast shadow. The results of our new shadow matting and compositing method (d) compare favorably with an actual photograph (e). Note the correct dimming of the specular highlight, the convincing geometric deformation, and the seamless matte edges where the foreground and background shadows meet.

Abstract

In this paper, we describe a method for extracting shadows from one natural scene and inserting them into another. We develop physically-based shadow matting and compositing equations and use these to pull a shadow matte from a source scene in which the shadow is cast onto an arbitrary planar background. We then acquire the photometric and geometric properties of the target scene by sweeping oriented linear shadows (cast by a straight object) across it. From these shadow scans, we can construct a shadow displacement map without requiring camera or light source calibration. This map can then be used to deform the original shadow matte. We demonstrate our approach for both indoor scenes with controlled lighting and for outdoor scenes using natural lighting.

CR Categories: I.3.3 [Computer Graphics]: Picture/Image Generation—Bitmap and framebuffer operations; I.4.8 [Image Processing and Computer Vision]: Scene Analysis—Shading

Keywords: Blue-screen matting, displacement map, faux shadow, image-based rendering, layer extraction, shadow matte.

1 Introduction

Matting and compositing are important operations in the production of special effects. These techniques enable directors to embed actors in a world that exists only in imagination, or to revive creatures that have been extinct for millions of years. During matting, foreground elements are extracted from a film or video sequence. During compositing, the extracted foreground elements are placed over novel background images.

Traditional approaches to matting include blue-screen matting [Smith and Blinn 1996] and rotoscoping [Wright 2001]. The former requires filming in front of an expensive blue screen under carefully controlled lighting, and the latter requires talent and intensive user interaction. Recently developed matting algorithms [Ruzon and Tomasi 2000; Chuang et al. 2001] can now pull alpha mattes of complex shapes from natural images. Chuang et al. [2002] extended their Bayesian approach to video by interpolating user-drawn keyframes using optical flow.

Shadows provide important visual cues for depth, shape, contact, movement, and lighting in our perception of the world [Petrovic et al. 2000], and thus are often essential in the construction of convincing composites. Shadow elements for compositing are typically created either by hand or by extracting them from blue screen plates.

The manual approach is commonly called faux shadow in the film industry [Wright 2001]. For this technique, artists use the foreground object's own alpha matte to create its shadow. By warping or displacement-mapping the shadow, it can be made to drape over simple objects in the background plate. However, this approach has several limitations. First, an alpha matte is a flat projection of the object from the point of view of the camera that filmed it. If the view from the light is too far from the camera's point of view, the silhouette of the alpha matte may be noticeably different from the silhouette of the correct shadow, and the resulting synthetic shadow will be unconvincing. Second, the shadow color characteristics are manually adjusted by the compositor and do not necessarily match the shadow characteristics of the real scene. Most importantly, this approach becomes unwieldy for casting shadows on backgrounds with highly complex geometry.

The second main approach is to extract shadows from the foreground plates using luma keying or blue-screen matting. These techniques provide a better approximation to the correct shadow characteristics. However, depending on the compositing model used, it may be difficult to obtain photometrically realistic results. For example, in Figure 1(c), note that the blue screen composite gives a noisy shadow with the wrong density, and it creates a double shadow where the ground plane was already in shadow.

Regardless of the shadow extraction method, target background scenes with complex geometry present special compositing challenges. In many cases, a rough model must be built so the actors cast shadows onto the model as they would onto the target scene. This model may be a physical blue screen model onto which the actor casts his real shadow for extraction, or a computer-generated virtual model onto which faux shadows are cast using a renderer. In either case, it is often difficult to construct a model that matches the target object exactly, so additional manual warping and rotoscoping is required to align the transferred shadows to the receiving geometry.

In this paper, we introduce a new process for shadow matting and compositing that captures all of these effects realistically. We develop a physically-motivated shadow compositing equation, and design a matting and compositing process based on this equation. For shadow matting, we extract a shadow density map that describes the degree to which each pixel is in shadow. In contrast to previous approaches, our matting method works for natural backgrounds. For shadow compositing, we use an active illumination approach to extract an illumination map and a displacement map for the destination scene. These maps describe the shadow appearance and distortions over the novel target background. We recover the displacement map without requiring camera calibration or knowledge of the light source position, using an arbitrarily textured planar reference region. Using these acquired maps, we can realistically transfer shadows from one scene to another. Our method imposes certain restrictions on the lighting, camera placement, and at least some of the geometry in the source and target scenes, which we discuss when evaluating the merits and limitations of our method.

Figure 1 shows an overview of our approach and compares it to blue screen matting and compositing and to ground truth. Our method correctly occludes both diffuse illumination and specular highlights, retains soft shadow edges, warps shadows convincingly across arbitrary background geometry, and seamlessly blends newly introduced shadows with those already present in the background plate.

1.1 Related work

Matting and compositing have a long history in the film industry. Traditional approaches require a single-color background, hence the name blue screen matting [Smith and Blinn 1996]. Recently, several matting algorithms have been developed to extract mattes from natural backgrounds for images [Ruzon and Tomasi 2000; Chuang et al. 2001] and image sequences [Chuang et al. 2002]. Our work extends these approaches with a method to extract shadow mattes from natural image sequences.

Much research has been done on shadows in both the vision and graphics communities. For example, Finlayson et al. [2002] have attempted to remove shadows from a single image, and Pellacini et al. [2002] implemented a user interface for adjusting shadow placement using direct manipulation. However, none of this research deals directly with shadow matting and compositing.

The intrinsic image approach [Weiss 2001] attempts to decompose a video of an object under different illumination conditions into a reflectance map and a sequence of light maps. These light maps could be used as shadow mattes, though Weiss does not attempt to transfer them to other scenes. Matsushita et al. [2002] attempt to morph shadows for different lighting conditions using Weiss's approach. They capture the lightfield of a static scene with several light sources. With multiple cameras, they recover a view-dependent model using multiview stereo. A shadow mask is obtained by simply thresholding the illumination maps. Finally, the shadow for a novel light source is obtained by warping the shadow masks of neighboring samples using the estimated geometry.

Petrovic et al. [2000] also estimate 3D geometry for casting shadows, but with different goals and methods. Their method is intended to create shadow mattes for cel animation, so their estimates of 3D geometry can be somewhat more approximate. They create a 3D model for the scene by inflating the character and estimating simple geometry for the background from user gestures.

Researchers have developed a number of shape-from-shadow techniques. For example, Savarese et al. [2001] observe the self-shadowing of an object under different lighting conditions and carve a model to generate a plausible solution that matches those observations. Our method builds on the shadow scanning approach of Bouguet and Perona [1998]. However, we avoid explicitly reconstructing a 3D geometric model, which requires calibration of the camera and light sources. Instead, we estimate the displacement map directly in image space.

Our compositing method is similar to Debevec's differential rendering approach for compositing synthetic objects into real scenes [1998]. Debevec records the differences between the rendered radiances with and without the synthetic objects in the scene. The differences are then added to the real destination background to make the composite. To avoid explicitly modeling the geometry and BRDF of the destination scene, we take a different approach.

Our approach also has some interesting similarities to environment matting and compositing [Zongker et al. 1999; Chuang et al. 2000]. Both lines of research attempt to capture lighting phenomena that are modeled incorrectly by the traditional compositing equation. Both use an active illumination approach to capture the information required for realistic composites. Our method casts oriented stick shadows (Section 4), whereas high-accuracy environment matting uses oriented Gaussian stripes. Our warping function plays a similar role to the warping function for single-frame environment matting.

1.2 Overview

In the following sections, we first develop our shadow matting equation and shadow matting algorithm for scenes with identical source and destination background geometry (Section 2). We then describe shadow compositing onto simple (planar) destination backgrounds (Section 3) and shadow warping for more geometrically complex backgrounds (Section 4). We present results using our technique in Section 5. Finally, we discuss the limitations, working range, and pros and cons of our approach (Section 6), and conclude with a summary and ideas for future research.

2 Shadow matting

In this section, we develop our shadow compositing equation and describe our algorithm to estimate shadow mattes.

Traditionally, matting and compositing operations are based on the compositing equation [Porter and Duff 1984],

C = αF + (1 − α)B. (1)

The composite color C is a linear combination of the foreground color F and the background color B weighted by the opacity α of the foreground object. This color blending model is effective for capturing partial fill and motion blur effects. Some of the previous approaches for shadow matting assume this model and try to determine α and F for the shadow [Wright 2001]. This approach effectively represents the shadow as a translucent dark layer. However, the standard compositing model does not hold for shadows because shadows are caused by the occlusion of light sources rather than by color blending.

2.1 The shadow compositing equation

To determine an appropriate model for shadow compositing, we treat the problem within a simplified lighting framework. In particular, we assume that a single, primary point light source (sometimes called a key light) is responsible for the dominant cast shadows within a scene. The remaining, secondary lighting is either dim or of such wide area (as in sky illumination) that its cast shadows are comparatively negligible. If we further assume no interreflections, then we can model the observed color C at a pixel as:

C = S + βI, (2)

where S is the shadowed color, I is the reflected contribution of the primary light source, and β is the visibility to that light source. Let L be the color of a pixel when not in shadow. Substituting I = L − S into Equation (2) and rearranging, we obtain the shadow compositing equation,

C = βL + (1 − β)S. (3)

Equation (3) can be thought of in terms of images or layers. In this case, S is the shadow image, L is the lit image, and β is the shadow density matte (or just shadow matte) representing the per-pixel visibility of the light source casting the shadow. Note that β may be fractional. Such values represent the same phenomena as fractional α values, including motion blur and partial coverage. Fractional β also allows us to simulate penumbral effects. (However, for non-point light sources, our model is only an approximation. This limitation is discussed in more depth in Section 6.)

The lit and shadow images L and S depend on the lighting conditions and albedos of the source scene. Hence, they are not directly transferable from scene to scene, unlike the shadow matte β, which is what we estimate during the matting process. For compositing, we therefore require two new images: the lit and shadow images, L′ and S′, of the new (destination) scene (Section 3).
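
To make the layer interpretation of Equation (3) concrete, here is a minimal sketch of the compositing step (hypothetical Python/NumPy code; the function name and array layout are our own, and the images are assumed to be float arrays):

```python
import numpy as np

def shadow_composite(beta, lit, shadow):
    """Composite a shadow using Equation (3): C = beta*L + (1 - beta)*S.

    beta   : (H, W) shadow matte; 1 = fully lit, 0 = fully in shadow
    lit    : (H, W, 3) lit image L (key light unoccluded)
    shadow : (H, W, 3) shadow image S (key light fully occluded)
    """
    b = beta[..., np.newaxis]            # broadcast matte over color channels
    return b * lit + (1.0 - b) * shadow
```

Note the contrast with Equation (1): here the two "layers" L and S come from the same scene under two lighting states, rather than from a separate foreground and background.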

2.2 Estimating the shadow matte

During the matting process, given the observed color C, we need to recover the shadow matte β. We therefore first need to estimate the shadow image S and the lit image L. We assume the image sequence is taken from a static camera with a static background. We can estimate the lit image and the shadow image using max and min compositing [Szeliski et al. 2000], i.e., finding the brightest and darkest values at each pixel. As Chuang et al. [2002] did for smoke, we first use the video matting algorithm to extract the mattes of the foreground objects and exclude them from the max/min compositing. For each pixel, we then compute

S = min_f C_f and L = max_f C_f , (4)

where the min and max are computed across all frames f independently at each pixel. Given the color images C, L, and S, which can be thought of as 3-vectors at each pixel, we estimate the shadow matte β using

β = ((C − S) · (L − S)) / ‖L − S‖². (5)

This equation simply projects the observed color C onto the color line between L and S and computes the parametric distance of the projection along that line. (It is also the least squares estimate of β given a noisy color C.)

The method works quite well where we have good estimates for both L and S. However, the β estimates are noisy where L and S are similar. This happens wherever some part of the background is never covered by the shadow or always lies in shadow. Where L = S we cannot recover β at all, but this signifies that the pixel is completely unaffected by the presence of shadow, so we mark these pixels as unknown β. Small areas with unknown β can be filled in using inpainting [Bertalmio et al. 2000] or other hole-filling approaches, although such hole-filling was not necessary for the examples in this paper. Figure 3 illustrates the min and max composites and the recovered β for a sample input.

Note that our shadow compositing equation is derived in radiance space. In practice, we used the gamma-corrected pixel values directly but did not observe any visible artifacts.
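
The whole estimation step fits in a few lines. The following is a hypothetical NumPy sketch (names are our own; `frames` is assumed to be a (F, H, W, 3) float array with foreground pixels masked to NaN so the min/max compositing ignores them):

```python
import numpy as np

def estimate_shadow_matte(frames, eps=1e-6):
    """Estimate S, L, and beta from an image sequence (Equations 4 and 5).

    frames : (F, H, W, 3) float array; foreground pixels masked to NaN.
    Returns (S, L, beta); beta is per-frame, NaN where L and S are too
    similar to give a reliable estimate (pixel unaffected by shadow).
    """
    S = np.nanmin(frames, axis=0)          # darkest value per pixel: shadow image
    L = np.nanmax(frames, axis=0)          # brightest value per pixel: lit image

    d = L - S                              # color line direction at each pixel
    denom = np.sum(d * d, axis=-1)         # ||L - S||^2

    # Project each observed color C onto the line between S and L
    # (the least-squares estimate of beta given a noisy color).
    beta = np.sum((frames - S) * d, axis=-1) / np.maximum(denom, eps)
    beta = np.clip(beta, 0.0, 1.0)

    # Mark pixels never covered (or always covered) by shadow as unknown.
    beta = np.where(denom[np.newaxis] < eps, np.nan, beta)
    return S, L, beta
```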

3 Shadow compositing

To perform shadow compositing, we require the lit and shadow images L′ and S′ corresponding to the novel background scene. We assume that the novel background is captured with a static camera whose relation to the primary light source is the same as it was in the source scene. Equation (3) can then be used to calculate the composite color due to shadowing as a function of these two images as

C′ = βL′ + (1 − β)S′. (6)

For a synthetic scene, it is easy to render both the lit and the shadowed versions of the scene. For a natural scene, we perform photometric shadow scanning by moving an object between the light and the scene such that every part of the scene that we wish to composite shadows into is in shadow at some time. As in the shadow matting process described previously, we then use max/min compositing operations to recover the shadow and lit images corresponding to the scene.

With the recovered photometric parameters and the source shadow matte β, we use the shadow compositing equation to make the composite, as shown in Figure 4(a–c). The most noticeable flaw in this image is that the cast shadow does not conform to the geometry of the destination background scene as it should. We address this shortcoming in the following section.
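
Continuing the earlier hypothetical sketch, the composite of Equation (6) simply reuses `shadow_composite` with the target maps (the variable names `lit_target` and `shadow_target` are our own); one plausible way to handle pixels with unknown β is to treat them as fully lit, since they were unaffected by shadow in the source:

```python
# Composite one frame's source matte over the target scene (Equation 6).
# beta is the (H, W) matte for the frame being composited; pixels whose
# matte is unknown (NaN) were unaffected by shadow, so leave them lit.
beta_filled = np.where(np.isnan(beta), 1.0, beta)
C_prime = shadow_composite(beta_filled, lit_target, shadow_target)
```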

4 Estimating shadow deformations

In this section, we show how to transfer a shadow cast on a source planar surface onto a target background with arbitrary geometry, assuming that some region of the target background has a planar region matching the source planar surface. To accomplish this, we construct a displacement or warping map W that places each pixel p in the target image in correspondence with some point W[p] in the source image. Such a displacement map is sufficient because shadows cast from point light sources can be described using a 2D map and a projection matrix, and the projected image of this map onto any geometry is some distortion of the map.

We use the same shadow compositing equation as in (6) but with a warped matte β′ = β[W[p]]. Since the values of the displacement map W[p] are not constrained to integer image coordinates, we use bilinear interpolation when resampling the shadow matte β.
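
A sketch of this resampling step (a hypothetical NumPy helper with our own naming; `disp` stores, for each target pixel, the real-valued source coordinates (x, y) given by W, assumed to lie inside the source matte):

```python
import numpy as np

def warp_matte(beta, disp):
    """Compute beta'[p] = beta[W[p]] with bilinear interpolation,
    since W[p] need not land on integer pixel coordinates.

    beta : (H, W) source shadow matte
    disp : (H2, W2, 2) per-target-pixel source coordinates (x, y)
    """
    x, y = disp[..., 0], disp[..., 1]
    x0 = np.clip(np.floor(x).astype(int), 0, beta.shape[1] - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, beta.shape[0] - 2)
    fx, fy = x - x0, y - y0                  # fractional offsets

    # Blend the four surrounding matte samples.
    top = (1 - fx) * beta[y0, x0]     + fx * beta[y0, x0 + 1]
    bot = (1 - fx) * beta[y0 + 1, x0] + fx * beta[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bot
```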

Our method for estimating the shadow displacement map is based on Bouguet's shadow scanning approach [1998]. Shadow scanning is an active illumination method, and as such its reconstructions contain gaps wherever surfaces are occluded from the view of either the light source or the camera. Fortunately, in the context of shadow compositing, these gaps coincide exactly with regions of the image unaffected by shadows and with regions not visible to the camera. Therefore, shadow scanning is ideally suited to our application. Furthermore, we can avoid Bouguet's calibration of the camera and light source by not performing full 3D reconstruction. Instead, we perform multiple passes with different scan orientations in order to compute a 2D shadow warping function. We call this process geometric shadow scanning to distinguish it from the photometric shadow scanning described in the previous section, which recovers no geometric (deformation) properties of the scene.

As in Bouguet's work, we require that our target background include some planar region, which is specified by the user via a hand-drawn garbage matte (Figure 4(d)). We refer to this region as the reference plane region and denote the 3D plane defined by the points in this region as π (Figure 2(a)).


Figure 2 Illustration of the principle and details of geometric shadow scanning. Point Q is the point on the reference plane π that lies behind point P along ray V (a). It is found at the intersection of two shadow lines whose orientation is estimated from the reference planar region, outlined in light yellow. For reference, the pair of images below (b,c) shows 3D renderings of the two individual shadow lines. We estimate all shadow edge crossings using temporal analysis (d) to identify for each pixel p the shadow time t_s[p], computed as the first time the pixel color goes below a threshold halfway between its min and max values, I_min and I_max. In the reference plane region, we can then determine the shadow line (e) for frame f of scan s, l_{s,f} = (m_{s,f}, b_{s,f}), by linearly interpolating between neighboring pixels with shadow times less than and greater than f (shown as black and white dots, respectively) and then fitting a line through the interpolated points. In some cases, line fits are poor due to spurious shadow time samples used in fitting. In the outlier rejection step (f), we identify lines with high error (the green dotted line l_{s,f}), discard samples outside the region defined by the nearest valid shadow lines on both sides (the cyan region between l_{s,f−1} and l_{s,f+1}), and then re-fit the line (the green solid line l′_{s,f}). Inconsistent shadow lines may still occur when fitting to too few or closely spaced samples. In the outlier replacement step (g), we identify these lines (e.g., the green dotted line l_{s,f}) by detecting rapid changes in line slopes between frames and then replace them by interpolating neighboring line coefficients. The solid green line l′_{s,f} is the replacement computed by interpolating between l_{s,f−1} and l_{s,f+1}.

Consider a pixel p through which we can see a 3D point P in the target background. For such a point P, there is a projection Q on the reference plane that lies at the intersection of π and the viewing ray V that passes through P. Now consider a stick or rod that casts a shadow on P. By observing the shadow line on the reference plane, we know that Q must lie along this line. If we re-orient the stick such that the shadow still covers point P, we obtain a second shadow line on the reference plane that must also intersect Q. We can compute Q as simply the intersection of the two shadow lines on the reference plane. In our implementation, we solve directly for q, the image-space projection of Q, which is in fact the warping map value W[p], by performing all computations in image coordinates, as described in the rest of this section. (Note that the displacement between W[p] and p is actually a measure of projective depth (parallax) in a plane-plus-parallax formulation of 3D reconstruction [Kumar et al. 1994].)

Acquisition. In practice, we use a series of directional scans, so that we can interpolate shadow lines for pixels that are not covered by a shadow edge at an integer frame time. Since the shadows may vary in width and the objects receiving the shadow may be very thin, we identify the leading temporal edge of the shadow lines, rather than trying to identify spatial shadow edges. Spatial edge detection is also less reliable in the presence of reflectance variations; the temporal analysis avoids this problem by focusing on a single pixel at a time.

To simplify the scanning process, a person moves the scanning stick by hand, potentially casting his or her own shadow onto the scene. Our algorithm is designed to ignore any additional moving shadows in the image frames, as long as the stick's shadow always appears to one side of the extra shadow. In principle, two passes suffice to find intersections, but we use three to five passes in order to improve the quality of the results and to cover more of the target background object. Figure 4(d) shows a frame from a geometric shadow scanning sequence. The user-specified reference plane region is shown with hatched blue lines.

Algorithm. Here is an outline of our algorithm, with details about each boldfaced step in the paragraphs that follow.

1. For each directional scan s:

(a) Label each pixel location p with the continuous time t_s[p] at which it is first crossed by the shadow edge, using temporal analysis.

(b) For each frame f, fit a shadow line equation x = m_{s,f} y + b_{s,f} to the points r in the reference plane region labeled with time t_s[r] = f.

(c) Perform outlier rejection and replacement on the line equation parameters (m_{s,f}, b_{s,f}) to reduce noise.

2. For each pixel location p:

(a) For each scan s, interpolate between the line equation parameters (m_{s,⌊t_s[p]⌋}, b_{s,⌊t_s[p]⌋}) and (m_{s,⌈t_s[p]⌉}, b_{s,⌈t_s[p]⌉}) from the nearest integer frames of the scan sequence to obtain the line equation parameters (m_{s,t_s[p]}, b_{s,t_s[p]}) for this pixel.

(b) Compute the intersection q of lines for all scans s.

(c) Store q as the value of the warp function for pixel p: W[p] ← q.

3. Optionally, perform anisotropic diffusion to smooth the displacement map W[p] while maintaining its structure.

Details. We perform temporal analysis [Bouguet and Perona 1998] to identify the shadow time t_s[p] (Figure 2(d)). The person holding the scanning stick stays behind the trailing edge of its shadow, so we find the first frame in which the pixel color goes below a threshold halfway between its min and max values, I_min and I_max. We linearly interpolate from the previous frame time to compute the shadow edge time t_s[p]. We also define the shadow contrast c_s[p] as I_max − I_min; this value is later used as a confidence measure.
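
A sketch of this temporal analysis for one scan (hypothetical NumPy code under our own naming; `frames` is assumed to be a (F, H, W) grayscale sequence from a static camera):

```python
import numpy as np

def shadow_times(frames):
    """Per-pixel shadow time t_s[p] and contrast c_s[p] for one scan.

    frames : (F, H, W) grayscale scan sequence.
    Returns (t, c): t is the sub-frame time at which the shadow edge
    first crosses each pixel (NaN if it never does), c = Imax - Imin.
    """
    i_min = frames.min(axis=0)
    i_max = frames.max(axis=0)
    thresh = 0.5 * (i_min + i_max)           # halfway between min and max

    below = frames < thresh                  # (F, H, W) shadow indicator
    first = below.argmax(axis=0)             # first frame index below threshold
    crossed = below.any(axis=0) & (first > 0)

    # Linearly interpolate between frame (first-1) and frame (first).
    f1 = np.clip(first, 1, frames.shape[0] - 1)
    rows, cols = np.indices(i_min.shape)
    prev = frames[f1 - 1, rows, cols]
    curr = frames[f1, rows, cols]
    frac = (prev - thresh) / np.maximum(prev - curr, 1e-6)

    t = np.where(crossed, (f1 - 1) + frac, np.nan)
    c = i_max - i_min                        # shadow contrast, used as confidence
    return t, c
```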

For shadow line fitting, Bouguet [1998] used spatial analysis to find points on a planar region crossing the shadow edge and then fit a line to those points. However, Bouguet's method assumes that the reference plane is uniform in color and material. Since our goal is to handle natural scenes with reflectance variations on the reference plane, we instead use temporal analysis to determine the shadow lines more accurately (Figure 2(e)). For each frame f, we check each pixel r within the reference planar region. For every neighboring pair of pixels for which the signs of t_s[r] − f are different, we estimate the zero-crossing point and add it to a list of candidate points on the shadow edge. After finding all such zero-crossings, we fit a line to them using linear regression. This line is the shadow line for frame f of scan s, parameterized by the tuple (m_{s,f}, b_{s,f}). (This is similar to a 2D version of the marching cubes method [Lorensen and Cline 1987] for tracing isocurves, except that we only collect the points and need not connect them into a curve.)

Because of noise and bad shadow time estimates in the reference plane, some shadow lines may be poorly estimated. Accordingly, we perform outlier rejection and replacement to fix poorly fitted lines. For lines whose fitting errors (as measured by the residual) are larger than some threshold, we reject all points outside the image region defined by the nearest valid shadow lines on both sides and refit the line without those outliers (Figure 2(f)). Even after this refitting, some lines may still be poorly estimated, either because we have too few points to fit or because the points are too close together in the image. Since the orientation of the scanning stick varies smoothly across the sequence, we identify and reject poorly fit lines by noting rapid changes in the slopes m between frames. We then replace each rejected shadow line by interpolating the line coefficients of the neighboring frames (Figure 2(g)). (We have not found discontinuities in the intercept b values to be of use for outlier rejection, since the scanning may progress at widely varying rates.)

After interpolating the shadow lines from integer frames for each scan, we use weighted least squares to calculate the intersection point q of all the shadow lines corresponding to shadow edges crossing point p. We determine the weight for each line based on two factors. First, if the shadow contrast for the scan is low, the estimated shadow time t_s[p] for that pixel will be less accurate. Second, if the shadow line was poorly fit, it might have a significant impact on the location of the intersection. In practice, we weight each line by w_c w_f, where the first term, w_c, is defined as the square of the shadow contrast, c_s[p]², and the second term, w_f, is defined as exp(−(E_{s,t_s})²), where E_{s,t_s} is the interpolated line fitting error for scan s at time t_s.

In a final step, we apply an anisotropic diffusion algorithm to smooth the displacement map while maintaining its structure [Perona and Malik 1990]. Errors in the displacement map are most severe for pixels with low shadow contrast. Our diffusion process therefore flows data from high-confidence areas to low-confidence areas. Regions with very low contrast, e.g., areas that were already in shadow or were never shadowed, will be filled with nearby displacements that could be quite different from the true displacements. However, these areas are essentially unaffected by the shadows we want to transfer, so the values of the displacement map in those regions are, in principle, unimportant (see Section 7 for further discussion). In the end, the displacement map effectively places all pixels on the reference plane from which we extracted the source shadow matte (Figure 4(e)) to match the geometry of the target background (Figure 4(f)).
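
As an illustration, the weighted least-squares intersection of the per-scan shadow lines might look like the following hypothetical sketch (each line x = m y + b is rewritten as the constraint x − m y = b, weighted by w = w_c w_f):

```python
import numpy as np

def intersect_shadow_lines(lines, weights):
    """Weighted least-squares intersection q = (x, y) of shadow lines.

    Rewriting each line x = m*y + b as x - m*y = b and stacking one
    row per scan gives A q = rhs, solved in the weighted LSQ sense.

    lines   : list of (m, b) tuples, one per scan
    weights : list of w_c * w_f confidence weights, one per scan
    """
    A = np.array([[1.0, -m] for m, b in lines])
    rhs = np.array([b for m, b in lines])
    w = np.sqrt(np.asarray(weights))          # sqrt so residuals get weight w
    q, *_ = np.linalg.lstsq(A * w[:, None], rhs * w, rcond=None)
    return q                                  # image-space point, i.e. W[p]
```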

5 Results

To illustrate our method, we filmed some outdoor scenes in uncontrolled settings (Figure 3) and transferred the shadows to different target backgrounds (Figure 4(g,h)). Note that the source scenes were captured with complexly color-textured backgrounds. Video matting [Chuang et al. 2002] was employed to extract the matte of the foreground actor. Our outdoor destination backgrounds were scanned with three to five passes of a 96″ × 3″ × 1″ wooden board (Figure 4(d)). We took care to sweep the scanning stick so as to cover all the regions where the replacement shadow might be cast, while simultaneously covering a reasonable part of the reference plane. In all our examples, we used a section of the ground plane as our reference plane, and our outdoor scenes were filmed at hours of the day when the sun lay low on the horizon, so that an 8-foot stick sufficed to scan an adequate region. In addition, we matched the camera pose and lighting simply by looking at a source image with a shadow on the ground plane, and then "eye-balling" how the camera should be placed in the target sequence by looking at another cast shadow.

Figure 5 demonstrates the advantages of our method over previous methods on a target scene with complex geometry. The surface of the ivy is bumpy and challenging to model manually. There is also severe self-shadowing in this scene (Figure 5(b)). Using previous methods, the composite would exhibit double-shadowing in these regions. With our method, the shadows deform naturally to match the background geometry. Furthermore, there is no double-shadowing (Figure 5(c)). In Figure 6, we transferred an outdoor foreground to an indoor background scene scanned with wooden dowels.

6 Discussion

Our present matting and compositing method has a number of important restrictions. First, our shadow compositing equation (3) is strictly valid only for scenes with one dominant, pointlike light source, and it does not model potentially complex effects arising from interreflections. Second, to estimate the lit and shadow images, we require a static camera. Third, in order to construct the shadow displacement map, we require the source background to be planar and the target background to contain a planar reference region. Finally, we require that the relationship of the dominant light source, reference plane, and camera be matched in the source and target scenes.

Despite these restrictions, we believe that in many settings our approach provides a less restrictive capture mechanism than previous shadow extraction and compositing methods. Existing techniques typically share our requirement for a single matched key light source and matched cameras in order to avoid the difficult "foreground relighting" problem. However, they also require the construction and calibration of matching geometry (physical or virtual sets) and the use of painstakingly lit blue screens for source capture. Our technique requires neither matching geometry nor blue screens.

Furthermore, some of the theoretical restrictions on our technique can be relaxed in practice. For instance, the camera and light directions need not match precisely. As shown in the previous section, approximate matches were enough to create convincing composites. In addition, the dominant light sources in our scenes were not perfect point lights, but no objectionable artifacts were evident. For less point-like sources, we can transfer approximate penumbrae as long as the source and target backgrounds are at a similar distance to the casting object. We could potentially even blur or sharpen the shadow matte to create a faux shadow with a softer or harder penumbra.
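
Such a softening pass on the matte is a one-liner under these assumptions (a hypothetical sketch using scipy.ndimage; the blur width would be chosen by eye to match the desired penumbra):

```python
from scipy.ndimage import gaussian_filter

# Soften the shadow matte (with unknowns already filled) to fake a
# wider penumbra; sigma is the blur radius in pixels.
beta_soft = gaussian_filter(beta_filled, sigma=3.0)
```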

7 Conclusions

In this paper, we have introduced a physically-based shadow matting and compositing method. Our approach has many advantages over previous approaches to shadow extraction and compositing. First, it can extract both the photometric and geometric information required for shadow compositing from natural (planar) scenes. Second, it can cast shadows onto scenes that already contain shadows, self-shadowing, and specularities. Third, it can cast shadows onto complex geometry without manually modeling the scene (at the price of having to do a shadow scan). Because we use the same camera to capture our warping function as we do to capture the images of the scene itself, we avoid the difficulties of having to accurately register an independently reconstructed model to our image.


(a) Source frame (b) Source lit image (c) Source shadow image (d) Recovered β

Figure 3 Shadow matting. Starting from a source image sequence (one frame shown in (a)), we first remove the foreground character using video matting. Our shadow matting algorithm recovers lit and shadow images (b,c) using max/min compositing. It then estimates β by projecting observed pixel colors onto the color lines between them (d).

(a) Target lit image (b) Target shadow image (c) Composite without displacement (d) Geometric shadow scanning

(e) Regular grid on the reference plane (f) Displacement map visualization (g) Composite with displacement (h) Using another background

Figure 4 Shadow compositing. Lit and shadow images are recovered for the target geometry as well (a,b). Our composite remains unconvincing because the shadow does not conform to the background geometry (c). We acquire multiple, oriented scans of a straight line shadow (d), and compute line equations in the user-specified reference plane region, shown here with hatched blue lines. We then recover a displacement map (f) from the target scene to the source reference plane (e). This map distorts the shadow into the correct shape (g). The results using a second background are shown in (h).

(a) Source frame (b) Target frame (c) Composite

Figure 5 An example of more complicated background geometry. The foreground element and shadow are extracted from a source frame (a) and transferred to a target frame (b) containing ivy to obtain a composite (c).


(a) Source frame (b) Target frame (c) Composite

Figure 6 Honey, I Shrunk the Grad Student! A source (a) and target (b) for a miniature composite (c).


There are a number of ways in which our current approach can be extended. Due to sensor noise, min/max compositing can introduce artifacts, particularly in areas that are always in shadow in the target scene. In these areas, the brightness will vary with β in a perceptually objectionable way, since the underlying color is dark. Statistical methods could be employed to better estimate the mean shadow and lit colors.

We would like to relax the planarity constraints for the source background. For a source scene that is not fully planar, but has at least a planar segment, we could simply shadow scan the source scene to get its displacement map relative to its own ground plane, and use this to produce an unwarped (planar) shadow matte.

It would also be useful to have interactive editing tools for adjusting shadow direction and shape. Such tools could extend the operating range of our technique to cases in which lights or reference planes are somewhat misaligned between the source and target scenes.

Finally, we would like to extend the operating range of the shadow matting and compositing equations, and preliminary experiments do suggest that at least some extension is possible. For instance, assuming the source and target backgrounds are Lambertian and geometrically similar, our method could still matte and composite plausible shadows cast by multiple light sources without taking separate images for each light source. In addition, our shadow density mattes are currently single-channel mattes bounded between 0 and 1. By using unbounded, multichannel β values, our method could also be modified to transfer approximate color-filtered shadows and caustics.

Acknowledgments

The authors would like to thank Benjamin Stewart for his acting. This work was supported by the University of Washington Animation Research Labs, NSF grant CCR-987365, and an industrial gift from Microsoft.

References

BERTALMIO, M., SAPIRO, G., CASELLES, V., AND BALLESTER, C. 2000. Image inpainting. In Proceedings of ACM SIGGRAPH 2000, 417–424.

BOUGUET, J.-Y., AND PERONA, P. 1998. 3D photography on your desk. In Proceedings of IEEE International Conference on Computer Vision (ICCV 98), 43–50.

CHUANG, Y.-Y., ZONGKER, D. E., HINDORFF, J., CURLESS, B., SALESIN, D. H., AND SZELISKI, R. 2000. Environment matting extensions: Towards higher accuracy and real-time capture. In Proceedings of ACM SIGGRAPH 2000, 121–130.

CHUANG, Y.-Y., CURLESS, B., SALESIN, D. H., AND SZELISKI, R. 2001. A Bayesian approach to digital matting. In Proceedings of Computer Vision and Pattern Recognition (CVPR 2001), vol. II, 264–271.

CHUANG, Y.-Y., AGARWALA, A., CURLESS, B., SALESIN, D. H., AND SZELISKI, R. 2002. Video matting of complex scenes. ACM Transactions on Graphics 21, 3, 243–248.

DEBEVEC, P. 1998. Rendering synthetic objects into real scenes: Bridging traditional and image-based graphics with global illumination and high dynamic range photography. In Proceedings of ACM SIGGRAPH 98, 189–198.

FINLAYSON, G. D., HORDLEY, S. D., AND DREW, M. S. 2002. Removing shadows from images. In Proceedings of European Conference on Computer Vision (ECCV 2002), vol. 2353 of LNCS, 823–836.

KUMAR, R., ANANDAN, P., AND HANNA, K. 1994. Direct recovery of shape from multiple views: A parallax based approach. In Twelfth International Conference on Pattern Recognition (ICPR '94), 685–688.

LORENSEN, W. E., AND CLINE, H. E. 1987. Marching cubes: A high resolution 3D surface construction algorithm. In Computer Graphics (Proceedings of ACM SIGGRAPH 87), 163–169.

MATSUSHITA, Y., KANG, S. B., LIN, S., SHUM, H.-Y., AND TONG, X. 2002. Lighting interpolation by shadow morphing using intrinsic lumigraphs. In Proceedings of Pacific Graphics 2002, 58–65.

PELLACINI, F., TOLE, P., AND GREENBERG, D. P. 2002. A user interface for interactive cinematic shadow design. ACM Transactions on Graphics 21, 3, 563–566.

PERONA, P., AND MALIK, J. 1990. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 7 (July), 629–639.

PETROVIC, L., FUJITO, B., WILLIAMS, L., AND FINKELSTEIN, A. 2000. Shadows for cel animation. In Proceedings of ACM SIGGRAPH 2000, 511–516.

PORTER, T., AND DUFF, T. 1984. Compositing digital images. In Computer Graphics (Proceedings of ACM SIGGRAPH 84), 253–259.

RUZON, M. A., AND TOMASI, C. 2000. Alpha estimation in natural images. In Proceedings of Computer Vision and Pattern Recognition (CVPR 2000), 18–25.

SAVARESE, S., RUSHMEIER, H., BERNARDINI, F., AND PERONA, P. 2001. Shadow carving. In Proceedings of IEEE International Conference on Computer Vision (ICCV 2001), 190–197.

SMITH, A. R., AND BLINN, J. F. 1996. Blue screen matting. In Proceedings of ACM SIGGRAPH 96, 259–268.

SZELISKI, R., AVIDAN, S., AND ANANDAN, P. 2000. Layer extraction from multiple images containing reflections and transparency. In Proceedings of Computer Vision and Pattern Recognition (CVPR 2000), 246–253.

WEISS, Y. 2001. Deriving intrinsic images from image sequences. In Proceedings of IEEE International Conference on Computer Vision (ICCV 2001), 68–75.

WRIGHT, S. 2001. Digital Compositing for Film and Video. Focal Press.

ZONGKER, D. E., WERNER, D. M., CURLESS, B., AND SALESIN, D. H. 1999. Environment matting and compositing. In Proceedings of ACM SIGGRAPH 99, 205–214.