Bilateral Filtering

Foundations and Trends® in Computer Graphics and Vision
Vol. 4, No. 1 (2008) 1–73
© 2009 S. Paris, P. Kornprobst, J. Tumblin and F. Durand
DOI: 10.1561/0600000020

Bilateral Filtering: Theory and Applications
By Sylvain Paris, Pierre Kornprobst, Jack Tumblin and Frédo Durand

Contents
1 Introduction
2 From Gaussian Convolution to Bilateral Filtering
2.1 Terminology and Notation
2.2 Image Smoothing with Gaussian Convolution
2.3 Edge-preserving Filtering with the Bilateral Filter
3 Applications
3.1 Denoising
3.2 Contrast Management
3.3 Depth Reconstruction
3.4 Data Fusion
3.5 3D Fairing
3.6 Other Applications
4 Efficient Implementation
4.1 Brute Force
4.2 Separable Kernel
4.3 Local Histograms
4.4 Layered Approximation
4.5 Bilateral Grid
4.6 Bilateral Pyramid
4.7 Discussion
5 Relationship between Bilateral Filtering and Other Methods or Framework
5.1 Bilateral Filtering is Equivalent to Local Mode Filtering
5.2 The Bilateral Filter is a Robust Filter
5.3 Bilateral Filtering is Equivalent Asymptotically to the Perona and Malik Equation
6 Extensions of Bilateral Filtering
6.1 Accounting for the Local Slope
6.2 Using Several Images
7 Conclusions
Acknowledgments
References

Sylvain Paris (1), Pierre Kornprobst (2), Jack Tumblin (3) and Frédo Durand (4)
(1) Adobe Systems, Inc., CA 95110-2704, USA
(2) Project Team INRIA, ENS Paris, UNSA LJAD, France
(3) Department of Electrical Engineering and Computer Science, Northwestern University, IL 60208, USA
(4) Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, MA 02139, USA

The bilateral filter is a non-linear technique that can blur an image while respecting strong edges.
Its ability to decompose an image into different scales without causing haloes after modification has made it ubiquitous in computational photography applications such as tone mapping, style transfer, relighting, and denoising. This text provides a graphical, intuitive introduction to bilateral filtering, a practical guide for efficient implementation and an overview of its numerous applications, as well as mathematical analysis.

1 Introduction

Bilateral filtering is a technique to smooth images while preserving edges. It can be traced back to 1995 with the work of Aurich and Weule [4] on nonlinear Gaussian filters. It was later rediscovered by Smith and Brady [59] as part of their SUSAN framework, and by Tomasi and Manduchi [63], who gave it its current name. Since then, the use of bilateral filtering has grown rapidly, and it is now ubiquitous in image-processing applications (Figure 1.1). It has been used in various contexts such as denoising [1, 10, 41], texture editing and relighting [48], tone management [5, 10, 21, 22, 24, 53], demosaicking [56], stylization [72], and optical-flow estimation [57, 74].

(a) Input image (b) Output of the bilateral filter
Fig. 1.1 The bilateral filter converts any input image (a) to a smoothed version (b). It removes most texture, noise, and fine details, but preserves large sharp edges without blurring.

The bilateral filter has several qualities that explain its success:
• Its formulation is simple: each pixel is replaced by a weighted average of its neighbors. This aspect is important because it makes it easy to acquire intuition about its behavior, to adapt it to application-specific requirements, and to implement it.
• It depends only on two parameters that indicate the size and contrast of the features to preserve.
• It can be used in a non-iterative manner. This makes the parameters easy to set since their effect is not cumulative over several iterations.
• It can be computed at interactive speed even on large images, thanks to efficient numerical schemes [21, 23, 55, 54, 50, 71], and even in real time if graphics hardware is available [16].

In parallel to applications, a wealth of theoretical studies [6, 7, 13, 21, 23, 46, 50, 60, 65, 66] explain and characterize the bilateral filter's behavior. The strengths and limitations of bilateral filtering are now fairly well understood. As a consequence, several extensions have been proposed [14, 19, 23].

This paper is organized as follows. Section 2 presents linear Gaussian filtering and the nonlinear extension to the bilateral filter. Section 3 revisits several recent, novel and challenging applications of bilateral filtering. Section 4 compares different ways to implement the bilateral filter efficiently. Section 5 presents several links of bilateral filtering with other frameworks and also different ways to interpret it. Section 6 describes extensions and variants of the bilateral filter. We also provide a website with code and relevant pointers (http://people.csail.mit.edu/sparis/bf_survey/).

2 From Gaussian Convolution to Bilateral Filtering

To introduce bilateral filtering, we begin with a description of Gaussian convolution in Section 2.2. This filter is simpler, introduces the notion of local averaging, and is closely related to the bilateral filter but does not preserve edges. Section 2.3 then underscores the specific features of the bilateral filter that combine smoothing with edge preservation. First, we introduce the notation used throughout this paper.

2.1 Terminology and Notation

For simplicity, most of the exposition describes filtering for a gray-level image I, although every filtering operation can be duplicated for each component of a color image unless otherwise specified. We use the notation $I_p$ for the image value at pixel position p. Pixel size is assumed to be 1.
F[I] designates the output of a filter F applied to the image I. We will consider the set $\mathcal{S}$ of all possible image locations, which we name the spatial domain, and the set $\mathcal{R}$ of all possible pixel values, which we name the range domain. For instance, the notation $\sum_{p\in\mathcal{S}}$ denotes a sum over all image pixels indexed by p. We use $|\cdot|$ for the absolute value and $\|\cdot\|$ for the $L_2$ norm; e.g., $\|p-q\|$ is the Euclidean distance between pixel locations p and q.

2.2 Image Smoothing with Gaussian Convolution

Blurring is perhaps the simplest way to smooth an image: each output image pixel value is a weighted sum of its neighbors in the input image. The core component is the convolution by a kernel, which is the basic operation in linear shift-invariant image filtering. At each output pixel position it estimates the local average of intensities, and corresponds to low-pass filtering. An image filtered by Gaussian convolution is given by:

$$GC[I]_p = \sum_{q\in\mathcal{S}} G_\sigma(\|p-q\|)\, I_q, \qquad (1)$$

where $G_\sigma(x)$ denotes the 2D Gaussian kernel (see Figure 2.1):

$$G_\sigma(x) = \frac{1}{2\pi\sigma^2}\exp\!\left(-\frac{x^2}{2\sigma^2}\right). \qquad (2)$$

Gaussian filtering is a weighted average of the intensity of the adjacent positions, with a weight decreasing with the spatial distance to the center position p. The weight for pixel q is defined by the Gaussian $G_\sigma(\|p-q\|)$, where $\sigma$ is a parameter defining the neighborhood size. The strength of this influence depends only on the spatial distance between the pixels and not on their values. For instance, a bright pixel has a strong influence over an adjacent dark pixel although these two pixel values are quite different. As a result, image edges are blurred because pixels across discontinuities are averaged together (see Figure 2.1).

The action of the Gaussian convolution is independent of the image content: the influence that a pixel has on another one depends only on their distance in the image, not on the actual image values.

Fig. 2.1 Example of Gaussian linear filtering with different $\sigma$. Top row shows the profile of a 1D Gaussian kernel and bottom row the result obtained by the corresponding 2D Gaussian convolution filtering.

Remark. Linear shift-invariant filters such as Gaussian convolution (Equation (1)) can be implemented efficiently even for very large $\sigma$ using the Fast Fourier Transform (FFT) and other methods, but these acceleration techniques do not apply to the bilateral filter or other nonlinear or shift-variant filters. Fortunately, several fast numerical schemes were recently developed specifically for the bilateral filter (see Section 4).
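The Python sketch below (standard library only; images are lists of rows) evaluates Equations (1) and (2) by brute force. Two details are our assumptions rather than part of the text: the sum over $\mathcal{S}$ is truncated to a window of radius $\lceil 3\sigma \rceil$, and the weights are renormalized by their sum so that border pixels are not darkened. The function names are illustrative. Note that the weights never look at pixel values.

```python
import math


def gaussian_kernel(x, sigma):
    """G_sigma(x) of Equation (2): the 2D Gaussian evaluated at distance x."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (2.0 * math.pi * sigma * sigma)


def gaussian_convolution(image, sigma):
    """Brute-force GC[I] (Equation (1)) for a grayscale image given as a list of rows.

    Assumptions: the sum over S is truncated to a (2r+1) x (2r+1) window with
    r = ceil(3 * sigma), and the weights are renormalized by their total.
    """
    h, w = len(image), len(image[0])
    r = int(math.ceil(3 * sigma))
    out = [[0.0] * w for _ in range(h)]
    for py in range(h):
        for px in range(w):
            total = norm = 0.0
            for qy in range(max(0, py - r), min(h, py + r + 1)):
                for qx in range(max(0, px - r), min(w, px + r + 1)):
                    # The weight depends only on the distance ||p - q||, not on pixel values.
                    g = gaussian_kernel(math.hypot(px - qx, py - qy), sigma)
                    total += g * image[qy][qx]
                    norm += g
            out[py][px] = total / norm
    return out


# A vertical step edge: dark left half (0.0), bright right half (1.0).
step = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
blurred = gaussian_convolution(step, sigma=1.5)
print([round(v, 2) for v in blurred[4]])
# → [0.01, 0.05, 0.16, 0.37, 0.63, 0.84, 0.95, 0.99]
```

Because the weights ignore the values, the sharp 0-to-1 step becomes a ramp spread over several pixels, which is exactly the behavior the bilateral filter is designed to avoid.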
Edges are lost with high values of $\sigma$ because averaging is performed over a much larger area.

2.3 Edge-preserving Filtering with the Bilateral Filter

The bilateral filter is also defined as a weighted average of nearby pixels, in a manner very similar to Gaussian convolution. The difference is that the bilateral filter takes into account the difference in value with the neighbors to preserve edges while smoothing. The key idea of the bilateral filter is that for a pixel to influence another pixel, it should not only occupy a nearby location but also have a similar value.

The formalization of this idea goes back in the literature to Yaroslavsky [77], Aurich and Weule [4], Smith and Brady [59] and Tomasi and Manduchi [63]. The bilateral filter, denoted by $BF[\cdot]$, is defined by:

$$BF[I]_p = \frac{1}{W_p}\sum_{q\in\mathcal{S}} G_{\sigma_s}(\|p-q\|)\, G_{\sigma_r}(|I_p - I_q|)\, I_q, \qquad (3)$$

where the normalization factor $W_p$ ensures that pixel weights sum to 1.0:

$$W_p = \sum_{q\in\mathcal{S}} G_{\sigma_s}(\|p-q\|)\, G_{\sigma_r}(|I_p - I_q|). \qquad (4)$$

Parameters $\sigma_s$ and $\sigma_r$ specify the amount of filtering for the image I. Equation (3) is a normalized weighted average where $G_{\sigma_s}$ is a spatial Gaussian weighting that decreases the influence of distant pixels, and $G_{\sigma_r}$ is a range Gaussian that decreases the influence of pixels q when their intensity values differ from $I_p$. Figure 1.1 shows a sample output of the bilateral filter and Figure 2.2 illustrates how the weights are computed for one pixel near an edge.

[Figure 2.2 panels: Input; Spatial weight; Range weight; Multiplication of range and spatial weights; Bilateral filter weights at the central pixel; Result]
Fig. 2.2 The bilateral filter smooths an input image while preserving its edges. Each pixel is replaced by a weighted average of its neighbors. Each neighbor is weighted by a spatial component that penalizes distant pixels and a range component that penalizes pixels with a different intensity. The combination of both components ensures that only nearby similar pixels contribute to the final result. The weights shown apply to the central pixel (under the arrow). The figure is reproduced from [21].

2.3.1 Parameters

The bilateral filter is controlled by two parameters: $\sigma_s$ and $\sigma_r$. Figure 2.3 illustrates their effect.
• As the range parameter $\sigma_r$ increases, the bilateral filter gradually approximates Gaussian convolution more closely because the range Gaussian $G_{\sigma_r}$ widens and flattens, i.e., is nearly constant over the intensity interval of the image.
• Increasing the spatial parameter $\sigma_s$ smooths larger features.

Fig. 2.3 The bilateral filter's range and spatial parameters provide more versatile control than Gaussian convolution. As soon as either of the bilateral filter weights reaches values near zero, no smoothing occurs. As a consequence, increasing the spatial sigma will not blur an edge as long as the range sigma is smaller than the edge amplitude.
For example, note that the rooftop contours are sharp for small and moderate range settings $\sigma_r$, and that sharpness is independent of the spatial setting $\sigma_s$. The original image intensity values span [0, 1].

In practice, in the context of denoising, Liu et al. [41] show that adapting the range parameter $\sigma_r$ to estimates of the local noise level yields more satisfying results. The authors recommend a linear dependence: $\sigma_r = 1.95\,\sigma_n$, where $\sigma_n$ is the local noise level estimate.

An important characteristic of bilateral filtering is that the weights are multiplied: if either of the weights is close to zero, no smoothing occurs. As an example, a large spatial Gaussian coupled with a narrow range Gaussian achieves limited smoothing despite the large spatial extent. The range weight enforces a strict preservation of the contours.

2.3.2 Computational cost

At this stage of the presentation, skeptical readers may have already decided that the bilateral filter is an unreasonably expensive algorithm to compute when the spatial parameter $\sigma_s$ is large, as it constructs each output pixel from a large neighborhood, requires the calculation of two weights and their product, and needs a costly normalizing step as well. In Section 4 we will show some efficient approaches to implement the bilateral filter.

2.3.3 Iterations

The bilateral filter can be iterated. This leads to results that are almost piecewise constant, as shown in Figure 2.4. Although this yields smoother images, the effect is different from increasing the spatial and range parameters. As shown in Figure 2.3, increasing the spatial parameter $\sigma_s$ has a limited effect unless the range parameter $\sigma_r$ is also increased. Although a large $\sigma_r$ also produces smooth outputs, it tends to blur the edges, whereas iterating preserves the strong edges, such as the border of the roof in Figure 2.4, while removing the weaker details, such as the tiles.
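A literal transcription of Equations (3) and (4) makes the cost concern of Section 2.3.2 concrete: every output pixel visits a whole window and evaluates two Gaussians per neighbor. The brute-force Python sketch below assumes a grayscale image stored as a list of rows and truncates the spatial sum at radius $\lceil 3\sigma_s \rceil$; the function names `bilateral_filter` and `iterated_bilateral` are illustrative. The second function simply re-applies the filter to its own output, as in Section 2.3.3.

```python
import math


def bilateral_filter(image, sigma_s, sigma_r):
    """Brute-force BF[I] (Equations (3) and (4)) for a grayscale image given as a list of rows.

    The Gaussian normalization constants are omitted because they cancel in the
    division by W_p. Assumption: the sum over S is truncated to a window of
    radius r = ceil(3 * sigma_s), so the cost is O(N * (2r + 1)^2) for N pixels.
    """
    h, w = len(image), len(image[0])
    r = int(math.ceil(3 * sigma_s))
    out = [[0.0] * w for _ in range(h)]
    for py in range(h):
        for px in range(w):
            i_p = image[py][px]
            total = w_p = 0.0
            for qy in range(max(0, py - r), min(h, py + r + 1)):
                for qx in range(max(0, px - r), min(w, px + r + 1)):
                    i_q = image[qy][qx]
                    # Spatial weight on ||p - q|| times range weight on |I_p - I_q|.
                    g_s = math.exp(-((px - qx) ** 2 + (py - qy) ** 2) / (2 * sigma_s ** 2))
                    g_r = math.exp(-((i_p - i_q) ** 2) / (2 * sigma_r ** 2))
                    total += g_s * g_r * i_q
                    w_p += g_s * g_r  # accumulates W_p of Equation (4)
            out[py][px] = total / w_p
    return out


def iterated_bilateral(image, sigma_s, sigma_r, iterations):
    """Section 2.3.3: feed the filtered image back into the filter."""
    for _ in range(iterations):
        image = bilateral_filter(image, sigma_s, sigma_r)
    return image


# Vertical step edge (0 -> 1) plus a faint checkerboard texture of amplitude 0.02.
img = [[(0.0 if x < 4 else 1.0) + (0.02 if (x + y) % 2 else 0.0) for x in range(8)]
       for y in range(8)]
smooth = bilateral_filter(img, sigma_s=1.5, sigma_r=0.1)
cartoon = iterated_bilateral(img, sigma_s=1.5, sigma_r=0.1, iterations=3)
```

With $\sigma_r = 0.1$, pixels on opposite sides of the unit step receive range weights of at most about $e^{-48}$, so the edge stays sharp, while the 0.02 checkerboard ripples receive range weights near 0.98 and are averaged away within each side.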
This type of effect is desirable for applications such as stylization [72] that seek to abstract away the small details, while computational photography techniques [5, 10, 21] tend to use a single iteration to be closer to the initial image content.

2.3.4 Separation

The bilateral filter can split an image into two parts: the filtered image and its residual image. The filtered image holds only the large-scale features, as the bilateral filter smoothed away local variations without affecting strong edges. The residual image, made by subtracting the filtered image from the original, holds only the image portions that the filter removed. Depending on the settings and the application, this removed small-scale component can be interpreted as noise or texture, as shown in Figure 2.5. Applications such as tone management and style transfer extend this decomposition to multiple layers (see Section 3).

Fig. 2.4 Iterations: the bilateral filter can be applied iteratively, and the result progressively approximates a piecewise constant signal. This effect can help achieve a limited-palette, cartoon-like rendition of images [72]. Here, $\sigma_s = 8$ and $\sigma_r = 0.1$.

Fig. 2.5 Separation: The residual image holds all input components (a) removed by the bilateral filter (b), and some image structure is visible here (c). For denoising tasks, the ideal residual image would contain only noise, but here the $\sigma_r$ setting was large enough to remove some fine textures that are nearly indistinguishable from noise, and still yields acceptable results for many denoising tasks.

To conclude, bilateral filtering is an effective way to smooth an image while preserving its discontinuities (see Sections 3.1 and 3.5) and also to separate image structures of different scales (see Section 3.2). As we will see, the bilateral filter has many applications, and its central notion of assigning weights that depend on both space and intensity can be tailored to fit a diverse set of applications (see Section 6).

Remark.
The reader may know that the goal of edge-preserving image restoration has been addressed for many years by partial differential equations (PDEs), and one may wonder about their relationship with bilateral filters. Section 5.3 will explore those connections in detail.

3 Applications

This section discusses the uses of the bilateral filter for a variety of applications:
• Denoising (Section 3.1): This is the original, primary goal of the bilateral filter, where it found broad applications that include medical imaging, tracking, movie restoration, and more. We discuss a few of these, and present a useful extension known as the cross-bilateral filter.
• Texture and Illumination Separation, Tone Mapping, Retinex, and Tone Management (Section 3.2): Bilateral filtering an image at several different settings decomposes that image into large-scale/small-scale textures and features. These applications edit each component separately to adjust the tonal distribution, achieve photographic stylization, or match the adjusted image to the capacities of a display device.
• Data Fusion (Section 3.4): These applications use bilateral filtering to decompose several source images into components and then recombine them as a single output image that inherits selected visual properties from each of the source images.
• 3D Fairing (Section 3.5): In this counterpart to image denoising, bilateral filtering applied to 3D meshes and point clouds smooths away noise in large areas and yet keeps all corners, seams, and edges sharp.
• Other Applications (Section 3.6): New applications are emerging steadily in the literature; we highlight several new trends indicated by recently published papers.

3.1 Denoising

One of the first roles of bilateral filtering was image denoising. Later, the bilateral filter became popular in the computer graphics community because it is edge preserving, easy to understand and set up, and because efficient implementations were recently proposed (see Section 4).

The bilateral filter has become a standard interactive tool for image denoising. For example, Adobe Photoshop® provides a fast and simple bilateral filter variant under the name "surface blur" (Figure 3.1). Instead of Gaussian functions, it uses a square box function as its spatial weight, and a tent (linear) function as the range weight. Unlike Gaussian convolution, which smooths images without respecting their visual structures, the bilateral filter preserves the object contours and produces sharp results. The surface blur tool is often used by portrait photographers to smooth skin while preserving sharp edges and details in the subject's eyes and mouth.

Fig. 3.1 Denoising using the surface blur filter from Adobe Photoshop®: We added noise (b) to the input image (a) and applied the surface blur filter. As the input image was corrupted by noise, some signal loss is inevitable, but the filtered version is significantly improved.

Qualitatively, the bilateral filter represents an easy way to decompose an image into a cartoon-like component and a texture one. This cartoon-like image is the denoised image, which can be used in several applications as shown in this section. In principle, such a decomposition could be obtained by any simplifying filter, but it is not trivial from a mathematical perspective if one considers the mathematical structure of images.
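The cartoon/texture split described above, which is also the filtered/residual separation of Section 2.3.4, takes only a few lines once a bilateral filter is available. This is a minimal 1D Python sketch; `bilateral_1d`, `decompose`, and the example parameters are illustrative choices, not anything prescribed by the text. Since the residual is defined as input minus base, adding the two layers back reproduces the input exactly.

```python
import math


def bilateral_1d(signal, sigma_s, sigma_r):
    """Brute-force bilateral filter (Equations (3)-(4)) of a 1D signal."""
    r = int(math.ceil(3 * sigma_s))  # assumed truncation radius of the spatial sum
    n = len(signal)
    out = []
    for p in range(n):
        total = w_p = 0.0
        for q in range(max(0, p - r), min(n, p + r + 1)):
            weight = (math.exp(-((p - q) ** 2) / (2 * sigma_s ** 2))
                      * math.exp(-((signal[p] - signal[q]) ** 2) / (2 * sigma_r ** 2)))
            total += weight * signal[q]
            w_p += weight
        out.append(total / w_p)
    return out


def decompose(signal, sigma_s, sigma_r):
    """Split a signal into a cartoon-like base layer and a texture/noise residual."""
    base = bilateral_1d(signal, sigma_s, sigma_r)
    residual = [i - b for i, b in zip(signal, base)]
    return base, residual


# Two flat plateaus (0.2 and 0.8), each with an alternating texture of amplitude 0.05.
sig = ([0.2 + 0.05 * (-1) ** k for k in range(10)]
       + [0.8 + 0.05 * (-1) ** k for k in range(10)])
base, texture = decompose(sig, sigma_s=2.0, sigma_r=0.1)
```

In this example most of the alternating texture ends up in `texture`, while the 0.6 jump between the plateaus stays in `base`; a Gaussian blur would instead push a large part of that jump into the residual near the edge.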
For approaches dedicated to precise texture-cartoon decompositions, we refer to Meyer [44], Vese and Osher [67], and Aujol et al. [3].

The cartoon-like effect can also be a drawback depending on the application. Buades et al. [14] have shown that although bilateral filtering preserves edges, the preservation is not perfect and some edges are sharpened during the process, introducing an undesirable staircase effect. We discuss this effect in more detail in Section 6.1.3. In summary, the bilateral filter can be the right approach for many applications, but it is not always the best solution nor the best denoising filter available.

As a final comment, the bilateral filter is related to several approaches and frameworks proposed in the literature. We revisit the most important ones in Section 5. These analogies are interesting to notice, as they give theoretical foundations to bilateral filtering and show alternative formulations.

3.1.1 Medical Imagery

In the domain of medical imagery, Wong et al. [73] improved the structure-preservation abilities of the bilateral filter by explicitly describing the structure with an additional weight, one that depends on the local shape and orientation of the sensed image data.

(a) Input (b) Naive histogram stretching (c) Output of Bennett and McMillan [9]
Fig. 3.2 Bennett and McMillan [10] describe how to combine spatial and temporal bilateral filterings to achieve high-quality video denoising and exposure correction.

3.1.2 Videos

Bennett and McMillan [10] show that bilateral filtering can be used for videos. In this context, the bilateral filter is applied along the time axis, that is, pixels at the same location in successive frames are averaged together. The fact that the bilateral filter does not average together pixels of different colors prevents mixing data from different objects that appear at the same location but at different times. For instance,
if a red ball passes in front of a green tree, the ball and tree pixels are not mixed together, thanks to the range weight of the bilateral filter. However, pixels that change color often, for instance due to a rapidly moving object, may not have any similar neighbors along the time axis. Bennett and McMillan compensate for this case by looking for spatial neighbors when there are not enough temporally similar pixels. Figure 3.2 shows sample results (figure reproduced from Bennett and McMillan [10]).

3.1.3 Orientation Smoothing

Paris et al. [49] use the bilateral filter to smooth the 2D orientation field computed from optical measurements for hairstyle modeling. Their measuring scheme yields a per-pixel evaluation of the local orientation, but these measures are noisy and at times ambiguous due to the complex nature of hair images. Paris et al. evaluated the reliability of their measurement at pixel p using the variance $V_p$ and incorporated it into their filter. In Paris et al.'s setup, several illumination conditions offer orientation estimates for each pixel, and they use the maximum difference among all these estimates. As the orientation angle $\theta$ varies cyclically between 0 and $\pi$, they map their averaging onto a complex exponential, $\theta \in [0,\pi[\ \mapsto \exp(2i\theta) \in \mathbb{C}$, leading to the filter:

$$\rho\,\exp\big(2i\,F_{\mathrm{Paris}}(\theta)_p\big) = \sum_{q} G_{\sigma_s}(\|p-q\|)\, G_{\sigma_V}(V_p/V_q)\, G_{\sigma_\theta}\big(\angle(\theta_p,\theta_q)\big)\, \exp(2i\,\theta_q) \qquad (5)$$

(a) Zoom on input image (b) Orientations before bilateral filtering (c) Orientations after bilateral filtering
Fig. 3.3 Paris et al. [49] smooth their orientation measurements using a variant of bilateral filtering mapped to the complex plane $\mathbb{C}$. Figure reproduced from Paris et al. [49].

This filter acts upon orientation mapped to the complex plane. Although Paris et al.'s application needs only the phase argument of the result and discards the amplitude $\rho$, if needed $\rho$ could act as the standard deviation does in the scalar case (Watson [70]). This filter illustrates how bilateral filtering can adapt to incorporate application-specific knowledge.

3.1.4 Discussion and Practical Consideration

Denoising usually relies on small spatial kernels $\sigma_s$, and the range sigma $\sigma_r$ is usually chosen to match the noise level.

The bilateral filter might not be the most advanced denoising technique, but its strength lies in its simplicity and flexibility. The weights can be adjusted to take into account any metric on the difference between two pixels, and information about the reliability of a given pixel can be included by reducing the weights assigned to it.

In the case of salt-and-pepper or impulse noise, the input image may need to be mollified before applying the bilateral filter. Though the noise may be sparse, the affected pixels' intensity values may span the entire image range (e.g., [0, 1]), and their values might be too different from their neighbors to be filtered out. To mollify these images, compute the range Gaussian weights on a median-filtered version of the image [21]. If M describes median filtering, this gives:

$$BF[I]_p = \frac{1}{W_p}\sum_{q\in\mathcal{S}} G_{\sigma_s}(\|p-q\|)\, G_{\sigma_r}(|M[I]_p - M[I]_q|)\, I_q, \qquad (6)$$

$$W_p = \sum_{q\in\mathcal{S}} G_{\sigma_s}(\|p-q\|)\, G_{\sigma_r}(|M[I]_p - M[I]_q|). \qquad (7)$$

This practice is commonplace in robust statistics: users apply a very robust estimator such as the median filter first to obtain a suitable initial estimate, then apply a more precise estimator (the bilateral filter) to find the final result.

3.2 Contrast Management

Bilateral filtering has been particularly successful as a tool for contrast management tasks such as detail enhancement or reduction. Oh et al. [48] describe how to use the bilateral filter to separate an image into a large-scale component and a small-scale component by subtracting filtered results. With this decomposition, they edit texture in a photograph. Several earlier nonlinear coarse/fine decompositions were already in use in various local tone mapping operators (e.g., Stockham [62], Chiu et al. [17], Schlick [58], Pattanaik et al. [51], Tumblin and Turk [64]), but Durand and Dorsey [21] were the first to apply the method using the bilateral filter. Elad [24] followed the same strategy to estimate the illumination and albedo of the photographed scene. Bae et al. [5] extended this approach to manipulate the look of a photograph, and Fattal et al. [25] describe a multi-scale image decomposition that preserves edges and allows for combining multiple images to reveal object details. We describe these applications in the next sections.

3.2.1 Texture and Illumination Separation

In the context of image-based modeling, Oh et al. [48] used the structure-removal aspect of the bilateral filter. By using a sufficiently large range parameter $\sigma_r$, the bilateral filter successfully removes the variations due to reflectance texture while preserving larger discontinuities stemming from illumination changes and geometry. Their technique is motivated by the fact that illumination variations typically occur at a larger scale than texture patterns, as observed by Land in his Retinex theory of lightness perception [39, 38].
To extract the illumination component, they derive a variant of the iterated bilateral filter for which the initial image is always filtered. The successive estimates are used only to refine the range weight:

$$BF^{i+1}[I]_p = \frac{1}{W_p}\sum_{q\in\mathcal{S}} G_{\sigma_s}(\|p-q\|)\, G_{\sigma_r}\big(BF^{i}[I]_p - BF^{i}[I]_q\big)\, I_q, \qquad \text{with } BF^{0}[I] = I.$$

In addition, because a depth estimate is available at each image pixel, they adapt the spatial Gaussian size and shape to account for depth foreshortening. At each pixel they estimate a tangent plane to the local geometry, and choose an oriented spatial Gaussian that is isotropic in this tangent plane, which results in an anisotropic Gaussian once projected onto the image plane.

3.2.2 Tone Mapping

Durand and Dorsey [21] show that the use of bilateral filtering can be extended to isolate small-scale signal variations, including texture and also small details of an image. They demonstrate this property to construct a tone mapping process whose goal is to compress the intensity values of a high-dynamic-range image to fit the capabilities of a low-dynamic-range display. In accordance with earlier local tone mapping operators, they note that naive solutions such as uniform scaling or gamma reductions to compress contrasts yield unsatisfactory results, because the severe reductions needed for high-contrast features cause subtle textures and scene details to vanish. While earlier tone mapping operators used multi-scale filter banks, wavelets, nonlinearities modeled on neural processes, and diffusion PDEs to separate visually compressible and incompressible components of log luminance, Durand and Dorsey used the bilateral filter for a fast, much simpler and visually pleasing result. They apply the bilateral filter on the log-intensities of the HDR image, scale down the result uniformly, and add back the filter residual, thereby ensuring that the small-scale details have not been compressed during the process. Some earlier methods such as Pattanaik et al. [51] used weighted multi-scale decompositions that model psychophysical models of visual appearance, or relied on user interaction to achieve best-looking results (e.g., Jobson et al. [32], Tumblin and Turk [64]), but as shown in Figure 3.4, HDR images tone-mapped with Durand and Dorsey's technique are less difficult to achieve yet maintain a plausible, visually pleasing appearance.

Fig. 3.4 Tone Mapping: Direct display of an HDR image (a) is not satisfying because over- and under-exposed areas hide image features. Contrast compression maps all scene intensities to the display, but details in clouds and in the city below the horizon are barely visible (b). Isolating the details using Gaussian convolution brings back the details, but incurs halos near contrasted edges (e.g., near the tree silhouettes) (c). Durand and Dorsey use the bilateral filter to isolate the small variations of the input image without incurring halos (d). Figure reproduced from Durand and Dorsey [21].

3.2.3 Retinex

Elad [24] proposes a different interpretation of the tone mapping technique of Durand and Dorsey using the Retinex theory of Edwin Land, which seeks a separation of images into illumination and albedo. Under the assumption that scene objects are mostly diffuse reflectors that do not emit light, illumination values are greater than the measured intensities because objects always absorb part of the incoming light. Elad adapts the bilateral filter to ensure that the filtered result fulfills this requirement and forms an upper envelope of the image intensities. He replaces the range weight $G_{\sigma_r}$ by a truncated Gaussian $H\,G_{\sigma_r}$, where H is a step function whose value is 1 for non-negative inputs and 0 otherwise. As a consequence, at any given pixel p, the local averaging includes only values at or above the intensity at p, which guarantees a filtered value at or above the local image intensity.

3.2.4 Tone Management

Bae et al.
[5] build upon the separation between the large scale and the small scale offered by the bilateral filter, and describe a technique to transfer the visual look of an artist's picture onto a casual photograph. They explored a larger space of image modifications by applying an arbitrary, user-specified monotonic transfer function to the large-scale component of the source image. With histogram matching, they construct a transfer function that matches the global contrast and brightness of the model photograph. They also show that the small-scale component can be modified to vary the amount of texture visible in the image. To this end, they introduce the notion of textureness to quantify the local texture amplification they wish to induce in an image by cross-bilateral filtering (cf. Section 3.4.1 for details). With the small-scale components (or high frequencies) H of the image's log-intensity $\log I$, the textureness is defined by:

$$\frac{1}{W_p}\sum_{q\in\mathcal{S}} G_{\sigma_s}(\|p-q\|)\, G_{\sigma_r}(|\log I_p - \log I_q|)\, |H|_q. \qquad (8)$$

Said another way, textureness is the amplitude of the high frequencies, locally averaged while respecting the edges of the input image.

Later, Chen et al. [16] sped up the bilateral filter computation using graphics hardware and achieved real-time results on high-definition videos, thereby enabling on-the-fly control of the photographic style.

(a) Input (b) Result after contrast and textureness increase
Fig. 3.5 Bae et al. [5] use the bilateral filter to separate the large-scale and small-scale variations of an image, and then process them separately. In this example, users chose to increase the global image contrast and increase the texture as well for a more dramatic image result. Figure reproduced from Bae et al. [5].

3.2.5 Detail Enhancement

Fattal et al. [25] extend the small-scale/large-scale decomposition to multiple layers to allow for finer control and selection of enhanced details. They use their decomposition on several images taken from the same point of view but under different lighting conditions, and demonstrate a variety of effects by combining portions of bilateral image pyramids obtained from these lighting variations. They describe how these combinations can be controlled to reveal the desired details while avoiding halo artifacts (Figure 3.6). They also describe a numerical scheme to efficiently compute image pyramids using the bilateral filter.

(a) Sample input images (b) Output with enhanced details
Fig. 3.6 Fattal et al. [25] use the bilateral filter to create a multi-scale decomposition of images. They first decompose several images of the same scene under different lighting conditions (a) and construct a new pyramid that generates a new image with enhanced details (b). Figure reproduced from Fattal et al. [25].

3.2.6 High-Dynamic-Range Hallucination

Wang et al. [69] use a bilateral filter decomposition to allow users to generate a high-dynamic-range image from a single low-dynamic-range one.
They seek to reconstruct data in over- and under-exposed areas of the image. They use the bilateral filter to create a decomposition into texture and illumination inspired by Oh et al.'s [48] work. This allows them to apply user-guided texture synthesis to the detail (texture) layer after bilateral filtering has removed the large-scale illumination variations. Similarly, they can apply smooth interpolation to the large-scale (illumination) layer because high-frequency texture has been decoupled.

3.2.7 Discussion and Practical Considerations

Contrast management relies on large spatial kernels to create large-scale/small-scale decompositions, because the small scale needs to include high- and medium-frequency components. The human visual system is not very sensitive to low frequencies but is quite sensitive to medium frequencies. As the large-scale component is typically the one whose contrast gets reduced, medium frequencies must be excluded from it so that they are not attenuated as well.

For contrast management, the bilateral filter is usually applied to the log of the original image because the human visual system's response to light is approximately multiplicative. Using the log domain makes the range sigma act uniformly across different levels of intensity: edges where filtering should stop are defined in terms of multiplicative contrast. Similarly, relighting applications deal with a multiplicative process where illumination is multiplied by reflectance. The use of the log domain is not without its problems: zero intensity maps to minus infinity, and in dark regions, noise in sensed intensity may be magnified in the log domain. Accordingly, many users add a small constant, on the order of the noise level, to the input intensities before taking the log. The new color space proposed by Chong et al. [18] is particularly promising to handle these and other multiplicative processes. Using the luminance channel of the CIE-Lab color space is another useful alternative.
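The log-domain decomposition discussed above can be sketched on a 1D signal. This is a minimal illustration under stated assumptions, not the method of any cited paper: `eps` plays the role of the small constant added before the log, the spatial Gaussian is truncated at $2\sigma_s$, `compress_contrast` scales only the large-scale log component, and `textureness` follows the spirit of Equation (8) with $H$ taken to be the bilateral detail layer (Bae et al. may compute $H$ differently). All names are hypothetical.

```python
import math

def gauss(x, sigma):
    # Unnormalized Gaussian: constant factors cancel in the 1/W_p normalization.
    return math.exp(-(x * x) / (2.0 * sigma * sigma))

def cross_bilateral_1d(values, guide, sigma_s, sigma_r):
    # Bilateral average of `values` whose range weights come from `guide`;
    # passing guide=values gives the ordinary bilateral filter.
    n = len(values)
    radius = int(math.ceil(2 * sigma_s))  # ignore pixels farther than ~2*sigma_s
    out = []
    for p in range(n):
        acc = wsum = 0.0
        for q in range(max(0, p - radius), min(n, p + radius + 1)):
            w = gauss(p - q, sigma_s) * gauss(guide[p] - guide[q], sigma_r)
            acc += w * values[q]
            wsum += w
        out.append(acc / wsum)
    return out

def log_decomposition(intensity, sigma_s, sigma_r, eps=0.01):
    # eps is the small constant added before the log (avoids log(0) = -inf).
    log_i = [math.log(v + eps) for v in intensity]
    base = cross_bilateral_1d(log_i, log_i, sigma_s, sigma_r)  # large scale
    detail = [l - b for l, b in zip(log_i, base)]               # small scale
    return log_i, base, detail

def compress_contrast(intensity, factor, sigma_s, sigma_r, eps=0.01):
    # Scale only the large-scale component, keep the details, return to linear.
    _, base, detail = log_decomposition(intensity, sigma_s, sigma_r, eps)
    return [math.exp(factor * b + d) - eps for b, d in zip(base, detail)]

def textureness(intensity, sigma_s, sigma_r, eps=0.01):
    # Edge-aware local average of |H|, guided by the log intensity (cf. Eq. (8)).
    log_i, _, detail = log_decomposition(intensity, sigma_s, sigma_r, eps)
    return cross_bilateral_1d([abs(h) for h in detail], log_i, sigma_s, sigma_r)
```

For example, on a step from 0.1 to 10, `compress_contrast(step, 0.5, 2.0, 0.5)` reduces the ratio across the edge from 100 to roughly 10, while the (near-zero) detail layer of the step is left untouched.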
Instead of a log curve, CIE-Lab lightness is based on a cube root, which does not exactly model these multiplicative processes but is numerically simpler to handle.

3.3 Depth Reconstruction

Yang et al. [75, 76] and Yoon and Kweon [78] applied the bilateral filter to aid in stereo reconstruction, the recovery of depth values from correspondences between pixels in different views. Ideally, we wish to find a corresponding point in the right image for every pixel in the left image. As the distance between these point pairs, the disparity, is inversely proportional to the depth at that pixel, this information is equivalent to recovering the scene geometry. To pair the pixels with points in the other image, stereo algorithms typically compute a similarity score such as color differences or local correlation. Yang et al. and Yoon and Kweon show that locally aggregating these scores using bilateral weights significantly improves the accuracy and reduces noise in the recovered depth maps. Yang et al. [75] tested many similarity scores and pairing strategies and found that bilateral aggregation always improves their results.

3.4 Data Fusion

3.4.1 Flash/No-flash Imaging

Eisemann and Durand [22] and Petschnigg et al. [53] describe similar techniques to produce satisfying pictures in low-light conditions by combining a flash and a no-flash photograph. Their work is motivated by the fact that, although the flash image has unpleasantly direct and hard-looking lighting, its signal-to-noise ratio is higher than that of the no-flash image. On the other hand, the no-flash image has more pleasing and natural-looking lighting, but its high frequencies are corrupted by noise, and the camera may require a longer exposure time, which increases the likelihood of blur from an unsteady camera. The key idea is to extract the details of the flash image and combine them with the large-scale component of the no-flash picture.
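This key idea can be sketched on 1D signals under stated assumptions: the large scale of the no-flash signal is computed with the cross bilateral filter of Equation (9) below, and the flash details are taken as the ratio between the flash signal and its own bilateral-filtered version, guarded by a small `eps`. This ratio form and all names are illustrative assumptions; the two cited papers differ in their exact formulations.

```python
import math

def gauss(x, sigma):
    # Unnormalized Gaussian; constant factors cancel in the normalization.
    return math.exp(-(x * x) / (2.0 * sigma * sigma))

def cross_bilateral(n_img, f_img, sigma_s, sigma_r):
    """CBF[N, F] of Eq. (9) on 1D signals: average N, with range weights from F."""
    size, radius = len(n_img), int(math.ceil(2 * sigma_s))
    out = []
    for p in range(size):
        acc = wsum = 0.0
        for q in range(max(0, p - radius), min(size, p + radius + 1)):
            w = gauss(p - q, sigma_s) * gauss(f_img[p] - f_img[q], sigma_r)
            acc += w * n_img[q]
            wsum += w
        out.append(acc / wsum)
    return out

def flash_no_flash(no_flash, flash, sigma_s, sigma_r, eps=0.02):
    # Large scale of the no-flash signal, with edges defined by the flash signal.
    large = cross_bilateral(no_flash, flash, sigma_s, sigma_r)
    # Flash details as a ratio to the flash signal's own bilateral-filtered version.
    flash_base = cross_bilateral(flash, flash, sigma_s, sigma_r)
    return [l * (f + eps) / (b + eps) for l, f, b in zip(large, flash, flash_base)]
```

Because the range weights come from the cleaner flash signal, the noise of the no-flash signal is averaged away on each side of an edge without being blurred across it.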
A variant of the bilateral filter performs this separation.

(a) Sample input image (b) Coarse resolution computation (c) Refinement using bilateral aggregation
Fig. 3.7 Yang et al. [75] use the bilateral filter to achieve stereo reconstruction from photographs (a). First, they build a coarse depth map (b), and then use a scheme inspired by the bilateral filter to aggregate local information and compute a refined, more accurate depth map (c). Figure reproduced from Yang et al. [75].

Both articles introduced the cross (joint) bilateral filter to better process the no-flash photograph, whose noise level is often too high to enable accurate edge detection. As the flash image $F$ represents the same scene, it is used to define the edges, and the filtered no-flash image is obtained as:

$$CBF[N,F]_p = \frac{1}{W_p}\sum_{q\in\mathcal{S}} G_{\sigma_s}(\|p-q\|)\,G_{\sigma_r}(|F_p - F_q|)\,N_q \tag{9}$$

where $N$ is the original no-flash image. Figure 3.8 gives an overview of the process, and Figures 3.9 and 3.10 show sample results.

Fig. 3.8 Denoising of low-light images: overview of the flash/no-flash combination of Eisemann and Durand [22]. The bilateral filter is used to combine the illumination component of the no-flash picture and the structure component of the flash picture. Figure reproduced from Eisemann and Durand [22].

(a) Photograph with flash (b) Photograph without flash (c) Combination
Fig. 3.9 By combining a flash photograph (a) and a no-flash photograph (b), Eisemann and Durand render a new photograph (c) that has both the warm lighting of the no-flash picture and the crisp details of the flash image. Figure reproduced from Eisemann and Durand [22].

(a) Flash picture (b) No-flash picture (c) Output of Petschnigg et al. [53]
Fig. 3.10 By combining a flash photograph (a) and a no-flash photograph (b), Petschnigg et al. render a new photograph (c) that has both the warm lighting of the no-flash picture and the crisp details of the flash image. Figure reproduced from Petschnigg et al. [53].

3.4.2 Multispectral Fusion

Bennett et al. [9] show how to exploit infrared data in addition to standard RGB data to denoise low-light video streams. They use the dual bilateral filter, a variant of the bilateral filter with a modified range weight that accounts for both the visible spectrum (RGB) and the infrared spectrum:

$$DBF[RGB]_p = \frac{1}{W_p}\sum_{q\in\mathcal{S}} G_{\sigma_s}(\|p-q\|)\,G_{\sigma_{RGB}}(\|RGB_p - RGB_q\|)\,G_{\sigma_{IR}}(|IR_p - IR_q|)\,RGB_q \tag{10}$$

where $RGB_p$ is a 3-vector representing the RGB components at pixel $p$, and $IR_p$ is the measured infrared intensity at the same pixel $p$. Bennett et al. show that this combination detects edges better, because it is sufficient for an edge to appear in just one of the channels (RGB or infrared) to form a sharp boundary in the result. In combination with temporal filtering, they demonstrate that it is possible to obtain high-quality video streams from noisy sequences of moving objects shot in very low light.

3.5 3D Fairing

Jones et al. [34] extend bilateral filtering to meshes. The difficulty compared to images is that all three xyz coordinates are subject to noise, data are not regularly sampled, and, unlike the pixel intensity, the z coordinate is not a function of x and y. To smooth a mesh, Jones et al. assume that it is locally flat. Under this assumption and in the absence of noise, a vertex $p$ belongs to the plane tangent to the mesh at any nearby vertex $q$.
With $\Pi_q(p)$ the projection of $p$ onto the plane tangent to the mesh at $q$, ideally we have $p = \Pi_q(p)$. However, because of noise and because the mesh is not flat everywhere, this relationship does not hold in general. To smooth the mesh, Jones et al. average the positions of $p$ predicted by the $\Pi_q(p)$. They apply a spatial weight $G_{\sigma_s}(\|p-q\|)$, which ensures that only nearby points contribute to the estimate, and add a term $G_{\sigma_r}(\|p-\Pi_q(p)\|)$ that reduces the weights of outliers, i.e., the predictions $\Pi_q(p)$ that are far away from the original position $p$. Using a term $a_q$ to account for the sampling density, the resulting filter is:

$$F_{\mathrm{Jones}}(p) = \frac{1}{W_p}\sum_{q\in\mathcal{S}} a_q\,G_{\sigma_s}(\|p-q\|)\,G_{\sigma_r}(\|p-\Pi_q(p)\|)\,\Pi_q(p) \tag{11}$$

To improve the results, they mollify the mesh normals used to estimate the tangent planes [30, 47], that is, they apply a low-pass filter to the normals. This mollification is analogous to the pre-filtering step described by Catté et al. [15] for PDE filters. Figure 3.11 shows a sample result.

(a) Input mesh (b) Smoothed mesh
Fig. 3.11 Jones et al. [34] have adapted the bilateral filter to smooth 3D meshes while preserving their most prominent features. Figure reproduced from Jones et al. [34].

Fleishman et al. [26] simultaneously proposed a similar approach (Figure 3.12). The main difference between the techniques of Jones et al. and Fleishman et al. is the way each expresses the flat-neighborhood assumption. Fleishman et al. use the mesh normal $n_p$ at $p$ and project neighbors onto it. If $q$ is such a neighbor, $q$ should project onto $p$, that is: $p + [(q-p)\cdot n_p]\,n_p = p$. This results in the following variant of the bilateral filter:

$$F_{\mathrm{Fleishman}}(p) = p + \frac{n_p}{W_p}\sum_{q\in\mathcal{S}} G_{\sigma_s}(\|p-q\|)\,G_{\sigma_r}(|(q-p)\cdot n_p|)\,[(q-p)\cdot n_p] \tag{12}$$

The projection on the normal can be rewritten using the plane projection operator used by Jones et al.: $[(q-p)\cdot n_p]\,n_p = q - \Pi_p(q)$. This leads to the following expression, equivalent to Equation (12):

$$F_{\mathrm{Fleishman}}(p) = p + \frac{1}{W_p}\sum_{q\in\mathcal{S}} G_{\sigma_s}(\|p-q\|)\,G_{\sigma_r}(\|q-\Pi_p(q)\|)\,\big(q-\Pi_p(q)\big) \tag{13}$$

(a) Input (b) Output of Fleishman et al. [26]
Fig. 3.12 Fleishman et al. [26] have adapted the bilateral filter to smooth 3D meshes while preserving their most prominent features. Figure reproduced from Fleishman et al. [26].

These two formulations underline the differences between the approaches of Jones et al. and Fleishman et al. Equation (12) shows that, unlike Jones et al., Fleishman et al. guarantee no vertex drift by moving $p$ only along its normal $n_p$. On the other hand, Fleishman et al. do not compensate for the density variations described by Jones et al. Furthermore, Equation (13) shows that the weights of both approaches are similar, except that Jones et al. project $p$ onto the tangent plane at $q$ and thus exploit both the position and the normal of all neighbors $q$, whereas Fleishman et al. project $q$ onto the tangent plane at $p$, thereby exploiting first-order information only from the vertex $p$. This suggests a hybrid filter that we have not yet evaluated:

$$F_{\mathrm{hybrid}}(p) = p + \frac{1}{W_p}\sum_{q\in\mathcal{S}} a_q\,G_{\sigma_s}(\|p-q\|)\,G_{\sigma_r}(\|p-\Pi_q(p)\|)\,\big(q-\Pi_p(q)\big) \tag{14}$$

In addition to these differences in estimating the vertex positions, Fleishman et al. advocate iterating the filter three times for further smoothing of the mesh geometry. Wang [68] refines the process by explicitly detecting the sharp-edge vertices to preserve them. He remeshes the model at these edges to ensure that sharp features are correctly represented by an edge between two triangles.

Later, Jones et al. [33] refined their technique to filter normals. Given a geometric transformation $F$ of 3D space, $x \in \mathbb{R}^3 \mapsto F(x)$, Jones et al. transform the normals by the transposed inverse of the Jacobian of $F$. The Jacobian of $F$ is a $3\times 3$ matrix that captures the first-order deformation induced by $F$ and is defined by $J_{ij}(F) = \partial F_i/\partial x_j$, where $F_i$ is the $i$th coordinate of $F$ and $x_j$ the $j$th coordinate of $x$. Jones et al. show that iteratively transforming the normals by $J^{-T}(F_{\mathrm{Jones}})$ smooths the normals of a model while respecting its edges and without moving its vertices. They argue that not moving the vertices yields a better preservation of the fine details of the meshes.

Miropolsky and Fischer [45] propose a variant of bilateral filtering to smooth and decimate 3D point clouds. They assume that a normal $n_p$ is known for each point $p$. They overlay a regular 3D grid on top of the points and determine a representative point for each grid cell by taking into account the point locations and normals. With $c$ the cell center and $n_c$ the mean normal of the cell points, they propose:

$$F_{\mathrm{Miropolsky}}(c) = \frac{1}{W_c}\sum_{q\in\mathcal{S}} G_{\sigma_s}(\|c-q\|)\,G_{\sigma_r}(n_c\cdot n_q)\,q \tag{15}$$

3.6 Other Applications

3.6.1 Depth Map from Luminance

Khan et al. [35] use bilateral filtering to process the luminance channel of an image and obtain a pseudo-depth map that is sufficient for altering the material appearance of the observed object. The originality of this use of the bilateral filter is that its smoothing power determines the geometric characteristics of an object. For instance, a smaller intensity tolerance $\sigma_r$ results in a depth map that looks engraved with the object texture, because the intensity patterns are well preserved and directly transferred to the map as depth variations.

3.6.2 Video Stylization

Winnemöller et al. [72] iterate the bilateral filter to simplify video content and achieve a cartoon look (Figure 3.13). They demonstrate that the bilateral filter can be computed in real time at video resolution using the numerical scheme of Pham and van Vliet [55] on modern graphics hardware. Later, Chen et al. [16] ported the bilateral filter to the GPU using the bilateral grid and achieved similar results on high-definition videos. Winnemöller et al. demonstrate that bilateral filtering is an effective preprocessing step for edge detection: filtered images trigger fewer spurious edges. To modulate the smoothing strength of the bilateral filter, they modify it to control the degree of edge preservation. The range weight $G_{\sigma_r}$ is replaced by $(1-m)\,G_{\sigma_r} + m\,u$, where $m$ is a function varying between 0 and 1 that controls edge preservation, and $u$ defines the local importance of the image. To define $u$ and $m$, Winnemöller et al. suggest using an eye tracker [20], a computational model of saliency [31], or a user-painted map.

(a) Input (b) Abstracted output
Fig. 3.13 Sample abstraction result from the method by Winnemöller et al. [72]. Reproduced from Winnemöller et al. [72].

Fig. 3.14 Bayer patterns are such that, although each pixel is missing two color channels, adjacent pixels have measures in these missing channels.
Figure reproduced from Wikipedia (http://en.wikipedia.org/wiki/Bayer_filter).

3.6.3 Demosaicking

Demosaicking is the process of recovering complete color information from partial color sampling through a Bayer filter (see Figure 3.14). Ramanath and Snyder [56] interpolate missing color values of Bayer patterns [8]. These patterns are used in digital cameras where each sensor measures only a single value among red, green, and blue. Bayer patterns are such that, although each pixel is missing two color channels, adjacent pixels have measures in these missing channels. Demosaicking is thus a small-scale interpolation problem where values are interpolated from neighboring pixels. Directly interpolating the values yields blurry images because edges are ignored. Ramanath and Snyder start from such an image and refine the result with bilateral filtering. They use a small spatial neighborhood to consider only the pixels within the 1-ring of the filtered pixel, and also ensure that measured values are not altered. Their validation shows that the obtained results compare favorably to state-of-the-art techniques, although the computational cost is higher.

3.6.4 Optical Flow

Xiao et al. [74] apply bilateral filtering to regularize the optical flow computation. They use an iterative scheme to refine the flow vectors between a pair of images. Each iteration consists of two steps: first, the vectors are adjusted using a scheme akin to Lucas and Kanade [42]; then, the flow vectors are smoothed using a modified version of bilateral filtering that has two additional terms, one accounting for flow similarity and one ensuring that occluded regions are ignored during averaging. This scheme also fills in occluded regions, estimating flow for pixels visible in one image of the pair but hidden in the other.
These occluded points gather information from pixels outside the occluded region covered by the bilateral filter kernel, and the range weight ensures that only similar points contribute, thereby avoiding data diffused from the wrong side of the occlusion. An important feature of this technique is that it regularizes the computation only indirectly: the bilateral filter does not optimize a trade-off between a data term and a smoothness term; it only makes the data smoother. Nonetheless, the process as a whole is a regularization because it interleaves bilateral filtering with an optimization step, and it can be seen as a progressive refinement of the initial guess of a steepest-slope optimization. Sand and Teller [57] accelerate this technique by restricting the use of bilateral filtering to the vicinity of flow discontinuities.

(a) Upsampled result (b) Nearest neighbor (c) Bicubic (d) Gaussian (e) Joint bilateral (f) Ground truth
Fig. 3.15 Sample use of joint bilateral upsampling [37] to tone map a high-resolution HDR image. In this context, the method is used to upsample the exposure map (a) applied to the pixel values to obtain the output (e), which is close to the ground-truth result (f) and does not exhibit the defects of other upsampling methods (b)-(d). Figure reproduced from Kopf et al. [37].

3.6.5 Upsampling

Kopf et al. [37] describe joint bilateral upsampling, a method inspired by the bilateral filter to upsample image data. The advantage of their approach is that it is generic and can potentially upsample any kind of data, such as the exposure map used for tone mapping or hues for colorization. Given a high-resolution image and a downsampled version, one can compute the data at low resolution and then upsample them using a weighted average. High-resolution data are produced by averaging the samples in a 5 × 5 window at low resolution. The weights are similar to those defined by the bilateral filter, as each neighboring pixel's influence decreases with distance and color difference. As a result, Kopf's scheme interpolates low-resolution data while respecting the discontinuities of the high-resolution input image. This filter is fast to evaluate because it only considers a small spatial footprint.

4 Efficient Implementation

A naive implementation of the bilateral filter can be extremely slow, especially for large spatial kernels. Several approaches have been proposed to speed up the computation. They all rely on approximations that yield various degrees of acceleration and accuracy. In this section, we describe these efficient algorithms and compare their performance. We begin with the brute force approach as a reference. We then describe the techniques based on separable kernels of Pham and van Vliet [55] and Pham [54], the local histograms of Weiss [71], the layered approximation of Durand and Dorsey [21], and the bilateral grid [16, 50]. Figure 4.3 at the end of this section provides a visual comparison of the achieved results.

4.1 Brute Force

A direct implementation of the bilateral filter consists of two nested loops, as presented in Table 4.1.

Table 4.1 Algorithm for the direct implementation of the bilateral filter.
For each pixel $p$ in $\mathcal{S}$
  (1) Initialization: $BF[I]_p = 0$, $W_p = 0$
  (2) For each pixel $q$ in $\mathcal{S}$
      (a) $w = G_{\sigma_s}(\|p-q\|)\,G_{\sigma_r}(|I_p - I_q|)$
      (b) $BF[I]_p \mathrel{+}= w\,I_q$
      (c) $W_p \mathrel{+}= w$
  (3) Normalization: $BF[I]_p = BF[I]_p / W_p$

The complexity of this algorithm is $O(|\mathcal{S}|^2)$, where $|\mathcal{S}|$ is the size of the spatial domain (i.e., the number of pixels). This quadratic complexity quickly makes the computational cost explode for large images.

A classical improvement is to restrict the inner loop to the neighborhood of the pixel $p$. Typically, one considers only the pixels $q$ such that $\|p-q\| \le 2\sigma_s$. The rationale is that the contributions of pixels farther away than $2\sigma_s$ are negligible because of the spatial Gaussian. This leads to a complexity on the order of $O(|\mathcal{S}|\,\sigma_s^2)$. This implementation is efficient for small spatial kernels, that is, small values of $\sigma_s$, but quickly becomes prohibitive for large kernels because of the quadratic dependence on $\sigma_s$.

4.2 Separable Kernel

Pham and van Vliet [55] propose to approximate the 2D bilateral filter by two 1D bilateral filters applied one after the other. First, they filter each image column and then each row. Each time, they use the brute force algorithm restricted to a 1D domain, that is, the inner loop on pixels $q$ is restricted to pixels on the same column (or row) as the pixel $p$. As a consequence, the complexity becomes $O(|\mathcal{S}|\,\sigma_s)$ because the considered neighborhoods are 1D instead of 2D. This approach yields significantly faster running times, but the performance still degrades linearly with the kernel size. Furthermore, this approach computes an axis-aligned separable approximation of the bilateral filter kernel. Although this approximation is satisfying for uniform areas and straight edges, it is a poor match for more complex features such as textured regions. As a consequence, axis-aligned streaks may appear with large kernels in such regions (Figure 4.3). Pham [54] describes how to steer the separation according to the local orientation in the image to reduce these streaks. This approach improves the quality of the results, especially on slanted edges, but is computationally more involved because the 1D filters are no longer axis aligned.

4.3 Local Histograms

Weiss [71] considers the case where the spatial weight is a square box function, that is, he rewrites the bilateral filter as:

$$BF[I]_p = \frac{1}{W_p}\sum_{q\in\mathcal{N}_{\sigma_s}(p)} G_{\sigma_r}(|I_p - I_q|)\,I_q \tag{16a}$$

$$W_p = \sum_{q\in\mathcal{N}_{\sigma_s}(p)} G_{\sigma_r}(|I_p - I_q|) \tag{16b}$$

where $\mathcal{N}_{\sigma_s}(p) = \{q : \|p-q\|_\infty \le \sigma_s\}$. In this case, the result depends only on the histogram of the neighborhood $\mathcal{N}_{\sigma_s}(p)$, because the actual position of the pixel within the neighborhood is not taken into account. Following this remark, Weiss describes an efficient algorithm to compute the histograms of the square neighborhoods of an image. We refer to his article for the details of the algorithm. The intuition behind his approach is that the neighborhoods $\mathcal{N}_{\sigma_s}(p_1)$ and $\mathcal{N}_{\sigma_s}(p_2)$ of two adjacent pixels $p_1$ and $p_2$ largely overlap. Based on this remark, Weiss describes how to efficiently compute the histogram of $\mathcal{N}_{\sigma_s}(p_1)$ by exploiting its similarity with the histogram of $\mathcal{N}_{\sigma_s}(p_2)$. Once the histogram of $\mathcal{N}_{\sigma_s}(p)$ is known for a pixel $p$, the result of the bilateral filter $BF[I]_p$ (Equation (16a)) can be computed, because each histogram bin indicates how many pixels $q$ have a given intensity value. A straightforward application of this technique produces band artifacts near strong edges, a.k.a. Mach bands, because the frequency spectrum of the box filter is not band-limited. Weiss addresses this issue by iterating his filter three times, which effectively smooths away the artifacts.

Weiss [71] then demonstrates that his algorithm has a complexity on the order of $O(|\mathcal{S}|\log\sigma_s)$, which makes it able to handle any kernel size in short times. Furthermore, his algorithm is designed to take advantage of the vector instruction set of modern CPUs, thereby yielding running times on the order of one second for images of several megapixels each.
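The algorithms of Sections 4.1-4.3 can be compared on small images with a direct sketch. Under the assumptions below, `bilateral_brute_force` follows Table 4.1, optionally restricted to $\|p-q\| \le 2\sigma_s$; `bilateral_separable` applies the column-then-row 1D passes of Section 4.2; and `bilateral_box_histogram` evaluates Equation (16) from a window histogram that is updated incrementally as the square window slides along a row. That last function only illustrates the overlap idea: it is not Weiss's $O(|\mathcal{S}|\log\sigma_s)$ algorithm. Images are lists of rows, and all names are hypothetical.

```python
import math
from collections import Counter

def gauss(x, sigma):
    # Unnormalized Gaussian; constant factors cancel in the 1/W_p normalization.
    return math.exp(-(x * x) / (2.0 * sigma * sigma))

def bilateral_brute_force(img, sigma_s, sigma_r, truncate=True):
    """Table 4.1; truncate=True keeps only pixels with ||p - q|| <= 2 sigma_s."""
    h, w = len(img), len(img[0])
    radius = int(math.ceil(2 * sigma_s)) if truncate else max(h, w)
    out = [[0.0] * w for _ in range(h)]
    for py in range(h):
        for px in range(w):
            acc = wsum = 0.0                                        # step (1)
            for qy in range(max(0, py - radius), min(h, py + radius + 1)):
                for qx in range(max(0, px - radius), min(w, px + radius + 1)):
                    dist = math.hypot(px - qx, py - qy)
                    if truncate and dist > 2 * sigma_s:
                        continue
                    wgt = gauss(dist, sigma_s) * gauss(img[py][px] - img[qy][qx], sigma_r)
                    acc += wgt * img[qy][qx]                        # step (2b)
                    wsum += wgt                                     # step (2c)
            out[py][px] = acc / wsum                                # step (3)
    return out

def bilateral_1d(line, sigma_s, sigma_r):
    # 1D brute force with a spatial window truncated at 2 sigma_s.
    n, radius = len(line), int(math.ceil(2 * sigma_s))
    res = []
    for p in range(n):
        acc = wsum = 0.0
        for q in range(max(0, p - radius), min(n, p + radius + 1)):
            wgt = gauss(p - q, sigma_s) * gauss(line[p] - line[q], sigma_r)
            acc += wgt * line[q]
            wsum += wgt
        res.append(acc / wsum)
    return res

def bilateral_separable(img, sigma_s, sigma_r):
    """Section 4.2: filter every column, then every row of the result."""
    h, w = len(img), len(img[0])
    cols = [bilateral_1d([img[y][x] for y in range(h)], sigma_s, sigma_r) for x in range(w)]
    return [bilateral_1d([cols[x][y] for x in range(w)], sigma_s, sigma_r) for y in range(h)]

def bilateral_box_histogram(img, radius, sigma_r):
    """Eq. (16) with the square window ||p - q||_inf <= radius, via window histograms."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for py in range(h):
        rows = range(max(0, py - radius), min(h, py + radius + 1))
        hist = Counter()
        for qx in range(min(w, radius + 1)):                # window of px = 0
            hist.update(img[qy][qx] for qy in rows)
        for px in range(w):
            ip = img[py][px]
            num = sum(c * gauss(ip - i, sigma_r) * i for i, c in hist.items() if c > 0)
            den = sum(c * gauss(ip - i, sigma_r) for i, c in hist.items() if c > 0)
            out[py][px] = num / den
            # Slide right: drop the leftmost column, add the next one.
            if px - radius >= 0:
                hist.subtract(img[qy][px - radius] for qy in rows)
            if px + radius + 1 < w:
                hist.update(img[qy][px + radius + 1] for qy in rows)
    return out
```

On small inputs the outputs can be compared directly. The sliding window costs one column update per step instead of a full histogram rebuild, which is the overlap that Weiss's algorithm exploits far more aggressively.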
Unfortunately, the algorithm processes color images independently for each channel, which can introduce bleeding artifacts; in addition, it is unclear how to extend this filter for use in cross bilateral filtering applications.

4.4 Layered Approximation

Durand and Dorsey [21] propose a fast approximation based on the intuition that the bilateral filter is almost a convolution of the spatial weight $G_{\sigma_s}(\|p-q\|)$ with the product $G_{\sigma_r}(|I_p - I_q|)\,I_q$ (Equation (3)). But the bilateral filter is not a convolution, because the range weight $G_{\sigma_r}(|I_p - I_q|)$ depends on the pixel value $I_p$. Durand and Dorsey overcome this by picking a fixed intensity value $i$, computing the product for it, $G_{\sigma_r}(|i - I_q|)\,I_q$, and convolving it with the spatial Gaussian kernel $G_{\sigma_s}$. After normalization by the equally convolved weights $G_{\sigma_r}(|i - I_q|)$, this gives the exact result of the bilateral filter at all pixels $p$ such that $I_p = i$. Computing the bilateral filter this way would be extremely slow, because it requires a convolution for each possible pixel value $i$.

Instead, Durand and Dorsey propose a two-step speed-up. First, they select a sparse subset $i_0, \dots, i_n$ of the intensity values. For each value $i_k$, they evaluate the product $G_{\sigma_r}(|i_k - I_q|)\,I_q$. This produces layers $L_0, \dots, L_n$. Each $L_k$ is then convolved with the spatial kernel $G_{\sigma_s}$ and normalized to form a new layer $L'_k$ that contains the exact results of the bilateral filter for pixels with intensity equal to $i_k$. For pixels whose intensities have not been sampled, the result is linearly interpolated from the two closest layers. Second, to further speed up the process, they downsample the image $I$ prior to computing the product with the range weight $G_{\sigma_r}$ and convolving with the spatial kernel $G_{\sigma_s}$. The final layers are obtained by upsampling the convolution outputs, and the bilateral filter results are still obtained by linearly interpolating between the two closest layers (Table 4.2).

Durand and Dorsey's approximation dramatically speeds up the computation.
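The layered approximation can be sketched on a 1D signal. This minimal version omits the downsampling step and uses a spatial Gaussian truncated at $2\sigma_s$; `levels` is the sorted list $i_0 < \dots < i_n$ and is assumed to contain at least two values. For pixels whose intensity equals a sampled level, it returns the exact (truncated-kernel) bilateral filter, as stated above; elsewhere it linearly interpolates between the two closest layers. All names are hypothetical.

```python
import bisect
import math

def gauss(x, sigma):
    # Unnormalized Gaussian; constant factors cancel in the normalization.
    return math.exp(-(x * x) / (2.0 * sigma * sigma))

def spatial_blur(signal, sigma_s):
    # Sum over in-range q of G_s(p - q) * signal[q]; window truncated at 2 sigma_s.
    n, radius = len(signal), int(math.ceil(2 * sigma_s))
    return [sum(gauss(p - q, sigma_s) * signal[q]
                for q in range(max(0, p - radius), min(n, p + radius + 1)))
            for p in range(n)]

def bilateral_layered(img, sigma_s, sigma_r, levels):
    """Layered approximation on a 1D signal (no downsampling); needs >= 2 sorted levels."""
    layers = []
    for ik in levels:
        rw = [gauss(ik - iq, sigma_r) for iq in img]          # G_r(|i_k - I_q|)
        num = spatial_blur([r * iq for r, iq in zip(rw, img)], sigma_s)
        den = spatial_blur(rw, sigma_s)                        # sum of the weights
        # Normalized layer L'_k; if no pixel is close to i_k, fall back to i_k itself.
        layers.append([a / b if b > 0 else ik for a, b in zip(num, den)])
    out = []
    for p, ip in enumerate(img):
        ip = min(max(ip, levels[0]), levels[-1])               # clamp to sampled range
        k = min(bisect.bisect_right(levels, ip) - 1, len(levels) - 2)
        t = (ip - levels[k]) / (levels[k + 1] - levels[k])     # position between i_k, i_k+1
        out.append((1 - t) * layers[k][p] + t * layers[k + 1][p])
    return out
```

Using more levels reduces the interpolation error between layers, at the cost of one extra pair of spatial convolutions per level.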
Whereas a brute force implementation requires several minutes of computation for a megapixel image, their scheme runs in about one second.

Table 4.2 Reformulation proposed by Durand and Dorsey [21].
1. Given a 2D image $I$, compute a low-resolution version $\bar{I}$, pick a set of intensities $i_0, \dots, i_n$, and compute layers $\bar{L}_0, \dots, \bar{L}_n$: $\bar{L}_k(q) = G_{\sigma_r}(|i_k - \bar{I}_q|)\,\bar{I}_q$.
2. Convolve each layer with the spatial kernel and normalize the result: $\bar{L}'_k = (G_{\sigma_s} \otimes \bar{L}_k) \oslash \big(G_{\sigma_s} \otimes G_{\sigma_r}(|i_k - \bar{I}|)\big)$, where $\oslash$ indicates a per-pixel division and the denominator corresponds to the sum of the weights at each pixel.
3. Upsample the layers $\bar{L}'_k$ to get $L'_k$.
4. For each pixel $p$ with intensity $I_p$, find the two closest values $i_{k_1}$ and $i_{k_2}$, and output the linear interpolation:
$$BF[I]_p \approx \frac{I_p - i_{k_1}}{i_{k_2} - i_{k_1}}\,L'_{k_2}(p) + \frac{i_{k_2} - I_p}{i_{k_2} - i_{k_1}}\,L'_{k_1}(p)$$

The downside of this approach is that, in practice, the achieved result can be significantly different from that of the reference brute force implementation, and there is no formal characterization of this difference. In the next section, we discuss the scheme of Paris and Durand [50], which is inspired by the layered approximation and achieves an equivalent speed-up but with significantly better accuracy. We discuss the relationship between both approaches at the end of the following section.

4.5 Bilateral Grid

Inspired by the layered approximation of Durand and Dorsey [21], Paris and Durand [50] have reformulated the bilateral filter in a higher-dimensional homogeneous space. They described a new image representation where a gray-level image is represented in a volumetric data structure that they named the bilateral grid. In this representation, a 2D image $I$ is represented by a 3D grid where the first two dimensions of the grid correspond to the pixel position $p$ and the third dimension corresponds to the pixel intensity $I_p$. In addition, this 3D grid stores homogeneous values, that is, the intensity value $I$ is associated with a non-negative weight $w$ and stored as a homogeneous vector $(wI, w)$.
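Anticipating the construction detailed next (Equation (17) and Table 4.3), the grid can be sketched for a 1D signal, which gives a 2D space-range grid instead of a 3D one. In this minimal sketch, each sample is splatted to its nearest grid cell, the grid is blurred with a spatial Gaussian truncated at $2\sigma_s$ and an untruncated range Gaussian, and the result is read back by bilinear interpolation followed by the homogeneous division. The cell sizes `cell_s` and `cell_r` and all names are illustrative assumptions; with unit cells on integer data the grid stays at full resolution, and the output matches a brute-force bilateral filter with the same spatial truncation.

```python
import math

def gauss(x, sigma):
    # Unnormalized Gaussian; constant factors cancel in the homogeneous division.
    return math.exp(-(x * x) / (2.0 * sigma * sigma))

def bilateral_grid_1d(signal, sigma_s, sigma_r, cell_s=1.0, cell_r=1.0):
    """Bilateral grid for a 1D signal with non-negative values.

    cell_s and cell_r are the sampling steps of the grid along space and range.
    """
    nx = int(len(signal) / cell_s) + 2            # +2: room for interpolation
    nr = int(max(signal) / cell_r) + 2
    # Homogeneous grid: every cell stores (w * I, w), initially (0, 0).
    grid = [[[0.0, 0.0] for _ in range(nr)] for _ in range(nx)]
    for x, v in enumerate(signal):                 # splat to the nearest cell
        cell = grid[int(round(x / cell_s))][int(round(v / cell_r))]
        cell[0] += v
        cell[1] += 1.0
    gs, gr = sigma_s / cell_s, sigma_r / cell_r     # sigmas in grid units
    rad = int(math.ceil(2 * gs))                    # spatial truncation at 2 sigma_s
    # Gaussian convolution, one axis at a time, on both homogeneous components.
    tmp = [[[0.0, 0.0] for _ in range(nr)] for _ in range(nx)]
    for i in range(nx):
        for k in range(max(0, i - rad), min(nx, i + rad + 1)):
            g = gauss(i - k, gs)
            for j in range(nr):
                tmp[i][j][0] += g * grid[k][j][0]
                tmp[i][j][1] += g * grid[k][j][1]
    blurred = [[[0.0, 0.0] for _ in range(nr)] for _ in range(nx)]
    for i in range(nx):
        for j in range(nr):
            for k in range(nr):                     # range axis: no truncation
                g = gauss(j - k, gr)
                blurred[i][j][0] += g * tmp[i][k][0]
                blurred[i][j][1] += g * tmp[i][k][1]
    # Slice at (x, I(x)) with bilinear interpolation, then divide wI by w.
    result = []
    for x, v in enumerate(signal):
        gx, gv = x / cell_s, v / cell_r
        i0, j0 = int(gx), int(gv)
        fx, fv = gx - i0, gv - j0
        wi = wsum = 0.0
        for di, ax in ((0, 1.0 - fx), (1, fx)):
            for dj, av in ((0, 1.0 - fv), (1, fv)):
                wi += ax * av * blurred[i0 + di][j0 + dj][0]
                wsum += ax * av * blurred[i0 + di][j0 + dj][1]
        result.append(wi / wsum)
    return result
```

Choosing `cell_s` close to $\sigma_s$ and `cell_r` close to $\sigma_r`, as Paris and Durand recommend, shrinks the grid and therefore the convolution cost, at the price of an approximation.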
Using this concept, Paris and Durand [50] showed that the bilateral filter corresponds to a Gaussian convolution applied to the grid, followed by sampling and normalization of the homogeneous values.

More precisely, the authors consider the $\mathcal{S}\times\mathcal{R}$ domain and represent a gray-scale image $I$ as a 3D grid $\Gamma$:

$$\Gamma(p_x, p_y, r) = \begin{cases} \big(I(p_x, p_y),\,1\big) & \text{if } r = I(p_x, p_y) \\ (0,\,0) & \text{otherwise} \end{cases} \tag{17}$$

With this representation, they demonstrate that bilateral filtering exactly corresponds to convolving $\Gamma$ with a 3D Gaussian whose parameters are $(\sigma_s, \sigma_s, \sigma_r)$: $\Gamma' = G_{\sigma_s,\sigma_s,\sigma_r} \otimes \Gamma$. The bilateral filter output $BF[I](p_x, p_y)$ is then read from the homogeneous value $\Gamma'\big(p_x, p_y, I(p_x, p_y)\big)$ by dividing its first component by its second. This process is illustrated in Figure 4.1 and detailed in Table 4.3.

The benefit of this formulation is that the Gaussian-convolved grid $GC[\Gamma]$ is a band-limited signal, because it results from a convolution with a low-pass Gaussian filter. Paris and Durand use this argument to downsample the grid $\Gamma$. As a result, they deal with fewer stored data points and achieve performance on the order of one second for images with several megapixels. Chen et al. [16] further improved the performance by mapping the algorithm onto modern graphics hardware, obtaining running times on the order of a few milliseconds. Paris and Durand recommend using the Gaussian width parameters $\sigma_s$ and $\sigma_r$ to set the sampling rates of the 3D grid. This yields a complexity of $O\big(|\mathcal{S}| + \frac{|\mathcal{S}|}{\sigma_s^2}\frac{|\mathcal{R}|}{\sigma_r}\big)$, where $|\mathcal{S}|$ is the size of the spatial domain (i.e., the number of pixels) and $|\mathcal{R}|$ is the size of the range domain (i.e., the extent of the intensity scale).

This approach can be easily adapted to cross bilateral filtering and color images. The downside is that color images require a 5D grid, which no longer maps nicely onto graphics hardware and requires a large amount of memory for small kernels (10 pixels or less).

[Figure 4.1 panels: a 1D signal shown in the space (x) / range plane, processed by sampling, Gaussian convolution, slicing, and division.]
Fig. 4.1 Overview on a 1D signal of the reformulation of the bilateral filter as a linear convolution in a homogeneous, higher-dimensional space. Figure reproduced from Paris and Durand [50].

Table 4.3 Approximation proposed by Paris and Durand [50]. In practice, localized downsampling and upsampling eliminate the need to build the entire high-resolution grid in memory.
1. Given a 2D image $I$, build the grid $\Gamma: \mathcal{S}\times\mathcal{R} \to \mathbb{R}^2$ that contains homogeneous values: $\Gamma(p_x,p_y,r) = \big(I(p_x,p_y), 1\big)$ if $r = I(p_x,p_y)$, and $(0,0)$ otherwise.
2. Downsample $\Gamma$ to get $\bar{\Gamma}$.
3. Perform a Gaussian convolution of $\bar{\Gamma}$, for each component independently: $GC[\bar{\Gamma}](p_x,p_y,r) = G_{\sigma_s,\sigma_r} \otimes \bar{\Gamma}(p_x,p_y,r)$, where $G_{\sigma_s,\sigma_r}$ is a 3D Gaussian with $\sigma_s$ as parameter along the two spatial dimensions and $\sigma_r$ along the range dimension.
4. Upsample $GC[\bar{\Gamma}]$ to get $\tilde{\Gamma}$.
5. Extracting the result: for a pixel $p$ with initial intensity $I_p$, denote by $(\widetilde{wI}, \tilde{w})$ the value at position $(p_x,p_y,I_p)$ in $\tilde{\Gamma}$. The result of the bilateral filter is $BF[I]_p \approx \widetilde{wI}/\tilde{w}$.

4.5.1 Link with the Layered Approximation

The bilateral grid and the layered approximation share the idea of subsampling along the intensity axis and downsampling in the spatial domain. The major difference is in the way the downsampling is performed. The layered approximation encounters difficulties at discontinuities: it averages adjacent pixels with different values, e.g., a white and a black pixel end up being represented by one gray value that poorly represents the original signal. In comparison, the bilateral grid subsampling strategy preserves adjacent pixels with different intensities, because they are far apart along the intensity axis. In the case of the white and black pixels, the bilateral grid retains the two different values involved and thus is able to produce better results. Figure 4.2 illustrates this behavior. Because both approaches are equally fast, the bilateral grid should be preferred over the layered approximation.

(a) Downsampling of the layered approximation (b) Downsampling of the bilateral grid approximation
Fig. 4.2 Compared to the layered approximation, the bilateral grid better represents discontinuities and thus yields superior results. This figure is reproduced from Paris and Durand [50].

4.6 Bilateral Pyramid

For several applications such as detail enhancement [25], it is desirable to decompose the image into more than two layers. Fattal et al. [25] propose to compute such a decomposition by successively applying the bilateral filter to the image with varying parameters: the spatial parameter $\sigma_s$ is doubled at each level and the range parameter $\sigma_r$ is halved. Based on this scenario, they describe a dedicated numerical scheme. Intuitively, instead of computing each level from scratch, they use the result from the previous level and rely on the fact that this image has already been smoothed to simplify the computation. For each level, they compute a bilateral filter based on a 5 × 5 kernel. At the first level, they apply the bilateral filter with a small kernel, $\sigma_s = 1$, and at each subsequent level they double the spatial extent of the kernel. A naive approach would use more coefficients, e.g., a 9 × 9 kernel, but Fattal et al. keep the cost constant by using 5 × 5 samples and inserting zeros.
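The zero-insertion trick can be sketched on a 1D signal. `dilate_kernel` spreads the five binomial taps $2^\ell$ samples apart; `atrous_bilateral_level` uses those five taps as spatial weights multiplied by a Gaussian range weight; and `bilateral_pyramid` filters each level's result again while halving $\sigma_r$, as described above. The exact weights used by Fattal et al. are not given in this section, so this weighting is an assumption, and all names are hypothetical.

```python
import math

BINOMIAL = [1, 4, 6, 4, 1]  # the 5-tap row "1 4 6 4 1"

def dilate_kernel(taps, level):
    """Insert 2**level - 1 zeros between consecutive taps (the 'holes')."""
    step = 2 ** level
    out = [0] * ((len(taps) - 1) * step + 1)
    for k, t in enumerate(taps):
        out[k * step] = t
    return out

def atrous_bilateral_level(signal, level, sigma_r):
    """One 1D level: only the 5 non-zero taps of the dilated kernel are visited."""
    step, n = 2 ** level, len(signal)
    out = []
    for p in range(n):
        acc = wsum = 0.0
        for k, tap in enumerate(BINOMIAL):
            q = p + (k - 2) * step              # offsets -2..2, scaled by 2**level
            if 0 <= q < n:
                rw = math.exp(-((signal[p] - signal[q]) ** 2) / (2.0 * sigma_r ** 2))
                acc += tap * rw * signal[q]
                wsum += tap * rw
        out.append(acc / wsum)
    return out

def bilateral_pyramid(signal, num_levels, sigma_r):
    """Each level filters the previous one; tap spacing doubles and sigma_r halves."""
    layers = [list(signal)]
    for level in range(num_levels):
        layers.append(atrous_bilateral_level(layers[-1], level, sigma_r / 2 ** level))
    return layers
```

The per-pixel cost stays at five taps regardless of the level, which is the point of the scheme; the differences between consecutive layers would then serve as the detail layers of the decomposition.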
For instance, they approximate a 9 × 9 kernel using 5 × 5 samples interleaved with zeros, such that a 1 4 6 4 1 row becomes 1 0 4 0 6 0 4 0 1. This proven strategy, known as the algorithme à trous, yields minimal errors when applied to band-limited signals [43]. In this particular case, the signal is not band-limited, because bilateral filtering preserves edges. Yet, Fattal's results show that, in practice, this approximation achieves good results without visual defects.

Table 4.4 Complexity summary for bilateral filter algorithms.
Brute force (Section 4.1): $O(|\mathcal{S}|^2)$
Separable kernel (Section 4.2): $O(|\mathcal{S}|\,\sigma_s)$
Local histograms (Section 4.3): $O(|\mathcal{S}|\log\sigma_s)$
Layered approximation (Section 4.4): $O\big(|\mathcal{S}| + \frac{|\mathcal{S}|}{\sigma_s^2}\frac{|\mathcal{R}|}{\sigma_r}\big)$
Bilateral grid (Section 4.5): $O\big(|\mathcal{S}| + \frac{|\mathcal{S}|}{\sigma_s^2}\frac{|\mathcal{R}|}{\sigma_r}\big)$

(a) Input (876×584) (b) Input (c) Exact bilateral filter using CIE-Lab
(d) Bilateral-grid implementation using per-channel RGB (0.48 s, PSNR_RGB = 38 dB, PSNR_Lab = 34 dB)
(e) Bilateral-grid implementation using RGB (8.9 s, PSNR_RGB = 41 dB, PSNR_Lab = 39 dB)
(f) Separable-kernel implementation using CIE-Lab (5.8 s, PSNR_RGB = 42 dB, PSNR_Lab = 42 dB)
(g) Bilateral-grid implementation using CIE-Lab (10.9 s, PSNR_RGB = 46 dB, PSNR_Lab = 46 dB)
Fig. 4.3 Comparison of different strategies for filtering a color source image (a, b). Processing the red, green, and blue channels independently is fast but can cause color bleeding that removes the cross from the sky in (d). Filtering RGB vectors is slower but improves results, although some bleeding remains (e). Using a perceptually motivated color space such as CIE-Lab addresses those artifacts (c, g). The separable-kernel implementation is fast but incurs axis-aligned streaks (f) that may be undesirable in a number of applications. These remarks are confirmed by the numerical precision, evaluated with the PSNR computed in the RGB and CIE-Lab color spaces. The contrast of the close-ups has been increased for clarity. This figure is reproduced from Paris and Durand [50].

4.7 Discussion

The choice of implementation is crucial to achieving satisfying results with good performance. Table 4.4 summarizes the complexity of the various implementations we described.

When graphics hardware is available, we recommend the bilateral grid method of Chen et al. [16], because it achieves high-quality outputs and real-time performance even on high-resolution images and videos. If only the CPU is available, the choice is split between the local-histogram method of Weiss [71] and the bilateral grid of Paris and Durand [50].
To process color images or compute a cross bilateral filter, the bilateral grid provides a satisfying solution, especially with large kernels. To process gray-level images with kernels of any size, e.g., in an image-editing package where users can arbitrarily choose the kernel size, the local-histogram approach is preferable because it consistently yields short running times. On color images, this approach can yield less satisfying results because channels are processed independently, which may cause some color bleeding (Figure 4.3).

5 Relationship between Bilateral Filtering and Other Methods or Framework

Filtering an image while preserving its edges has been addressed in many ways in computer vision. Interestingly, some methods give results that are qualitatively very similar to those of bilateral filtering. The natural question is therefore what kind of relationships may exist between bilateral filtering and these other methods. In this section we focus on local mode filtering, robust statistics, and PDE-based approaches.

5.1 Bilateral Filtering is Equivalent to Local Mode Filtering

Local mode filtering was introduced by Van de Weijer and van den Boomgaard [65] as an extended filtering method that preserves edges and details. In this section, we demonstrate that bilateral filtering is a local mode-seeking approach. Based on this histogram interpretation, Weiss [71] proposed a fast numerical scheme, and Chen et al. [16] showed that the bilateral grid can be used for local histogram equalization. Refer to Section 4.3 for more details.

Given a pixel and its associated local histogram, local mode filtering is an iterative procedure that converges to the closest highest mode of the local histogram, starting from the value of the pixel at the center of the neighborhood. This is illustrated in Figure 5.1(a) and (b). Choosing the closest local mode instead of the global mode allows details to be preserved.

Fig. 5.1 (a) Image and local neighborhood for a given pixel. (b) In local mode filtering, proposed by Van de Weijer and van den Boomgaard [65], each pixel moves toward the maximum of the local mode it belongs to. In this example, the intensity of the center pixel will move toward the maximum of the mode made of low-intensity pixels. (c) Effect of the range parameter on the local histogram.

Like the bilateral filter, local mode filtering depends on two parameters: one that defines the neighborhood for the local histogram estimation, and one that is the smoothness parameter of the histogram. The influence of the latter parameter is illustrated in Figure 5.1(c): when the smoothing parameter increases, local modes and the global mode merge into a single global mode, which corresponds to the standard Gaussian-smoothed value. In that case, details are not preserved.

To define local mode filtering, given a gray-scale image $I : \mathcal{S} \to \mathcal{R}$, one can start with the definition of a histogram:

$$H_1(i) = \sum_{q \in \mathcal{S}} \delta(I_q - i), \quad i \in \mathcal{R},$$

where $\delta$ is the Dirac function, so that $\delta(s) = 1$ if $s = 0$ and $\delta(s) = 0$ otherwise. A classical operation consists in smoothing histograms, so we define:

$$H_2(i, \sigma_r) = (H_1 \otimes G_{\sigma_r})(i) = \sum_{q \in \mathcal{S}} G_{\sigma_r}(I_q - i),$$

where $\sigma_r$ denotes the smoothing done on the intensity values, i.e., on the range. Going one step further, one can define a histogram locally, i.e., around a given position $p$. To do so, one can introduce a spatial Gaussian kernel centered at $p$:

$$H_3(p, i, \sigma_r, \sigma_s) = \sum_{q \in \mathcal{S}} G_{\sigma_s}(\|p - q\|)\, G_{\sigma_r}(I_q - i), \qquad (18)$$

where $\sigma_s$ determines the spatial neighborhood around $p$. Local histograms can be used to study image properties [36] but also to perform image restoration. The idea of local mode filtering is to make the intensity $I_p$ of the center pixel evolve toward the closest local maximum. So, $I_p$ will verify:

$$\left.\frac{\partial H_3}{\partial i}(p, i, \sigma_r, \sigma_s)\right|_{i = I_p} = 0. \qquad (19)$$

Taking into account Equation (18) and the expression of the Gaussian kernel, whose derivative satisfies $\frac{\partial}{\partial i} G_{\sigma_r}(I_q - i) = \frac{I_q - i}{\sigma_r^2}\, G_{\sigma_r}(I_q - i)$, Equation (19) becomes:

$$\sum_{q \in \mathcal{S}} (I_q - i)\, G_{\sigma_s}(\|p - q\|)\, G_{\sigma_r}(I_q - i) = 0,$$

so that $I_p$ should verify the following implicit equation:

$$I_p = i \quad \text{where } i \text{ is such that} \quad i = \frac{\sum_{q \in \mathcal{S}} G_{\sigma_s}(\|p - q\|)\, G_{\sigma_r}(I_q - i)\, I_q}{\sum_{q \in \mathcal{S}} G_{\sigma_s}(\|p - q\|)\, G_{\sigma_r}(I_q - i)}. \qquad (20)$$

To solve this implicit equation, one can propose the following iterative scheme: given $I^0_p = I_p$, estimate

$$I^{t+1}_p = \frac{\sum_{q \in \mathcal{S}} G_{\sigma_s}(\|p - q\|)\, G_{\sigma_r}(I^t_q - I^t_p)\, I^t_q}{\sum_{q \in \mathcal{S}} G_{\sigma_s}(\|p - q\|)\, G_{\sigma_r}(I^t_q - I^t_p)} \quad \text{for all } p. \qquad (21)$$

Interestingly, the right-hand side of Equation (21) corresponds to the definition of the bilateral filter. Consequently, bilateral filtering can be considered a local mode-seeking method.

Remark. Another important relation established by Van de Weijer and van den Boomgaard [65] is the correspondence between local mode filtering and the framework of robust statistics. In fact, maximizing $H_3$ is equivalent to minimizing the residual error $1 - H_3(p, i, \sigma_r, \sigma_s)$. We explain this idea in more detail later, but focus on the link between the bilateral filter and robust statistics (see Section 5.2).

5.2 The Bilateral Filter is a Robust Filter

Robust statistics offers a general background to model a large class of problems, including image restoration (see Refs. [30, 29, 40, 28, 27] for more details). Expressed as optimization problems in a discretized space, these problems lead to edge-preserving restoration formulations. In this section, we show that bilateral filtering corresponds to a gradient descent of a robust minimization problem.

Image restoration can be formulated as a minimization problem in the following way: given a noisy image $I^n$, the problem is to find the minimizer of the discrete energy

$$\min_I \sum_{p \in \mathcal{S}} \left( (I_p - I^n_p)^2 + \sum_{q \in \mathcal{N}(p)} \rho(I_q - I_p) \right), \qquad (22)$$

where $\mathcal{N}(p)$ is a neighborhood of $p$, and $\rho$ is a weighting function (also called an error norm).

The energy in Equation (22) contains two kinds of terms. The first is a fidelity term that prevents the solution from drifting too far away from the noisy input values. The second is a regularization term that penalizes intensity differences between neighboring pixels, with a strength that depends on the function $\rho$. The regularity of the solution therefore depends on $\rho$. In particular, this method will be robust if it preserves significant intensity differences such as edges, i.e., if it can distinguish between inliers and outliers. Several functions $\rho$ have been proposed in the literature, as we show in this section.

Let us now focus on the regularization term of Equation (22) to show the relationship with the bilateral filter. To do so, we introduce the following reweighted version of the regularization term, so that the minimization problem becomes:

$$\min_I \sum_{p \in \mathcal{S}} \sum_{q \in \mathcal{N}(p)} G_{\sigma_s}(\|q - p\|)\, \rho(I_q - I_p). \qquad (23)$$

To minimize Equation (23), one can iterate the following gradient descent scheme:

$$I^{t+1}_p = I^t_p + \frac{\lambda}{|\mathcal{N}(p)|} \sum_{q \in \mathcal{N}(p)} G_{\sigma_s}(\|q - p\|)\, \rho'(I^t_q - I^t_p), \qquad (24)$$

where $\lambda > 0$ is a step size. By choosing $\rho(s) = 1 - G_{\sigma_r}(s)$, whose derivative is $\rho'(s) = \frac{s}{\sigma_r^2}\, G_{\sigma_r}(s)$ (the constant factor $1/\sigma_r^2$ can be absorbed into $\lambda$), we obtain:

$$I^{t+1}_p = I^t_p + \frac{\lambda}{|\mathcal{N}(p)|} \sum_{q \in \mathcal{N}(p)} G_{\sigma_s}(\|q - p\|)\, G_{\sigma_r}(I^t_q - I^t_p)\, (I^t_q - I^t_p). \qquad (25)$$

This equation has some similarities with the bilateral filtering expression, which corresponds to a weighted average of the data, and which we recall here:

$$I^{t+1}_p = \frac{\sum_{q} G_{\sigma_s}(\|q - p\|)\, G_{\sigma_r}(I^t_q - I^t_p)\, I^t_q}{\sum_{q} G_{\sigma_s}(\|q - p\|)\, G_{\sigma_r}(I^t_q - I^t_p)}. \qquad (26)$$

Interestingly, it has been shown that Equations (24) and (26) are two equivalent ways to solve the same minimization problem (see, e.g., [29]). Indeed, denoting by $W^t_p$ the denominator of Equation (26), the bilateral filter can be rewritten as $I^{t+1}_p = I^t_p + \frac{1}{W^t_p} \sum_{q} G_{\sigma_s}(\|q - p\|)\, G_{\sigma_r}(I^t_q - I^t_p)\, (I^t_q - I^t_p)$, which is Equation (25) with the constant step $\lambda/|\mathcal{N}(p)|$ replaced by the data-dependent step $1/W^t_p$. Intuitively, both formulas average the same pixels using the same weights; the only difference is the weight of the center pixel $I^t_p$. The conclusion is that the bilateral filter is a special case of a robust filter.

More generally, Durand and Dorsey [21] studied the bilateral filter in the framework of robust statistics [29, 30], in a similar manner to the work of Black et al. [11] on PDE filters. The authors showed that the range weight can be seen as a robust metric, that is, it differentiates between inliers and outliers. The bilateral filter replaces each pixel by a weighted average of its neighbors. The weight assigned to each neighbor determines its influence on the result and is crucial to the output quality. In this context, robust statistics estimates whether a pixel is relevant, i.e., is an inlier, or not, i.e., is an outlier. The strategy followed by the bilateral filter is that pixels with different intensities are not related and should have little influence on each other, whereas pixels with similar intensities are closely related and should strongly influence each other. How this intensity difference actually contributes is defined by the range weight. The most common choice is a Gaussian function $G_{\sigma_r}$.

However, Durand and Dorsey [21] have underscored that this Gaussian function is only one of many possible robust weighting functions (cf. Figure 5.2, top), also known as stopping functions. These functions define the weight assigned to a pixel according to its intensity difference with the center pixel. For instance, a classical non-robust mean assigns the same weight to all pixels. In comparison, robust functions have a bell profile that assigns lower weights to pixels with a different intensity.
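The chain of Equations (20) to (26) can be checked numerically on a 1D signal. The sketch below is a minimal illustration under our own assumptions: unnormalized Gaussians $G_\sigma(x) = \exp(-x^2/2\sigma^2)$ (the normalization constants cancel in the ratios and would be absorbed into $\lambda$ in Equation (25)), a window of `radius` samples clipped at the signal borders, and hypothetical function names. `bilateral_iteration` implements one step of Equations (21)/(26); `local_mode` iterates the fixed point of Equation (20) with the data $I_q$ held fixed, as in local mode filtering; `robust_step` implements Equation (25), and passing `lam=None` replaces the constant step $\lambda/|\mathcal{N}(p)|$ by $1/W^t_p$.

```python
import math


def gauss(x, sigma):
    """Unnormalized Gaussian; normalization constants cancel in Eqs. (20), (21), (26)."""
    return math.exp(-x * x / (2.0 * sigma * sigma))


def _window(n, p, radius):
    """Indices q of the 1D neighborhood of p (p included), clipped to the signal."""
    return range(max(0, p - radius), min(n, p + radius + 1))


def bilateral_iteration(I, sigma_s, sigma_r, radius):
    """One step of Eq. (21) / Eq. (26): normalized, range-weighted average."""
    out = []
    for p in range(len(I)):
        num = den = 0.0
        for q in _window(len(I), p, radius):
            w = gauss(abs(p - q), sigma_s) * gauss(I[q] - I[p], sigma_r)
            num += w * I[q]
            den += w
        out.append(num / den)
    return out


def local_mode(I, p, sigma_s, sigma_r, radius, iters=100):
    """Fixed-point iteration of Eq. (20) toward the closest mode of H3(p, ., ...).

    The data I_q stay fixed; only the candidate intensity i evolves, starting at I_p.
    """
    i = I[p]
    for _ in range(iters):
        num = den = 0.0
        for q in _window(len(I), p, radius):
            w = gauss(abs(p - q), sigma_s) * gauss(I[q] - i, sigma_r)
            num += w * I[q]
            den += w
        if den == 0.0:  # all weights underflowed; keep the current estimate
            break
        i = num / den
    return i


def robust_step(I, sigma_s, sigma_r, radius, lam=None):
    """One step of Eq. (25); with lam=None the step is 1 / W_p, which gives Eq. (26)."""
    out = []
    for p in range(len(I)):
        acc = den = 0.0
        window = _window(len(I), p, radius)
        for q in window:
            w = gauss(abs(p - q), sigma_s) * gauss(I[q] - I[p], sigma_r)
            acc += w * (I[q] - I[p])
            den += w
        step = 1.0 / den if lam is None else lam / len(window)
        out.append(I[p] + step * acc)
    return out
```

For example, on `[0.0, 1.0, 0.0, 1.0, 50.0, 51.0, 50.0, 51.0]` with `sigma_s=2.0`, `sigma_r=5.0`, `radius=3`, `bilateral_iteration` and `robust_step(..., lam=None)` agree to rounding error, the small oscillation in each half is damped, and the jump between the halves is kept.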
The differences between these robust functions lie in the fall-off rate, which defines how narrow the transition between inliers and outliers is, and in the tail value: either non-zero, meaning that outliers still have some limited influence, or zero, meaning that outliers are completely ignored. This behavior is better observed on the influence function (Figure 5.2, bottom), which shows the variations of the output depending on the pixel intensity. The constant weight of classical averaging is not robust because its influence function is unbounded, which reflects the fact that a single pixel can have an unlimited influence on the mean value, e.g., a single very bright pixel can make the average arbitrarily high. In contrast, robust influence functions are bounded, showing that a single pixel cannot modify the output beyond a certain point. Some robust functions, such as the Gauss, Tukey, and Lorentz functions, are even redescending, reflecting the fact that pixels with a large intensity difference are considered irrelevant and ignored, i.e., they have little or no influence on the output.

Fig. 5.2 Qualitative illustration of the influence of weighting functions for image restoration. The first two rows show, respectively, different choices of weighting functions and their corresponding influence functions. These graphs were adapted from Black et al. [11] and Durand and Dorsey [21]. The last rows show results obtained on the image presented in Figure 5.1 with the corresponding weighting functions.

Durand and Dorsey [21] showed that these concepts can be applied to the bilateral filter and that the choice of the range function defines how pixels across edges are handled (see some results in Figure 5.2). For instance, with the classical Gaussian function, pixels across edges still have some influence, though very limited; with a Tukey function, these pixels would be ignored. According to Durand and Dorsey's experiments, the Gauss and Tukey functions perform better for their tone-mapping operator. As far as we know, these options have not been tested with other applications.

The energy function defined by robust norms is usually not convex and can lead to local minima, similar to the local modes of histograms discussed above. Which local minimum is most desirable depends on the application. The bilateral filter performs one step toward the minimum closest to the input value.
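The weighting and influence functions discussed above can be compared with a few lines of code. This is a minimal sketch: the parametrizations below are common forms that we assume for illustration (the exact scalings used by Black et al. [11] and Durand and Dorsey [21] may differ), and the influence function is taken as $\psi(x) = x\,g(x)$ for a weight $g$. The helper `weighted_average` is hypothetical; it averages samples around a center value with a given range weight, which is the role the range kernel plays in the bilateral filter.

```python
import math

# Weighting ("stopping") functions g(x) of the intensity difference x.
# Assumed parametrizations, all sharing the scale parameter sigma.


def w_constant(x, sigma):
    return 1.0  # classical mean: every sample gets the same weight


def w_gauss(x, sigma):
    return math.exp(-x * x / (2.0 * sigma * sigma))


def w_lorentz(x, sigma):
    return 1.0 / (1.0 + x * x / (2.0 * sigma * sigma))


def w_tukey(x, sigma):
    return (1.0 - (x / sigma) ** 2) ** 2 if abs(x) <= sigma else 0.0  # zero tail


def influence(weight, x, sigma):
    """Influence function psi(x) = x * g(x): how far one sample can pull the output."""
    return x * weight(x, sigma)


def weighted_average(values, center, weight, sigma):
    """Average of `values`, each weighted by g(value - center)."""
    ws = [weight(v - center, sigma) for v in values]
    return sum(w * v for w, v in zip(ws, values)) / sum(ws)


samples = [10.0, 10.5, 9.5, 10.0, 200.0]  # one very bright outlier
for g in (w_constant, w_gauss, w_lorentz, w_tukey):
    print(g.__name__, weighted_average(samples, 10.0, g, 2.0))
```

With these samples the plain mean is pulled to 48 by the single outlier, whereas the Gauss, Lorentz, and Tukey weights all keep the average close to 10. The Tukey weight ignores the outlier exactly, while the Lorentz weight, whose tail decays slowly, still lets it shift the result slightly.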
Stepping toward the closest minimum is usually desirable because most applications seek to smooth low-amplitude noise while retaining local structure. However, some cases might require a different treatment, such as impulse noise, where the value of a pixel can be severely corrupted. In this case, the robust statistics literature advocates initialization with an estimator that is very robust but might not be very precise, such as the median. For impulse noise removal, a median filter can be used to steer the bilateral filter at a pixel toward a local minimum that is consistent with its neighbors. In practice, this involves computing the range Gaussian based on the difference between a pixel and the median-filtered image rather than the difference with the input pixel value. See Section 3.1.4 for details.

Remark. Since connections can be established between robust statistics and nonlinear PDEs, bilateral filtering can also be interpreted as a robust nonlinear operator in the continuous framework of PDEs. This is further explained in Section 5.3.

5.3 Bilateral Filtering is Equivalent Asymptotically to the Perona and Malik Equation

Bilateral filtering smooths an image while preserving strong edges. Interestingly, many research projects were carried out in the field of nonlinear partial differential equations (PDEs) to achieve the same goal, and some models such as [52] give results very similar to bilateral filtering. In this section we revisit several contributions showing the links between bilateral filtering and the Perona–Malik model in the discrete setting, and more generally between neighborhood filters and PDE-based approaches in the continuous setting.¹ Of course, the field of PDE-based approaches is very large, and one may find better approaches than bilateral filtering depending on the application. Intensive research has been carried out in this area, including nonlinear approaches for image restoration (we refer to [2] for a review).
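The discrete link stated in Section 5.3.1 can be checked numerically in 1D, where the four adjacent neighbors of a pixel become the two samples $p-1$ and $p+1$. In this minimal sketch (our own illustration, with hypothetical function names), `pm_step` is an explicit finite-difference step of the Perona–Malik equation with the assumed conductance $c(d^2) = \exp(-d^2/2\sigma_r^2) = G_{\sigma_r}(d)$, and `bilateral_3tap` is a bilateral filter restricted to the two neighbors. Writing the bilateral average as $I_p + \sum_{q} w_{pq}(I_q - I_p)/(1 + w_{p,p-1} + w_{p,p+1})$, with $w_{pq} = G_{\sigma_s}(1)\,G_{\sigma_r}(I_q - I_p)$, shows that it equals the diffusion step with the pixel-dependent time step $G_{\sigma_s}(1)/(1 + w_{p,p-1} + w_{p,p+1})$, which `equivalent_time_steps` computes.

```python
import math


def gauss(x, sigma):
    return math.exp(-x * x / (2.0 * sigma * sigma))


def _neighbors(I, p):
    """Left and right neighbors with replicated borders (zero flux at the ends)."""
    return I[max(p - 1, 0)], I[min(p + 1, len(I) - 1)]


def pm_step(I, sigma_r, dt):
    """Explicit 1D Perona-Malik step with c(d^2) = exp(-d^2 / (2 sigma_r^2)).

    `dt` is either a number or a list giving one time step per sample.
    """
    out = []
    for p, v in enumerate(I):
        left, right = _neighbors(I, p)
        flux = (gauss(right - v, sigma_r) * (right - v)
                + gauss(left - v, sigma_r) * (left - v))
        step = dt[p] if isinstance(dt, list) else dt
        out.append(v + step * flux)
    return out


def bilateral_3tap(I, sigma_s, sigma_r):
    """Bilateral filter restricted to the two adjacent samples."""
    out = []
    for p, v in enumerate(I):
        left, right = _neighbors(I, p)
        wl = gauss(1.0, sigma_s) * gauss(left - v, sigma_r)
        wr = gauss(1.0, sigma_s) * gauss(right - v, sigma_r)
        out.append((v + wl * left + wr * right) / (1.0 + wl + wr))
    return out


def equivalent_time_steps(I, sigma_s, sigma_r):
    """Per-sample dt for which pm_step reproduces bilateral_3tap exactly."""
    steps = []
    for p, v in enumerate(I):
        left, right = _neighbors(I, p)
        wl = gauss(1.0, sigma_s) * gauss(left - v, sigma_r)
        wr = gauss(1.0, sigma_s) * gauss(right - v, sigma_r)
        steps.append(gauss(1.0, sigma_s) / (1.0 + wl + wr))
    return steps
```

With a constant `dt`, `pm_step` is the usual explicit scheme; across a large jump the conductance is nearly zero, so the edge barely moves, which is the behavior both filters share.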
Here we focus on the nonlinear operators that are related to bilateral filtering.

5.3.1 Results in the Discrete Setting

Anyone studying PDE-based approaches for image processing has come across the famous nonlinear model of Perona and Malik [52]. Starting from the heat equation and based on the remark that $\Delta I = \mathrm{div}(\nabla I)$, the authors proposed to introduce a weighting coefficient depending on the image gradient to prevent edges from being smoothed. Their model is written in the continuous setting:

$$\frac{\partial I}{\partial t} = \mathrm{div}\left( c(\|\nabla I\|^2)\, \nabla I \right), \qquad (27)$$

where $c : [0, +\infty) \to (0, +\infty)$ is a smooth decreasing function. We refer to Perona and Malik [52] for more details.

¹ Until now, we considered an image as a discrete set of pixels. In this section, instead, we need to consider an image defined continuously, i.e., an analog image where space is no longer discretized. The motivation becomes clear when one needs, for instance, to define a notion of derivative. Formally, keeping the same notations, this introduces only minor changes in the formulation of the bilateral filter: sums are replaced by integrals, and positions p and q now vary on a continuous domain.

In the discrete setting, Durand and Dorsey [21] showed that if the bilateral filter is restricted to the four adjacent neighbors of each pixel, then it corresponds to a discrete version of the Perona and Malik [52] model.

This result has been extended by Elad [23] and Barash and Comaniciu [7], who demonstrated that the bilateral filter can be seen as the sum of several Perona–Malik filters at different scales, that is, the image derivatives are computed with pixels at a distance, not only with adjacent pixels.

5.3.2 Results in the Continuous Setting

Another important contribution came from Buades et al.
[13], who proved rigorously that for small neighborhoods the Yaroslavsky filter

$$Y_{\sigma_s,\sigma_r}[I](p) = \frac{1}{W(p)} \int_{B(p,\sigma_s)} \exp\left(-\frac{|I(q) - I(p)|^2}{\sigma_r^2}\right) I(q)\, dq, \qquad (28)$$

i.e., a bilateral filter using a box function as spatial weight, behaves as the Perona–Malik filter. Such a result can only be established locally, that is, when small neighborhoods are considered, because the action of a PDE is very local (local structure is taken into account through derivatives). The proof of Buades et al. is thus based on an asymptotic study that relies on the image being well approximated by its second-order Taylor expansion; their result holds for any neighborhood as long as it covers a sufficiently regular area such as a region of skin or sky.

In this section, we present the results of Buades et al. [13], who revisited the notion of the bilateral filter by studying the more general neighborhood filter (see also [2] for more details). Here the notion of neighborhood must be understood broadly: neighboring pixels, neighboring or similar intensities, or neighboring neighborhoods. Each of these meanings corresponds to a specific filter. Interestingly, the authors also proved the link between these filters and well-known PDEs such as the heat equation and the Perona–Malik equation.

A general neighborhood filter can be described as follows. Let $I$ be an image to be filtered or denoised, and let $w_s : \mathbb{R}^+ \to \mathbb{R}^+$ and $w_r : \mathbb{R}^+ \to \mathbb{R}^+$ be two functions whose roles are to enforce, respectively, geometric and photometric locality (in Section 2, $w_s$ and $w_r$ are both Gaussian kernels). The parameters $\sigma_s$ and $\sigma_r$ measure the amount of filtering applied to the image $I$. The filtered image at scale $(\sigma_r, \sigma_s)$ is given by:

$$BF[I](p) = \frac{1}{W(p)} \int_{\mathcal{S}} w_r(|I(q) - I(p)|)\, w_s(\|p - q\|)\, I(q)\, dq,$$

where $W(p)$ is a normalization factor:

$$W(p) = \int_{\mathcal{S}} w_r(|I(q) - I(p)|)\, w_s(\|p - q\|)\, dq.$$

For simplicity, we suppose that the image has been extended from the image domain (a rectangle) to the whole of $\mathbb{R}^2$ by symmetry and periodicity.

With this formalism we can easily recover the classical spatial linear Gaussian filtering by choosing $w_r \equiv 1$ and ws(t) = exp_ t
The ltered image at scale (r, s) is given by:BF[I](p) = 1W(p)_Swr([I(q) I(p)[) ws([[p q[[)I(q)dq,where W(p) is a normalization factor.W(p) =_Swr([I(q) I(p)[) ws([[p q[[)dq.For simplicity we suppose that the image has been extended from thedomain image o (a rectangle) to the whole of R2, by symmetry andperiodicity.With this formalism we can easily recover the classical spatial linearGaussian ltering by choosing wr 1 and ws(t) = exp_ t