[IEEE 2008 15th IEEE International Conference on Image Processing - San Diego, CA, USA...

SCALABLE STEREO MATCHING WITH LOCALLY ADAPTIVE POLYGON APPROXIMATION

Ke Zhang 1,2, Jiangbo Lu 1,2, Gauthier Lafruit 2
1 Department of Electrical Engineering, University of Leuven, Belgium
2 Multimedia Group, IMEC, Kapeldreef 75, B-3001, Leuven, Belgium

ABSTRACT

We present a scalable stereo matching algorithm based on a Locally Adaptive Polygon Approximation (LAPA) technique. For accurate local stereo matching, pixel-wise adaptive polygon-based support windows are constructed to approximate spatially varying image structures. Central to building these pixel-wise polygons is a fast algorithm that adaptively decides a set of directional scales using intensity and spatial information. Thanks to the locally adaptive support window, the proposed method achieves high stereo reconstruction quality in both depth-discontinuity regions and homogeneous regions. Moreover, our LAPA-based method offers flexible scalability in terms of the quality-complexity trade-off. As a specific instantiation favoring high-quality stereo estimation, our 8-direction stereo method outperforms most other local stereo methods and even some global optimization techniques. A low-complexity alternative is also presented, achieving a significant speedup of up to a factor of 20 with graceful accuracy degradation. Within a unified LAPA framework, our stereo method hence offers the flexibility to reconcile different algorithm design needs with processing performance constraints.

Index Terms— stereo matching, locally adaptive polygon approximation, scalability

1. INTRODUCTION

Stereo matching is a long-standing topic that has attracted intensive research interest. A substantial amount of work on the topic is systematically surveyed and evaluated by Scharstein and Szeliski [1]. Local stereo matching is favored in practical vision applications because of its comparatively low complexity. This paper focuses on local approaches; our specific contribution builds on a Locally Adaptive Polygon Approximation (LAPA) technique, augmented with quality-complexity scalability features.

In general, local stereo matching approaches aggregate the matching cost over a given support window to increase robustness to noise and texture variation. Many local stereo methods have been proposed for support window generation. Multiple-window methods [2][3] select an optimal support window from a set of predefined window candidates. The adaptive-window method in [4] searches for an optimal support window at each pixel. However, support windows restricted to a few constrained shapes cannot appropriately approximate general image structures, so the resulting disparity map is often blurred near arbitrarily shaped depth discontinuities. Adaptive-weight methods [5] assign different support weights to each pixel in a fixed support window. They provide a properly weighted support window at each pixel, but also require huge amounts of memory to store the pre-computed, pixel-wise varying support weights used to accelerate the online computation. Quality-complexity scalability in prior work mostly relies on adjusting the window size, which offers limited design choices. In our prior work [6], a Local Polynomial Approximation - Intersection of Confidence Intervals (LPA-ICI) method is used to find near-optimal polygon-based support windows. It exploits the ICI rule to determine the sector scales. This LPA-ICI algorithm, however, involves multiple statistical hypothesis tests and is not optimized for fast execution.

Fig. 1. Polygon-based support window construction and the resulting dense disparity maps with our algorithm in the 8-direction configuration (a)(b) and in the 4-direction configuration (c)(d).

In this paper, we propose a complexity-aware scalable stereo matching algorithm based on Locally Adaptive Polygon Approximation (LAPA). Pixel-wise multidirectional varying-scale support windows are constructed adaptively to approximate local image structures. Besides high accuracy, our LAPA-based method inherently provides considerable scalability in terms of quality and complexity. We propose two configurations of our method. The configuration targeting high quality outperforms most other local stereo methods and even some global optimization techniques. The configuration targeting low complexity achieves a significant speedup with graceful quality degradation. The support window construction and the estimated disparity maps of the two configurations for the “Tsukuba” image pair are illustrated in Fig. 1. As the resulting disparity maps show, the proposed method achieves high quality in both depth-discontinuity regions and homogeneous regions. We present the details of our LAPA-based stereo matching algorithm in Section 2. Experimental results are presented in Section 3. Section 4 concludes the paper and outlines our future work.

Email: {Zhangke, Jiangbo.Lu, Gauthier.Lafruit}@imec.be

978-1-4244-1764-3/08/$25.00 ©2008 IEEE ICIP 2008

2. LAPA-BASED STEREO MATCHING

2.1 Algorithm framework

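[Fig. 2 block diagram: the left image and the right image each pass through LAPA-based support window construction (yielding $U_X$ and $U'_{X'}$); both feed cost aggregation & WTA, which yields the initial disparity $d^o_X$, followed by refinement, which yields the final disparity $d_X$.]

Fig. 2. Framework of our proposed LAPA-based stereo method.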

The framework of our proposed stereo matching method is illustrated in Fig. 2. As an important preprocessing step for LAPA-based stereo matching, pixel-wise adaptive support windows are built symmetrically for a pair of stereo images. More specifically, a 3×3 median filter is first applied to the raw images to suppress the subtle impact of image noise. The filtered color images are then converted to the opponent color space, where spatially adaptive polygon-based support windows are constructed with our fast adaptive-scale decision algorithm. The generated multidirectional varying-scale polygons approximate the local image structures and form suitable support windows for cost aggregation, which is key to the success of a local stereo method [1]. Cost aggregation is performed over the common area of the support windows of two corresponding pixels in the left and right images. We use a classical Winner-Takes-All (WTA) method in the disparity selection step, i.e., the disparity with the smallest matching cost is selected. In the disparity refinement step, the initial disparity estimate is refined with a local high-confidence voting scheme [6].


For local stereo methods, cost aggregation over a proper local support window is of vital importance and largely determines the matching accuracy. As in most prior work, the key contribution of our LAPA-based method lies in this step.

2.2 LAPA-based cost aggregation and disparity refinement

In principle, spatial support for a center pixel should ideally be gathered from neighbors with similar disparity. Without prior disparity information, the support area for a pixel can only be derived from the raw images. Our method is based on the observation that pixels with similar intensity within a constrained area are likely to belong to the same image structure and therefore to have similar disparity.

We use multidirectional varying-scale polygons to approximate general and spatially varying image structures. An 8-direction polygon is shown as a typical example in Fig. 3. With K directions and N candidate scales in each direction, $N^K$ polygon candidates can be produced. Appropriately selecting a polygon from these candidates provides an efficient approximation to the underlying image structures.
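As a worked illustration (not part of the paper's experiments): if each of the K directions independently takes one of N scales, the candidate count is $N^K$. For the two configurations described in Section 2.3 (K = 8 with 6 scales, K = 4 with 7 scales) this gives

$$6^8 = 1{,}679{,}616, \qquad 7^4 = 2{,}401.$$

Exhaustively testing every candidate polygon at every pixel would thus be impractical for K = 8. Selecting the scale independently in each direction instead keeps the number of threshold tests per pixel roughly proportional to K·N rather than to $N^K$.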

Our cost aggregation is performed as follows. At each pixel $X$ in the left image $I$ and $X'$ in the right image $I'$, desirable polygon-based support windows $U_X$ and $U'_{X'}$ are constructed based on the local image structures, respectively. The raw cost is aggregated over the common area of $U_X$ and $U'_{X'}$ as in Eq. (1).

[Fig. 3 diagram: center pixel $X$ and its eight directional scales $h_1(X), \ldots, h_8(X)$.]

Fig. 3. A typical LAPA-based support window construction (K = 8). K could optionally be chosen larger than 8, enabling an even finer geometry approximation at the cost of an increased computational load.

$$E(X, X') = \frac{\sum_{r \in U_X \cap U'_{X'}} e_{X,X'}(r)}{\#\{\, r \mid r \in U_X \cap U'_{X'} \,\}} \qquad (1)$$

Note that $X$ and $X'$ are aligned as centers, with $r$ representing the support pixel position relative to the center. $e_{X,X'}(r)$ is the raw matching cost calculated as in Eq. (2),

$$e_{X,X'}(r) = \min\Big\{ \sum_{c \in \{R,G,B\}} \big| I_c(r) - I'_c(r) \big|,\; T \Big\} \qquad (2)$$

where $I_c$ is the intensity of color band $c$ and $T$ is a truncation value that limits the raw matching cost. $E(X, X')$ is taken as the matching cost between $X$ and $X'$.
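The aggregation of Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' implementation: images are nested lists of RGB tuples, support windows are sets of (dx, dy) offsets (a binary-mask style representation), the pair is rectified so that X = (x, y) corresponds to X' = (x - d, y), and offsets that fall outside either image are simply skipped. The names `raw_cost` and `aggregated_cost` are hypothetical.

```python
from typing import List, Set, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]          # image[y][x] -> (R, G, B)
Window = Set[Tuple[int, int]]    # support offsets r = (dx, dy) relative to the window center


def raw_cost(p: RGB, q: RGB, T: float) -> float:
    """Eq. (2): sum of absolute differences over the R, G, B bands, truncated at T."""
    return min(sum(abs(a - b) for a, b in zip(p, q)), T)


def aggregated_cost(left: Image, right: Image, x: int, y: int, d: int,
                    win_left: Window, win_right: Window, T: float = 100.0) -> float:
    """Eq. (1): mean raw cost over the common area of U_X and U'_X'.

    Assumes a rectified pair in which X = (x, y) matches X' = (x - d, y).
    """
    h, w = len(left), len(left[0])
    xr = x - d
    total, count = 0.0, 0
    for dx, dy in win_left & win_right:          # common area U_X ∩ U'_X'
        yy = y + dy
        if not (0 <= yy < h and 0 <= x + dx < w and 0 <= xr + dx < w):
            continue                             # assumption: skip offsets outside either image
        total += raw_cost(left[yy][x + dx], right[yy][xr + dx], T)
        count += 1
    return total / count if count else float("inf")
```

Evaluating `aggregated_cost` for every candidate disparity d and keeping the smallest value corresponds to the WTA selection described in Section 2.2.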

In terms of polygon-based support window construction, we reduce the problem to determining a proper scale in each direction and then construct the polygon by connecting the selected scales. To this end, we propose an efficient scale selection method, different from LPA-ICI [6], as follows. The basic idea is that two nearby pixels satisfying a given difference threshold are assumed to have similar disparity. The difference $D$ is defined jointly by the intensity difference $I_{X_1,X_2}$ and the spatial distance $S_{X_1,X_2}$ between two pixels $X_1$ and $X_2$, as in Eq. (3),

$$D_{X_1,X_2} = I_{X_1,X_2} + \lambda\, S_{X_1,X_2} \qquad (3)$$

where $\lambda$ is a weight parameter. $I_{X_1,X_2}$ is calculated in the opponent color space as the maximum intensity difference among the three color bands, and $S_{X_1,X_2}$ is the spatial distance in pixels. Together, the intensity difference and the spatial distance determine the hypothesized correlation between two pixels: pixels with similar intensity and a smaller spatial distance are likely to belong to the same image structure and hence to have similar disparity.

The proper scales are selected with a thresholding method based on this definition. Given a threshold $\tau$, the difference $D_{X, X\{h_i\}_k}$ between the center pixel $X$ and each neighboring pixel $X\{h_i\}_k$ in direction $k$ is calculated one by one. Among a predefined set $H$ containing $N$ scale candidates, the maximum $h$ satisfying the threshold is chosen as the optimal scale $h_k(X)$ for direction $k$. The pseudo-code is shown in Fig. 4, where $X\{h_i\}_k$ denotes the pixel at distance $h_i$ from $X$ in direction $k$. With the selected directional scales $h_k(X)$, the multidirectional varying-scale polygon $U_X$ can easily be constructed by connecting them, and aggregation is then carried out according to Eq. (1).

    for k = 1, ..., K                     // K directions
        h_k(X) = h_1;                     // initialization
        for i = 2, ..., N                 // N candidates in each direction
            if D(X, X{h_i}_k) > τ         // threshold checking
                break;
            end
            h_k(X) = h_i;                 // update h_k
        end                               // end scale loop for each direction
    end                                   // end direction loop

Fig. 4. Pseudo-code of the fast directional adaptive-scale selection. Other design choices can easily be obtained by varying K and H. A larger K and more candidates in H provide a more accurate approximation to image structures, hence yielding higher quality. Conversely, low-complexity implementations can be obtained by constraining K and H, e.g., further reducing K to 2.

[Fig. 5 diagram: center pixel $X$ with four directional scales $h_1(X), \ldots, h_4(X)$ defining a rectangular support window $U_X$.]

Fig. 5. A low-complexity LAPA-based support window construction (K = 4).
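The directional scale selection of Fig. 4, combined with the difference measure of Eq. (3), can be sketched as follows. Several details are our assumptions rather than statements of the paper: pixels are taken to be already converted to an opponent color space, the K directions are evenly spaced, the pixel at scale h in direction k is located by rounding h·(cos θ_k, sin θ_k), the spatial distance is taken to be h itself, and the search stops at the image border. The defaults `lam = 1/3` (weight) and `tau = 20.0` (threshold) follow our reading of the parameter values reported for both configurations; `difference` and `select_scales` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]   # assumed already converted to an opponent color space
Image = List[List[Pixel]]            # image[y][x]


def difference(p: Pixel, q: Pixel, dist: float, lam: float) -> float:
    """Eq. (3): maximum per-band intensity difference plus lam times the spatial distance."""
    return max(abs(a - b) for a, b in zip(p, q)) + lam * dist


def select_scales(img: Image, x: int, y: int, H: Sequence[int], K: int = 8,
                  lam: float = 1 / 3, tau: float = 20.0) -> List[int]:
    """Fig. 4: per direction, keep the largest scale in H (sorted ascending)
    reached before the difference to the center first exceeds tau."""
    rows, cols = len(img), len(img[0])
    center = img[y][x]
    scales = []
    for k in range(K):                               # K directions
        theta = 2 * math.pi * k / K                  # assumption: evenly spaced directions
        ux, uy = math.cos(theta), math.sin(theta)
        h_k = H[0]                                   # initialization
        for h in H[1:]:                              # remaining candidates
            px, py = x + round(h * ux), y + round(h * uy)
            if not (0 <= px < cols and 0 <= py < rows):
                break                                # assumption: stop at the image border
            if difference(center, img[py][px], h, lam) > tau:
                break                                # threshold checking
            h_k = h                                  # update h_k
        scales.append(h_k)
    return scales
```

In a homogeneous neighborhood every direction reaches the largest candidate in H, while near an intensity edge the scales facing the edge shrink; this is how the polygon adapts to local structure.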

After the aggregation step, WTA is used to select the optimal disparity $d^o_X$ based on the aggregated cost $E(X, X')$. The disparities are further refined using a local high-confidence voting scheme [6] within the pixel-wise adaptive neighborhood.
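The WTA selection and the local voting refinement (the histogram peak of Eq. (4)) can be sketched as below. This is a hedged illustration, not the implementation of [6]: the per-pixel cost list is assumed to be indexed from `d_min`, the initial disparity map is a nested list, the support window is a set of (dx, dy) offsets that includes the center, and histogram ties are broken toward the smaller disparity, which the text does not specify. `wta` and `vote_refine` are hypothetical names.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple


def wta(costs: Sequence[float], d_min: int = 0) -> int:
    """Winner-Takes-All: the disparity with the smallest aggregated cost E(X, X').
    costs[i] is assumed to hold the cost for disparity d_min + i."""
    best = min(range(len(costs)), key=lambda i: costs[i])
    return d_min + best


def vote_refine(d_init: List[List[int]], x: int, y: int,
                window: Set[Tuple[int, int]]) -> int:
    """Eq. (4): histogram h(d_i) of the initial disparities inside U_X; the peak wins.
    Ties go to the smaller disparity (an assumption; the text does not specify)."""
    rows, cols = len(d_init), len(d_init[0])
    hist: Counter = Counter()
    for dx, dy in window:
        xx, yy = x + dx, y + dy
        if 0 <= xx < cols and 0 <= yy < rows:        # ignore offsets outside the image
            hist[d_init[yy][xx]] += 1
    # max() returns the first maximal item, so iterating in ascending order favors smaller d
    return max(sorted(hist), key=lambda d: hist[d])
```

Because the voting window is the same adaptive support window used for aggregation, isolated WTA outliers inside a homogeneous region are outvoted by the dominant disparity of that region.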

The refinement is based on the observation that pixels from the same support window have similar disparities with high confidence. If the distribution of the disparity values is modeled with a histogram, a peak is very likely to occur. Regarding the disparities in the support window as outcomes of a repeated independent experiment, this peak corresponds to their common mean (the optimal value). Specifically, the refinement is performed as follows. For a pixel $X$ with an adaptive support window $U_X$, a histogram $h(d_i)$ over the disparities $d_i$ of the pixels in $U_X$ is built. The final disparity $d_X$ of $X$ is then selected as the $d_i$ associated with the maximum value of $h(d_i)$:

$$d_X = \arg\max_{d_i \in \{d_{\min}, \ldots, d_{\max}\}} h(d_i). \qquad (4)$$

2.3 Implementation of two configurations

Based on the proposed LAPA-based stereo matching framework, we implemented our stereo method with two example configurations: one for high quality and the other for low complexity (more finely scaled operating modes can be obtained within this practical scalability range). The first one is the 8-direction (K = 8) polygon-based configuration shown in Fig. 3. It has 6 possible scales $H = \{1, 2, 4, 6, 12, 17\}$ in each direction. Fig. 1 shows that the selected polygons approximate local image structures accurately.

As an alternative, we also developed a 4-direction (K = 4) rectangle-based configuration with $H = \{1, 2, 3, 4, 6, 8, 11\}$, shown in Fig. 5. Instead of connecting the scales directly with straight lines, a rectangle is constructed by connecting the scales with polygonal lines, as illustrated in Fig. 5. This comparatively simple shape is fully determined by four scales. Moreover, the common area between two support windows can easily be calculated by taking the smaller scale in each direction. This configuration reduces the complexity significantly, while its varying-shape rectangles still provide an acceptable approximation to the image structures (see Fig. 1 (c, d), albeit at lower quality than in Fig. 1 (a, b)).

The other parameters for the 8-direction configuration are set to $T = 100$, weight $\lambda = 1/3$ and threshold $\tau = 20$, while for the 4-direction configuration they are $T = 40$, $\lambda = 1/3$ and $\tau = 20$. The parameters remain the same for all test images. Note that we assigned the 4-direction configuration a smaller $T$ because its approximation is less accurate than that of the 8-direction configuration; a more aggressive truncation value is therefore needed to maintain discriminative power.

3. EXPERIMENT RESULTS

The estimated disparity maps for four image pairs are illustrated in Fig. 6. Using the evaluation method of the Middlebury benchmark [9], Table 1 reports the percentage of “bad pixels”, i.e., pixels whose disparity error is larger than 1. For each image pair, the results in non-occluded (nonocc.) regions, all (all) regions, and depth-discontinuity (disc.) regions are reported.

From the disparity maps, we conclude that our proposed method achieves high quality in both depth-discontinuity regions and homogeneous regions. As shown in Table 1, our 8-direction configuration is highly ranked among local approaches. It achieves quality similar to that of the adaptive-weight method [5] and nearly the same quality as the LPA-ICI method [6]. With graceful degradation, the low-complexity 4-direction configuration also ranks well among the well-performing stereo methods listed.

Compared with approaches of similar quality, our proposed algorithm shows good potential for low-complexity implementation thanks to its efficient polygon construction and representation methods. Without code optimization, we measured the time needed to match one image pair. The experiments were run on a P4 3.2 GHz PC with 1.0 GB of memory. The execution times for the four image pairs are listed in Table 2. Observe that our 4-direction configuration achieves about a 20-fold speedup compared to the 8-direction configuration. The proposed LAPA-based technique thus shows good scalability in terms of quality and complexity.

Additionally, our method has a great advantage in memory consumption over the adaptive-weight method [5]. Instead of storing pixel-wise support weights in floating point, a binary mask suffices in our case to represent a shape-adaptive polygon. Taking “Tsukuba” as an example, the adaptive-weight method [5] consumes about 1.08 GB, while our method consumes only 34 MB and 0.89 MB for the 8-direction and 4-direction configurations, respectively.
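For the 4-direction configuration, a support window is fully determined by four scales, and the common area of two windows is obtained by taking the smaller scale in each direction. The sketch below illustrates this under stated assumptions: the hypothetical `RectWindow` type stores the four scales as axis-aligned extents in the order (right, down, left, up), an ordering chosen here for illustration only. A window then costs just four small integers per pixel, in the spirit of the compact mask representation mentioned above rather than per-pixel floating-point weights.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class RectWindow:
    """Hypothetical K = 4 support window: directional scales stored as extents."""
    right: int
    down: int
    left: int
    up: int

    def common(self, other: "RectWindow") -> "RectWindow":
        # Common area of two aligned windows: the smaller scale in each direction.
        return RectWindow(min(self.right, other.right), min(self.down, other.down),
                          min(self.left, other.left), min(self.up, other.up))

    def area(self) -> int:
        # Number of support pixels, including the center pixel.
        return (self.left + self.right + 1) * (self.up + self.down + 1)

    def offsets(self) -> Iterator[Tuple[int, int]]:
        # Support offsets r = (dx, dy) relative to the center, usable as the support set in Eq. (1).
        for dy in range(-self.up, self.down + 1):
            for dx in range(-self.left, self.right + 1):
                yield dx, dy
```

For example, intersecting `RectWindow(4, 2, 6, 1)` with `RectWindow(3, 8, 11, 1)` gives `RectWindow(3, 2, 6, 1)`, whose area is 10 × 4 = 40 pixels; the normalizing count in Eq. (1) is then a single multiplication instead of a mask traversal.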


Fig. 6. Dense disparity maps for (a) Tsukuba, (b) Venus, (c) Teddy, and (d) Cones, estimated using our LAPA-based algorithm (the first row for the 8-direction configuration and the second row for the 4-direction configuration).

Table 1. Evaluation results based on the online Middlebury stereo benchmark service [9] (percentage of bad pixels; nonocc. / all / disc.).

| Algorithm | Tsukuba | Venus | Teddy | Cones |
|---|---|---|---|---|
| Adapt. Weight [5] | 1.38 / 1.85 / 6.90 | 0.71 / 1.19 / 6.13 | 7.88 / 13.3 / 18.5 | 3.97 / 9.79 / 8.26 |
| MultiCamGC [7] | 1.27 / 1.99 / 6.48 | 2.79 / 3.13 / 3.60 | 12.0 / 17.6 / 22.0 | 4.89 / 11.8 / 12.1 |
| LPA-ICI [6] | 2.29 / 2.88 / 8.94 | 0.80 / 1.11 / 3.41 | 10.5 / 15.9 / 21.3 | 6.13 / 13.2 / 13.3 |
| LAPA (K=8) | 2.53 / 3.26 / 9.01 | 0.89 / 1.70 / 3.46 | 9.76 / 17.6 / 19.6 | 7.68 / 16.9 / 15.6 |
| GenModel [8] | 2.57 / 4.74 / 13.0 | 1.72 / 3.08 / 16.9 | 6.86 / 15.0 / 19.2 | 4.64 / 14.9 / 11.4 |
| LAPA (K=4) | 3.32 / 4.07 / 12.6 | 1.70 / 2.78 / 9.17 | 11.9 / 19.9 / 22.2 | 6.71 / 15.8 / 12.8 |
| GC [1] | 1.94 / 4.12 / 9.39 | 1.79 / 3.44 / 8.75 | 16.5 / 25.0 / 24.9 | 7.70 / 18.2 / 15.3 |
| DP [1] | 4.12 / 5.04 / 12.0 | 10.1 / 11.0 / 21.0 | 14.0 / 21.6 / 20.6 | 10.5 / 19.1 / 21.1 |

Table 2. Execution time (in seconds) for four image pairs.

| Configuration | Tsukuba | Venus | Teddy | Cones |
|---|---|---|---|---|
| 4-dir. | 2.5 | 4.9 | 9.9 | 7.5 |
| 8-dir. | 42.1 | 80.2 | 208.0 | 198.3 |
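As a quick arithmetic check on the speedup figure, dividing the 8-direction times by the 4-direction times in Table 2 gives the per-pair ratios

$$\frac{42.1}{2.5} \approx 16.8,\quad \frac{80.2}{4.9} \approx 16.4,\quad \frac{208.0}{9.9} \approx 21.0,\quad \frac{198.3}{7.5} \approx 26.4,\qquad \text{mean} \approx 20.2,$$

so the “about 20 times” speedup is consistent with the average over the four pairs, with individual ratios ranging from roughly 16 to 26.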

4. CONCLUSIONS AND FUTURE WORK

This paper proposes a LAPA-based scalable stereo matching algorithm that constructs multidirectional varying-scale support windows to approximate local image structures. Under a unified LAPA framework, two example configurations with 8 and 4 directions are implemented. The proposed method produces accurate, smooth, and discontinuity-preserving disparity maps, while demonstrating good quality-complexity scalability with a potential execution-time speedup of 20. Our method is also highly cost-effective compared with prior-art approaches of similar quality.

The proposed method is complexity-aware and shows great potential for parallelization. We are working on accelerating it on parallel processing units such as graphics hardware.

5. REFERENCES

[1] D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” Int’l Journal of Computer Vision, 47(1), pp. 7–42, May 2002.

[2] A. Fusiello, V. Roberto, and E. Trucco, “Efficient stereo with multiple windowing,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 858–863, 1997.

[3] S. B. Kang, R. Szeliski, and J. Chai, “Handling occlusions in dense multi-view stereo,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 103–110, 2001.

[4] O. Veksler, “Fast variable window for stereo correspondence using integral images,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 556–561, 2003.

[5] K.-J. Yoon and I. S. Kweon, “Adaptive support-weight approach for correspondence search,” IEEE Trans. Pattern Anal. Machine Intell., 28(4), pp. 650–656, April 2006.

[6] J. Lu, G. Lafruit, and F. Catthoor, “Anisotropic local high-confidence voting for accurate stereo correspondence,” in Proc. SPIE-IS&T Electronic Imaging, vol. 6812, Jan. 2008.

[7] V. Kolmogorov and R. Zabih, “Multi-camera scene reconstruction via graph cuts,” in Proc. European Conference on Computer Vision, pp. 82–96, 2002.

[8] C. Strecha, R. Fransens, and L. Van Gool, “Combined depth and outlier estimation in multi-view stereo,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, pp. 2394–2401, 2006.

[9] Middlebury stereo page. http://vision.middlebury.edu/stereo/
