Xing Mei ;  Xun Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng...

55
Xing Mei ; Xun Sun ; Mingcai Zhou ; Shaohui Jiao ; Haitao Wang ; Xiaopeng Zhang Samsung Advanced Institute of Technology, China Lab Computer Vision Workshops, 2011 IEEE On Building an Accurate Stereo Matching System on Graphics Hardware

description

On Building an Accurate Stereo Matching System on Graphics Hardware. Xing Mei ;  Xun Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang  Samsung Advanced Institute of Technology, China Lab Computer Vision Workshops, 2011 IEEE. Outline. Introduction Related Works - PowerPoint PPT Presentation

Transcript of Xing Mei ;  Xun Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng...

Page 1: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Xing Mei ; Xun Sun ;  Mingcai Zhou ;  Shaohui Jiao ;

  Haitao Wang ; Xiaopeng Zhang Samsung Advanced Institute of Technology, China Lab

Computer Vision Workshops, 2011 IEEE

On Building an Accurate Stereo Matching System on

Graphics Hardware

Page 2: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Outline• Introduction• Related Works• Algorithmn• CUDA Implementation• Experimental Results• Conclusion

Page 3: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Introduction

Page 4: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

IntroductionDense two-frame stereo matching

• Compute a disparity map from stereo images.• Broad applications: 3D reconstruction, view interpolation

Page 5: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Related Works

Page 6: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Related Works• Local methods• Compute each pixel’s

disparity independently over a local support region.• Fast but inaccurate.

•Global methods• Solve the stereo problem in

an energy minimization process.• Accurate but slow due to

time-comsuming global optimizer.(GC,BP)

Page 7: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Related Works• Propagation-based methods• Produce quasi-dense or dense disparity results from a set of seed pixels.

• Relatively fast but sensitive to early wrong matches• use segmented regions as guided propagation unit• expensive cost

Page 8: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Related Works• Introduce a simple guided unit for propagation

: pixel-wise 1D line segments. • No image segmentation required here.• Simple, fast and accurate

Page 9: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Algorithmn

Page 10: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Algorithmn• Framework

Multi-step Disparity Refinement

Scanline Optimization

Cross-based Cost Aggregation

AD-Census Cost InitializationInput: Stereo images

Output: Disparity map

Page 11: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Algorithmn

Multi-step Disparity Refinement

Scanline Optimization

Cross-based Cost Aggregation

AD-Census Cost InitializationInput: Stereo images

Output: Disparity map

Page 12: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Disparity Cost Computing• Cost mesure : AD, BT, gradient-based measures, non-

parametric transforms(rank/census[3])......• Combination : SAD + gradient[6] , AD + Census• AD (Absolute Distance)• Constant color assumption• Repetitive structures

• Census • Encodes local image structures • Textureless regions

[3] H. Hirschmuller and D. Scharstein. “Evaluation of stereo matching costs on images with radiometric differences.”IEEE TPAMI, 31(9):2009.[6] A. Klaus, M. Sormann, and K. Karner. “Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure.” ICPR,2006.

Page 13: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

AD-Census Cost Initialization+• p : pixel• d : level• >> a robust function on variable 𝑐

• pd = (x-d,y) in the right image

• : Hamming distance[22]

[22] R. Zabih and J. Woodfill. “Non-parametric local transforms for computing visual correspondence.” In Proc. ECCV, 1994.

d

Left I Right I

Page 14: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Census Transform

1 1 0 0 0

1 1 0 0 0

1 1 X 0 0

0 0 0 1 1

1 1 1 1 1

121 130 26 31 39

109 115 33 40 30

98 102 78 67 45

47 67 32 170 198

39 86 99 159 210

1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1Census transform window :

Page 15: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Census Hamming Distance

• Left image

• Right image

Hamming Distance = 3

1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1

1 1 1 0 0 1 1 0 0 1 0 1 0 0 0 0 0 1 1 1 1 1 1 1

XOR

0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0

Page 16: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

AD-Census Cost Initialization

+

• > >> a robust function on variable 𝑐

Page 17: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

AD-Census Cost Initialization

• AD-Census measure produces proper disparity results for both repetitive structures and textureless regions.

Page 18: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Algorithmn

Multi-step Disparity Refinement

Scanline Optimization

Cross-based Cost Aggregation

AD-Census Cost InitializationInput: Stereo images

Output: Disparity map

Page 19: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

• Cross construction

• Line ending points P1, P2 for P are located when rule 1 or 2 are violated:R1: Color self-similarity in the line region: smooth depth assumption

R2: Arm length limitation: avoid over-smoothness

Cross-based Cost Aggregation[23]

[23] K. Zhang, J. Lu, and G. Lafruit. “Cross-based local stereo matching using orthogonal integral images.” IEEE TCSVT,2009.

Page 20: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Cross-based Cost Aggregation

Page 21: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Cross-based Cost Aggregation• Enhance cross construction(use pixel p’s left arm and the endpoint pixel pl as an example)

• • •

Page 22: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Cross-based Cost Aggregation• Cost aggregation

• Run this step for 4 iterations to get stable cost values. • For iteration 1 and 3, aggregated horizontally and then vertically. • For iteration 2 and 4, aggregated vertically and then horizontally.

• Reduce the errors at depth discontinuities.

Page 23: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Cross-based Cost Aggregation

• Our aggregation method can better handle large textureless regions and depth discontinuities.

Page 24: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Cross-based Cost Aggregation

[21] K.-J. Yoon and I.-S. Kweon. “Adaptive support-weight approach for correspondence search.” IEEE TPAMI, 2006.[23] K. Zhang, J. Lu, and G. Lafruit. “Cross-based local stereo matching using orthogonal integral images.” IEEE TCSVT,2009.

Page 25: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Algorithmn

Multi-step Disparity Refinement

Scanline Optimization

Cross-based Cost Aggregation

AD-Census Cost InitializationInput: Stereo images

Output: Disparity map

Page 26: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Scanline Optimization[2]

• 4 scanline optimization processes are performed independently.• 2 horizontal directions• 2 vertical directions

𝐶2

𝐶𝑟

𝐶𝑟𝐶𝑟

𝐶𝑟

[2] H. Hirschmuller. Stereo processing by semiglobal matching and mutual information.” IEEE TPAMI, 2008.

Page 27: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

r : directionp-r : the previous pixel along the same direction𝑃1, 𝑃2 : penalize the disparity changes between neighboring

pixels. (𝑃1 ≤ 𝑃2 ) [8]

Scanline Optimization

[8]S. Mattoccia, F. Tombari, and L. D. Stefano. “Stereo vision enabling precise border localization within a scanline optimization framework.” In Proc. ACCV, pages 517–527, 2007.

pp-rr

Page 28: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Scanline Optimization• The final cost :

• The disparity with the minimum 𝐶2 value is selected as pixelp’s intermediate result.

𝐶2

𝐶𝑟

𝐶𝑟𝐶𝑟

𝐶𝑟

Page 29: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Algorithmn

Multi-step Disparity Refinement

Scanline Optimization

Cross-based Cost Aggregation

AD-Census Cost InitializationInput: Stereo images

Output: Disparity map

Page 30: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Multi-step Disparity Refinement• Outlier Handling• Outlier Detection• Iterative Region Voting• Proper Interpolation

• Depth Discontinuity Adjustment• Sub-pixel Enhancement

Page 31: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Outlier Handling--Detection• The outliers : 𝐷𝐿(p) != 𝐷R(p − (𝐷𝐿(p), 0)) • Outliers are further classified into occlusion and mismatch

points

• p intersect its epipolar line and 𝐷R is checked• If no intersection p is labelled as “occlusion”, otherwise “mismatch”

Page 32: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Outlier Handling--Iterative Region Voting• Construct cross-based regions and a robust voting scheme

• Sp : • 𝜏𝑆, 𝜏𝐻 : threshold values

• 5 iterations

dd

Page 33: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Outlier Handling--Proper Interpolation• occlusion• The pixel with the lowest disparity value is selected for interpolation• It’s most likely comes from the background

• mismatch points• The pixel with the most similar color is selected for interpolation.

Page 34: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Depth Discontinuity Adjustment• For each pixel p on the disparity edge, two pixels p1, p2 from

both sides of the edge are collected. • 𝐷𝐿(p) is replaced by 𝐷𝐿(p1) or 𝐷𝐿(p2) if one of the two pixels

has smaller matching cost than 𝐶2(p,𝐷𝐿(p)).

𝐷𝐿(P1

)𝐷𝐿(P) 𝐷𝐿(P2

)

Page 35: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

• Quadratic polynomial interpolation

• With 3*3 median filter

Sub-pixel Enhancement[20]

[20] Q. Yang, L. Wang, R. Yang, H. Stewenius, and D. Nister. “Stereo matching with color-weighted correlation, hierarchical belief propagation and occlusion handling.” IEEE TPAMI, 2009.

Page 36: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Multi-step Disparity Refinement

• The average error percentages after performing each refinement step.

Page 37: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

CUDA Implementation

Page 38: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

CUDA Implementation• Compute Unified Device Architecture (CUDA) is a

programming interface for parallel computation tasks on NVIDIA graphics hardware.• The computation task is coded into a kernel function. • The allocation of the threads is controlled with two hierarchical

concepts: grid and block.• A kernel creates a grid with multiple blocks, and each block

consists of multiple threads. Kernel

Grid

Block

Thread

Thread

Block …

Grid …

Page 39: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

CUDA Implementation• Cost Initialization: • Parallelize with × 𝑊 𝐻 threads. • Organize into a 2D grid and the block size is set to 32× 32. • Each thread computes a cost value for a pixel at a given disparity. • For census transform, a square window is require for each pixel, which

requires loading more data into the shared memory for fast access.Kernel

Grid

Block

Thread

32X32 …

Grid

Page 40: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

CUDA Implementation• Cross-based Cost Aggregation: • A grid with × 𝑊 𝐻 threads.• Cross construction : block size is 𝑊

or 𝐻 to efficiently handle a scan line • Cost aggregation : block size is 32X32• Data reuse with shared memory is

considered in both steps.

Page 41: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

CUDA Implementation• Scanline Optimization: • This step is different, because the process is sequential in the scanline direction and

parallel in the orthogonal direction.• 𝑊 × 𝐷 or × 𝐻 𝐷 threads

• Disparity Refinement: • 𝑊 × 𝐻 threads

Page 42: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Experimental Results

Page 43: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Experimental Results• Device : A PC with Core 2 Duo 2.20GHz CPU and NVIDIA

GeForce GTX 480 graphics card• Settings parameters:

• Source : Middlebury http://vision.middlebury.edu/stereo/ HHI database(book arrival) Microsofy i2i database(Ilkay)

Page 44: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Experimental Results

• The GPU-friendly system brings an impressive 140× speedup.• The average proportions of the GPU running time for the four

computation steps are 1%, 70%, 28% and 1% respectively. • The iterative cost aggregation step and the scanline

optimization process dominate the running time.

Tsukuba Venus Teddy Cones

CPU 2.5 4.5 15 15

GPU 0.015 0.032 0.095 0.094

Page 45: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Experimental Results

• First row: disparity maps generated with our system. • Second row: disparity error maps with threshold 1. • Errors in unoccluded and occluded regions are marked in black and gray respectively.

Page 46: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Experimental Results

Page 47: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Experimental Results• video

Page 48: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Experimental Results

Snapshots on ’book arrival’ stereo video

Page 49: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Experimental Results

Snapshots on ’Ilkay’ stereo video

Page 50: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Conclusion

Page 51: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 

Conclusion• Contributions • Present a near real-time stereo system with accurate disparity results.• Combine some known techniques without sacrificing performance and

parallelism to obtain the high quality disparity map.

• Future works• Improve to apply it in real world applications • Robust parameter setting methods

Page 52: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 
Page 53: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 
Page 54: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang 
Page 55: Xing Mei ;  Xun  Sun ;  Mingcai Zhou ;  Shaohui Jiao  ;   Haitao Wang ; Xiaopeng Zhang