Binocular Stereo - University of California, San...
Binocular Stereo
Yangyue Wan
Binocular Stereo
● Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches
● Efficient Deep Learning for Stereo Matching
Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches
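What is Stereo Matching?
● Two images of the same scene taken from horizontally displaced viewpoints
● Correspondence: for a pixel in the left image, find the pixel in the right image that shows the same scene point
● Disparity: the horizontal offset between corresponding pixels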
Four Steps of Stereo Algorithm
1. Matching cost computation
2. Cost aggregation
3. Optimization
4. Disparity refinement
Q: What is matching cost?
What is Matching Cost?
● Matching cost measures the dissimilarity of a pair of pixels (or the patches around them)
● The corresponding pixel is chosen so that the similarity between the pixels is high, which means the matching cost is low
● “Winner-takes-all”: for every pixel, select the disparity with the lowest cost
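As an illustration of the winner-takes-all rule, here is a minimal sketch on a single scanline. The absolute intensity difference is only a stand-in for a matching cost, and the names `cost_volume` and `winner_takes_all` are made up for this example.

```python
def cost_volume(left, right, max_disp):
    """Absolute-difference matching cost for one scanline.

    cost[x][d] compares left[x] with right[x - d]; positions that fall
    outside the right scanline get an infinite cost.
    """
    inf = float("inf")
    return [
        [abs(left[x] - right[x - d]) if x - d >= 0 else inf
         for d in range(max_disp + 1)]
        for x in range(len(left))
    ]


def winner_takes_all(cost):
    """For every pixel, pick the disparity with the lowest cost."""
    return [min(range(len(row)), key=row.__getitem__) for row in cost]


# Toy scanlines: the right view is the left view shifted by 2 pixels.
right = [10, 20, 30, 40, 50, 60, 0, 0]
left = [0, 0, 10, 20, 30, 40, 50, 60]
disparity = winner_takes_all(cost_volume(left, right, max_disp=3))
```

Every pixel with a valid match at offset 2 ends up with disparity 2; the first two pixels have no such match inside the scanline.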
Matching Cost by Learning Similarity
● Inspiration
● Construct dataset
○ Equal numbers of positive and negative training examples (pairs of patches) from KITTI/Middlebury
● Network architectures
○ Fast
○ Accurate
Network Architectures
Fast: Cosine similarity
Loss: hinge loss
Q: Why this loss?
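A minimal sketch of the fast architecture's scoring and loss, assuming the hinge loss is taken over a positive and a negative example for the same left patch. The feature vectors are hypothetical stand-ins for the sub-network outputs, and the margin of 0.2 is an assumed value for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def hinge_loss(s_pos, s_neg, margin=0.2):
    """Zero once the positive pair beats the negative pair by `margin`."""
    return max(0.0, margin + s_neg - s_pos)


# Hypothetical feature vectors standing in for the sub-network outputs.
f_left = [1.0, 0.0, 1.0]
f_right_match = [1.0, 0.1, 0.9]
f_right_other = [0.0, 1.0, 0.0]

s_pos = cosine_similarity(f_left, f_right_match)
s_neg = cosine_similarity(f_left, f_right_other)
loss = hinge_loss(s_pos, s_neg)
```

The loss only cares about the ranking of the two pairs: once the matching pair scores higher than the non-matching pair by the margin, the example contributes nothing.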
Network Architectures
Accurate: FC layers
Loss: binary cross-entropy
Q: Why this loss?
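A minimal sketch of a binary cross-entropy loss for one training example, assuming the accurate architecture ends in a single output between 0 and 1. The function name and the sample values are made up for illustration.

```python
import math


def binary_cross_entropy(s, t):
    """Loss for one example: s is the predicted similarity in (0, 1),
    t is 1 for a matching pair of patches and 0 for a non-matching pair."""
    return -(t * math.log(s) + (1 - t) * math.log(1 - s))


confident_right = binary_cross_entropy(0.9, 1)
confident_wrong = binary_cross_entropy(0.9, 0)
```

Unlike the hinge loss, each example is scored on its own against its label, so no positive/negative pairing is needed.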
Matching Cost
● Inspiration
● Construct dataset
● Network architectures
● Computing the matching cost
○ Perform the forward pass for each image location and each disparity under consideration
○ Running time?
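One way to think about the running-time question: the expensive per-patch features can be computed once per pixel of each image, leaving only a cheap comparison per disparity. The sketch below illustrates that idea; `describe` and `similarity` are hypothetical stand-ins (a 3-pixel patch and a negated sum of absolute differences), not the networks themselves.

```python
def build_cost_volume(left, right, max_disp, describe, similarity):
    """cost[y][x][d] = -similarity(left feature at (y, x), right feature at (y, x - d)).

    `describe` runs once per pixel of each image; only the cheap
    `similarity` runs once per disparity.
    """
    h, w = len(left), len(left[0])
    feats_l = [[describe(left, y, x) for x in range(w)] for y in range(h)]
    feats_r = [[describe(right, y, x) for x in range(w)] for y in range(h)]
    inf = float("inf")
    return [
        [[-similarity(feats_l[y][x], feats_r[y][x - d]) if x - d >= 0 else inf
          for d in range(max_disp + 1)]
         for x in range(w)]
        for y in range(h)
    ]


calls = {"describe": 0}


def describe(image, y, x):
    """Hypothetical stand-in for the sub-network: a 3-pixel horizontal patch."""
    calls["describe"] += 1
    row = image[y]
    return [row[max(x - 1, 0)], row[x], row[min(x + 1, len(row) - 1)]]


def similarity(u, v):
    """Hypothetical stand-in score: higher means more alike."""
    return -sum(abs(a - b) for a, b in zip(u, v))


left = [[0, 0, 1, 5, 9, 4], [0, 0, 2, 7, 3, 8]]
right = [[1, 5, 9, 4, 0, 0], [2, 7, 3, 8, 0, 0]]
cost = build_cost_volume(left, right, max_disp=3,
                         describe=describe, similarity=similarity)
```

Here `describe` is called 24 times (12 pixels in each of the two images), independent of the number of disparities considered.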
Stereo Method
The raw outputs of the previous steps are not enough to produce an accurate disparity map; post-processing steps are needed
● Cross-based cost aggregation
● Semiglobal matching
● Computing the disparity image
○ Interpolation
○ Subpixel enhancement
○ Refinement
Stereo Method
● Cross-based cost aggregation: matching costs are collected only from pixels likely to belong to the same physical object
○ Support region for position p
○ Combined support region
○ Averaged matching cost
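A minimal sketch of the support region and the averaged matching cost, assuming a grayscale image and an arm that grows while the intensity stays within `tau` of the starting pixel and the distance stays below `eta`. It is a simplification: it builds the region from one image only and omits the combination with the other view's region. The threshold values and function names are assumptions for illustration.

```python
def arm(img, y, x, dy, dx, tau, eta):
    """Length of the arm grown from (y, x) in direction (dy, dx). It stops at
    the image border, when the intensity differs from that of (y, x) by tau
    or more, or when the distance from (y, x) would reach eta."""
    n = 0
    while True:
        yy, xx = y + (n + 1) * dy, x + (n + 1) * dx
        if not (0 <= yy < len(img) and 0 <= xx < len(img[0])):
            break
        if abs(img[yy][xx] - img[y][x]) >= tau or n + 1 >= eta:
            break
        n += 1
    return n


def support_region(img, y, x, tau, eta):
    """Union of the horizontal arms of every pixel on the vertical arm of (y, x)."""
    up = arm(img, y, x, -1, 0, tau, eta)
    down = arm(img, y, x, 1, 0, tau, eta)
    region = set()
    for qy in range(y - up, y + down + 1):
        left = arm(img, qy, x, 0, -1, tau, eta)
        right = arm(img, qy, x, 0, 1, tau, eta)
        region.update((qy, qx) for qx in range(x - left, x + right + 1))
    return region


def aggregate(cost, region, d):
    """Average the matching cost at disparity d over a support region."""
    return sum(cost[qy][qx][d] for qy, qx in region) / len(region)


# Toy image: dark left half, bright right half.
img = [[10, 10, 200, 200] for _ in range(4)]
cost = [[[1.0] if x < 2 else [9.0] for x in range(4)] for y in range(4)]
region = support_region(img, 1, 0, tau=20, eta=3)
averaged = aggregate(cost, region, d=0)
```

The region for a pixel in the dark half never crosses into the bright half, so the averaged cost is not contaminated by the other object.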
Stereo Method
● Semiglobal matching
○ Understand basic semiglobal matching
Stereo Method
● Semiglobal matching
○ Energy function
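For reference, the energy function of semiglobal matching is usually written in the following form, where $D$ is the disparity image, $C(p, d)$ is the matching cost, $N_p$ is the neighbourhood of $p$, and $P_1 \le P_2$ are penalties for small and large disparity changes. The notation is the standard one and is given here as a sketch, not as a copy of the slide.

```latex
E(D) = \sum_{p} \Big( C\big(p, D(p)\big)
     + \sum_{q \in N_p} P_1 \cdot \mathbf{1}\{\, |D(p) - D(q)| = 1 \,\}
     + \sum_{q \in N_p} P_2 \cdot \mathbf{1}\{\, |D(p) - D(q)| > 1 \,\} \Big)
```

The first term rewards disparities with low matching cost; the other two penalise neighbouring pixels whose disparities differ by exactly one or by more than one.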
Stereo Method
● Semiglobal matching
○ Cost function in order to minimize E(D)
○ Choose P1 and P2
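A minimal sketch of the path-wise cost that semiglobal matching uses to approximately minimize E(D), shown for a single direction (left to right) on one scanline. The constant penalties `p1` and `p2` are an assumption for illustration. In the full method the same recurrence is run along several directions and the results are combined.

```python
def aggregate_path(cost, p1, p2):
    """Aggregate cost[x][d] along one direction (left to right) of a scanline.

    Keeping the disparity is free, a change of one level costs p1, and any
    larger jump costs p2. Subtracting the previous minimum stops the values
    from growing along the path.
    """
    n_disp = len(cost[0])
    out = [list(cost[0])]
    for x in range(1, len(cost)):
        prev = out[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d + 1 < n_disp:
                candidates.append(prev[d + 1] + p1)
            row.append(cost[x][d] - prev_min + min(candidates))
        out.append(row)
    return out


def argmin_rows(rows):
    """Winner-takes-all disparity for every position."""
    return [min(range(len(r)), key=r.__getitem__) for r in rows]


# Toy costs: the third pixel has a noisy minimum at disparity 2.
cost = [[0, 5, 5], [0, 5, 5], [3, 5, 2], [0, 5, 5]]
raw = argmin_rows(cost)
smoothed = argmin_rows(aggregate_path(cost, p1=1, p2=4))
```

The raw costs pick an isolated jump to disparity 2 at the third pixel; after aggregation the jump penalty outweighs the small cost advantage and the disparity stays at 0.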
Stereo Method
● Semiglobal matching
○ Final cost
○ Repeat cross-based cost aggregation
Stereo Method
● Compute the disparity image
○ Interpolation
○ Subpixel enhancement
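A minimal sketch of subpixel enhancement, assuming the common approach of fitting a parabola through the costs at the chosen disparity and its two neighbours. The function name and the handling of border and flat cases are choices made for this example.

```python
def subpixel_disparity(costs, d):
    """Refine integer disparity d by fitting a parabola through the costs at
    d - 1, d and d + 1 and returning the position of its minimum."""
    if d == 0 or d == len(costs) - 1:
        return float(d)  # no neighbour on one side, keep the integer value
    c_minus, c, c_plus = costs[d - 1], costs[d], costs[d + 1]
    denom = 2.0 * (c_plus - 2.0 * c + c_minus)
    if denom <= 0:
        return float(d)  # flat or non-convex neighbourhood
    return d - (c_plus - c_minus) / denom


# Costs sampled from a parabola whose true minimum lies at 2.3.
costs = [(x - 2.3) ** 2 for x in range(5)]
d_int = min(range(len(costs)), key=costs.__getitem__)
d_sub = subpixel_disparity(costs, d_int)
```

The integer winner is 2, and the fit recovers the minimum of the sampled parabola between the integer positions.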
Stereo Method
● Compute the disparity image
○ Refinement
■ 5x5 median filter
■ Bilateral filter
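A minimal sketch of the 5x5 median filter step on a disparity map stored as a list of rows. Clipping the window at the image border is an assumption made for this example.

```python
from statistics import median


def median_filter(img, size=5):
    """Replace each pixel by the median of the size x size window around it.
    Windows are clipped at the image border."""
    h, w, r = len(img), len(img[0]), size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(y - r, 0), min(y + r + 1, h))
                      for xx in range(max(x - r, 0), min(x + r + 1, w))]
            row.append(median(window))
        out.append(row)
    return out


# A constant disparity map with a single outlier.
disp = [[7.0] * 6 for _ in range(6)]
disp[3][3] = 50.0
smoothed = median_filter(disp)
```

The isolated outlier is removed while the constant region is left unchanged, which is why a median filter suits disparity maps with sparse errors.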
Experiments
● Datasets: KITTI 2012, KITTI 2015, Middlebury
Experiments
● Details of learning (skip)
● Dataset augmentation
Experiments
● Runtime
Experiments
● Comparison of approaches for
○ Computing matching cost
○ Stereo method
Experiments
● Effects of dataset size
● Transfer learning
Q: What is transfer learning?
● Hyperparameters (skip)
Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches
● Learn similarity on pairs of patches to compute matching cost
● Two networks are used, one tuned for speed and one for accuracy
● Trained in a supervised way
● Output of the CNN is used to initialize the stereo matching cost
● A series of post-processing steps follows
Efficient Deep Learning for Stereo Matching
Introduction
● Older methods use hand-crafted cost/energy functions
● Current CNN-based methods are very time-consuming
● The authors propose a new and faster network (similar to the fast architecture in the previous paper)
Network Architecture
● Siamese network, remove ReLU from last layer
● Use a product layer (inner product of the two feature vectors) instead of further network layers to compute the matching score
Training
● Size of inputs
○ Left = receptive field size
○ Right > receptive field size
Q: Why?
● Size of outputs
○ Left = 64
○ Right =
Q: Why?
● Softmax
● Cross-entropy loss
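A minimal sketch of how the pieces above fit together, assuming one left descriptor is compared by inner product with one right descriptor per candidate disparity, followed by a softmax and a cross-entropy loss. The 3-dimensional feature vectors and the one-hot target are hypothetical; the target is passed as a distribution so that a smoothed target could be used instead.

```python
import math


def inner_product(u, v):
    """Product layer: matching score of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """target is a probability distribution over the candidate disparities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# Hypothetical features: one left descriptor, one right descriptor per disparity.
f_left = [1.0, 2.0, 0.5]
f_right = [[0.0, 0.1, 0.0], [1.0, 2.0, 0.4], [0.2, 0.0, 1.0], [0.5, 0.5, 0.5]]

scores = [inner_product(f_left, f) for f in f_right]
probs = softmax(scores)
loss = cross_entropy(probs, target=[0.0, 1.0, 0.0, 0.0])
```

Because all candidate disparities are scored in one pass and normalised together, the loss is a multi-class classification loss over disparities instead of a binary decision per pair.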
Smoothing Deep Net Outputs
● Cost aggregation
○ Simply performs average pooling over a window of size 5 x 5
● Semiglobal block matching
○ Energy function
● Slanted plane (not very clear in the paper)
● Sophisticated post-processing
○ In contrast to the “Compute the Disparity Image” part of the previous paper, only interpolation is used here, since the other two steps were not found to improve performance
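The 5 x 5 average pooling used for cost aggregation can be sketched as follows for a single disparity slice of the cost volume. Clipping the window at the image border is an assumption made for this example.

```python
def average_pool(cost_slice, size=5):
    """Average a 2-D slice of the cost volume (one disparity) over a
    size x size window; windows are clipped at the image border."""
    h, w, r = len(cost_slice), len(cost_slice[0]), size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [cost_slice[yy][xx]
                      for yy in range(max(y - r, 0), min(y + r + 1, h))
                      for xx in range(max(x - r, 0), min(x + r + 1, w))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


# A 5 x 5 slice with one noisy cost in the centre.
cost_slice = [[2.0] * 5 for _ in range(5)]
cost_slice[2][2] = 27.0
pooled = average_pool(cost_slice)
```

Unlike cross-based aggregation, the window here is fixed and ignores image content, which is simpler and faster but can blur costs across object boundaries.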
Experiments
● Hyperparameters (skip)
● Datasets: KITTI 2012, KITTI 2015
Experiments
● KITTI 2012
Q: How to explain?
Experiments
● KITTI 2015
Discussion
The two papers were almost concurrent work. How are they related?
What are the strengths and weaknesses of each when compared with the other?
Thank you!