Accuracy in Real-Time Depth Maps
John MORRIS
Centre for Image Technology and Robotics (CITR), Computer Science/Electrical Engineering, University of Auckland, New Zealand
and
School of Electrical and Electronics Engineering, Chung-Ang University, Seoul
[Photo: Iolanthe II drifting off Waiheke Is]
Outline
• Background
  • Motivation
• Problem
  • Collision Avoidance
• Accuracy
  • Parallel Axes case
  • Verging Axes
  • Optimizing
• Active Illumination
• Algorithm Performance
  • Stereo Algorithms
  • Which one is best?
Motivation
• Stereo Vision has many applications
  • Aerial Mapping
  • Forensics
    • Crime Scenes
    • Traffic Accidents
  • Mining
    • Mine face measurement
  • Civil Engineering
    • Structure monitoring
  • General Photogrammetry
    • Non contact measurement
• Most of these are not time critical …
Motivation
• Time-critical Applications
  • Most existing applications are not time critical
  • Several would benefit from real-time feedback as data is collected
    • Traffic accident scene assessment
      • There’s pressure to clear the scene and let traffic continue
      • Investigators have to rely on experience while taking images
    • Mining
      • Real-time feedback could direct machinery to follow a pre-determined plan
    • …
and then there’s …
• Collision avoidance
  • Without real-time performance, it’s useless!
Motivation
• Collision avoidance
  • Why stereo?
    • RADAR keeps airplanes from colliding
    • SONAR
      • Keeps soccer-playing robots from fouling each other
      • Guides your automatic vacuum cleaner
  • Active methods are fine for ‘sparse’ environments
    • Airplane density isn’t too large
    • Only 5 robots / team
    • Only one vacuum cleaner
Motivation
• Collision avoidance
  • What about Seoul (Bangkok, London, New York, …) traffic?
  • How many vehicles can rely upon active methods?
    • Reflected pulse is many dB below probe pulse!
    • What fraction of other vehicles can use the same active method before even the most sophisticated detectors get confused? (and car insurance becomes unaffordable)
    • Sonar, in particular, is also subject to considerable environmental noise
  • Passive methods (sensor only) are the only ‘safe’ solution
    • In fact, with stereo, one technique for resolving problems may be assisted by environmental noise!
Stereo Photogrammetry
Pairs of images giving different views of the scene can be used to compute a depth (disparity) map
Key task – Correspondence
  Locate matching regions in both images
Epipolar constraint
  Align images so that matches must appear in the same scan line in L & R images
Depth Maps
[Figure: three depth maps labelled Ground Truth, Computed: Census and Computed: Pixel-to-Pixel]
Which is the better algorithm?
Vision Research tends to be rather visual!
Tendency to publish images ‘proving’ efficacy, efficiency, etc
Performance and Accuracy
• I will use
  • Performance to describe the quality of matching
    • For how many points was the distance computed correctly?
    • Metrics
      • % of good matches
      • Standard deviation of matching error distribution
    • Function of the matching algorithm, image quality, etc
  • Accuracy for precision of depth measurements
    • Assuming a pixel is matched correctly, how accurate is the computed depth? or
    • What is the resolution of depth measurements?
    • Metric
      • Error in depth - absolute or relative (% of measured depth)
    • Function of stereo configuration and sensor resolution (pixel number and size)
Accuracy
• Traditional (film-based) stereophotogrammetry limited by film grain size
  • Small enough so that mechanical accuracy of the measuring equipment became the limiting factor and
  • Accuracy was determined by your $ budget
    • More $s -> higher resolution equipment
• Mapping
• Digital cameras
  • Discrete (large but shrinking!) pixels -> significant accuracy considerations
Stereo Geometry
• How accurate are these depth maps?
  • In collision avoidance, we need to know the current distance to an object and be able to derive our relative velocity
• Example:
  • An object’s image ‘has a disparity of 20 pixels’
    = Its image in the R image is displaced by 20 pixels relative to the L image
  • Accuracy of its position?
    • First approximation ~ 5% (1 / 20): a one-pixel uncertainty in a 20-pixel disparity
  • How do we obtain better accuracy?
Stereo Camera Configuration
• Standard Case
  Two cameras with parallel optical axes
  b  baseline (camera separation)
  camera angular FoV
  Dsens  sensor width
  n  number of pixels
  p  pixel width
  f  focal length
  a  object extent
  D  distance to object
Stereo Camera Configuration
• Standard Case – Two cameras with parallel optical axes
• Rays are drawn through each pixel in the image
• Ray intersections represent points imaged onto the centre of each pixel
• Points along these lines have the same L-R displacement (disparity)
  but
  • an object must fit into the Common Field of View
• Clearly depth resolution increases as the object gets closer to the camera
• Distance, $z = \frac{b\,f}{p\,d}$
  where d is the disparity (in pixels), f the focal length and p the pixel size
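The relation z = b f / (p d) can be tried numerically. The sketch below is illustrative only: the baseline, focal length and pixel size are assumed values, not taken from any camera in this talk, and `depth_from_disparity` and `depth_step` are hypothetical helper names.

```python
def depth_from_disparity(b, f, p, d):
    """Distance z = b*f / (p*d) for parallel camera axes.

    b: baseline (m), f: focal length (m), p: pixel width (m),
    d: disparity in pixels (must be > 0).
    """
    if d <= 0:
        raise ValueError("disparity must be positive")
    return b * f / (p * d)


def depth_step(b, f, p, d):
    """Change in z when the disparity changes by one pixel (d -> d + 1)."""
    return depth_from_disparity(b, f, p, d) - depth_from_disparity(b, f, p, d + 1)


# Assumed example values (not from the slides):
# 0.5 m baseline, 8 mm focal length, 10 micron pixels.
b, f, p = 0.5, 0.008, 10e-6

for d in (5, 20, 80):
    z = depth_from_disparity(b, f, p, d)
    print(f"d = {d:3d} px  z = {z:7.2f} m  one-pixel step = {depth_step(b, f, p, d):.3f} m")
```

The one-pixel step shrinks as the disparity grows, that is, as the object gets closer. Relative to z the step is 1/(d + 1), which is close to the 1/20 first approximation for a 20-pixel disparity.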
Depth Accuracy – Canonical Configuration
[Figure: depth accuracy (m) plotted against distance (m) from 0 to 10 m; the asymptote and the best distance are marked]
• Given an object of an extent, a, there’s an optimum position for it!
• Assuming baseline, b, can be varied
• Common fallacy – just increase b to increase accuracy
Stereo Camera Configuration
• This result is easily understood if you consider an object of extent, a
• To be completely measured, it must lie in the Common Field of View
  but
• place it as close to the camera as you can so that you can obtain the best accuracy, say at D
• Now increase b to increase the accuracy at D
• But you must increase D so that the object stays within the CFoV!
• Detailed analysis leads to the previous curve and an optimum value of b in terms of a
[Figure: stereo geometry showing baseline b, distance D and object extent a; points along the marked lines have the same L-R displacement (disparity)]
Stereophotogrammetry vs Collision Avoidance
• This result is more relevant for stereo photogrammetry
  • You are trying to accurately determine the geometry of some object
  • It’s fragile, dangerous, … and you must use non-contact measurement
• For collision avoidance, you are more concerned with measuring the closest approach of an object (ie any point on the object!)
  • you can increase the baseline so that the critical point stays within the CFoV
[Figure: camera pair with the critical distance, Dcritical, marked]
Increasing the baseline
[Figure: % good matches against baseline, b. Images: ‘corridor’ set (ray-traced). Matching algorithms: P2P, SAD]
Increasing the baseline decreases performance!!
Increasing the baseline
Examine the distribution of errors
[Figure: standard deviation of the matching error against baseline, b. Images: ‘corridor’ set (ray-traced). Matching algorithms: P2P, SAD]
Increasing the baseline decreases performance!!
Increased Baseline -> Decreased Performance
• Reasons
  • Statistical
    • Higher disparity range -> increased probability of matching incorrectly - you’ve simply got more choices!
  • Perspective
    • Scene objects are not fronto-planar
    • Surfaces angled to the camera axes subtend different numbers of pixels in L and R images
  • Scattering
    • Perfect scattering (Lambertian) surface assumption
    • OK at small angular differences -> increasing failure at higher angles
  • Occlusions
    • Number of hidden regions increases as angular difference increases -> increasing number of ‘monocular’ points for which there is no 3D information!
Accuracy in Collision Avoidance
• Accuracy is important!
  • Your ability to calculate an optimum avoidance strategy depends on an accurate measure of the collision velocity
  • Luckily, accuracy does increase as an object approaches the critical region, but we’d still like to measure the collision velocity accurately at as large a distance as possible!
• For parallel camera axes,
  $D = \frac{f\,b}{d}$
• where
  $d = x_L - x_R = n\,p$
  Nice, simple (if reciprocal) relationship!
  D  distance
  f  focal length
  b  baseline
  d  measured disparity
  xL|R  position in L|R image
  n  number of pixels
  p  pixel size
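To see why depth accuracy matters for the collision velocity, one can estimate a closing speed from two disparity measurements and push a disparity error through D = f b / d. This is only a sketch: the camera parameters, the frame interval and the ±0.5 pixel error are assumptions, and `closing_speed_bounds` is a hypothetical helper name.

```python
def distance(f, b, n, p):
    """D = f*b/d with d = n*p (n pixels of width p), parallel axes."""
    return f * b / (n * p)


def closing_speed_bounds(f, b, p, n1, n2, dt, err=0.5):
    """Closing speed from disparities n1 then n2 (pixels), dt seconds apart.

    Returns (lowest, nominal, highest) speed in m/s, assuming each
    disparity is only known to within +/- err pixels.
    """
    nominal = (distance(f, b, n1, p) - distance(f, b, n2, p)) / dt
    # Lowest speed: first distance underestimated, second overestimated.
    low = (distance(f, b, n1 + err, p) - distance(f, b, n2 - err, p)) / dt
    # Highest speed: first distance overestimated, second underestimated.
    high = (distance(f, b, n1 - err, p) - distance(f, b, n2 + err, p)) / dt
    return low, nominal, high


# Assumed values: 8 mm lens, 0.5 m baseline, 10 micron pixels, 0.5 s apart.
f, b, p, dt = 0.008, 0.5, 10e-6, 0.5
far = closing_speed_bounds(f, b, p, 20, 25, dt)    # object at 20 m, then 16 m
near = closing_speed_bounds(f, b, p, 80, 100, dt)  # object at 5 m, then 4 m

for label, (low, nominal, high) in (("far", far), ("near", near)):
    print(f"{label}: {low:.2f} <= {nominal:.2f} <= {high:.2f} m/s")
```

With these assumed numbers the speed estimate for the distant object has a much wider relative spread than the one for the near object, which is the reason for wanting better depth accuracy at long range.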
Parallel Camera Axis Configuration
• Accuracy depends on d - or the difference in image position in L and R images
  and
  in a digital system, on the number of pixels in d
• Measurable regions also must lie in the CFoV
• This configuration is rather wasteful
  • Observe how much of the image planes of the two cameras is wasted!
[Figure: parallel-axis camera pair with Dcritical marked]
Evolution
• Human eyes ‘verge’ on an object to estimate its distance, ie the eyes fix on the object in the field of view
[Figure: two camera configurations. Parallel axes: configuration commonly used in stereo systems. Verging axes: configuration discovered by evolution millions of years ago]
Note immediately that the CFoV is much larger!
Nothing is free!
• Since the CFoV is much larger, more sensor pixels are being used and depth accuracy should increase
  but
• Geometry is much more complicated!
  • Position on the image planes of a point at (x,z) in the scene:
    $x_L = \frac{f}{p}\tan\left(\arctan\frac{b+2x}{2z} - \phi\right)$
    $x_R = \frac{f}{p}\tan\left(\arctan\frac{b-2x}{2z} - \phi\right)$
    where $\phi$ is the vergence angle
• Does the increased accuracy warrant the additional computational complexity?
Note: In real fixed systems, computational complexity can be reduced, see the notes on real-time stereo!
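The image-position formulas on this slide can be evaluated directly. The sketch below reads the second formula as the right-image position and writes the vergence angle as `phi`; both are assumptions, as are the camera values. Because the two formulas use +2x and -2x, the two positions are measured in mirrored senses, so the disparity here is their sum.

```python
import math


def image_positions(x, z, b, f, p, phi):
    """Image-plane positions (in pixels) of a scene point at (x, z).

    b: baseline, f: focal length, p: pixel size, phi: vergence angle (rad).
    The second value is read as the right-image position (an assumption).
    """
    xl = f / p * math.tan(math.atan((b + 2 * x) / (2 * z)) - phi)
    xr = f / p * math.tan(math.atan((b - 2 * x) / (2 * z)) - phi)
    return xl, xr


def disparity(x, z, b, f, p, phi):
    """Disparity in pixels; a sum because the two positions are mirrored."""
    xl, xr = image_positions(x, z, b, f, p, phi)
    return xl + xr


# Assumed values: 0.5 m baseline, 8 mm lens, 10 micron pixels.
b, f, p = 0.5, 0.008, 10e-6
z_fix = 10.0                        # assumed fixation distance (m)
phi = math.atan(b / (2 * z_fix))    # both cameras turned in to look at (0, z_fix)

for z in (5.0, 10.0, 20.0):
    print(f"z = {z:5.1f} m  disparity = {disparity(0.0, z, b, f, p, phi):8.3f} px")
```

With `phi = 0` the sum reduces to f b / (p z), the parallel-axis relation. With verging axes the disparity is zero at the fixation point, positive for nearer points and negative for farther ones.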
Depth Accuracy
OK - better … but it’s not exactly spectacular!
Is it worth the additional computational load?
A minor improvement?
• What happened?
• As the cameras turn in, Dmin gets smaller!
• If Dmin is the critical distance, D < Dmin isn’t useful!
  This area is now wasted!
Look at the optical configuration!
• If we increase f, then Dmin returns to the critical value!
[Figure: verging configuration with the original f and with increased f]
Depth Accuracy - Verging axes, increased f
Now the depth accuracy has increased dramatically!
Note that at large f, the CFoV does not extend very far!
Increased focal length
• Lenses with large f
  • Thinner
  • Fewer aberrations
    • Better images
  • Cheaper?
• Alternatively, lower pixel resolution can be used to achieve better depth accuracy ...
Zero disparity matching
• With verging axes, at the fixation point, scene points appear with zero disparity (in the same place on both L and R images)
• If the fixation point is set at some sub-critical distance (eg an ‘early warning’ point), then matching algorithms can focus on a small range of disparities about 0
• With verging axes, both +ve and -ve disparities appear
  Potential for fast, high performance matching focussing on this region
  Possible research project!
This is similar to the way our vision system works: we focus on the area around the fixation point and have a higher density of cones in the centre of our retina
[Figure: loci of constant disparity for d = 0, d = +1 and d = -1]
Non-parallel axis geometry
• Points with the same disparity lie on circles now
• For parallel axes, they lie on straight lines
Verging axis geometry
• Points with the same disparity lie on Vieth-Müller circles with the baseline as a chord
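A quick numerical check of the zero-disparity circle: points on the circle through both camera centres and the fixation point should have zero disparity. The sketch reuses the image-position formulas from the ‘Nothing is free!’ slide, reading the second one as the right-image position and dropping the common f/p scale factor; the baseline, fixation distance and function names are assumptions.

```python
import math


def verging_disparity(x, z, b, phi):
    """Disparity of scene point (x, z) in units of f/p, verging axes.

    The two image positions are measured in mirrored senses, so the
    disparity is their sum.
    """
    left = math.tan(math.atan((b + 2 * x) / (2 * z)) - phi)
    right = math.tan(math.atan((b - 2 * x) / (2 * z)) - phi)
    return left + right


def vieth_muller_circle(b, z0):
    """Centre height c and radius r of the circle through the camera
    centres (-b/2, 0), (b/2, 0) and the fixation point (0, z0)."""
    c = (z0 * z0 - b * b / 4) / (2 * z0)
    r = z0 - c
    return c, r


b, z0 = 0.5, 10.0                 # assumed baseline and fixation distance (m)
phi = math.atan(b / (2 * z0))     # vergence angle for that fixation point
c, r = vieth_muller_circle(b, z0)

worst = 0.0
for k in range(-10, 11):
    t = k / 10                    # angle around the circle, radians
    x, z = r * math.sin(t), c + r * math.cos(t)
    worst = max(worst, abs(verging_disparity(x, z, b, phi)))

print(f"largest |disparity| on the circle: {worst:.2e}")
```

The largest value is at rounding-error level. By the inscribed angle theorem every point on the circle sees the baseline under the same angle, 2*phi, so the two image positions cancel.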
Zero disparity matching (ZDM)
• Using a fixation point in some critical region introduces the possibility of faster matching
• It can alleviate the statistical factor reducing matching quality
  • You search over a restricted disparity range
  • Several ‘pyramidal’ matching techniques have been proposed (and success claimed!) for conventional parallel geometries
    • These techniques could be adapted to ZDM
• Care:
  • It has no effect on the other three factors!
Why is stereo such a good candidate for dense collision avoidance applications?
• One serious drawback
  • It doesn’t work with textureless or featureless regions
  • There’s nothing for the matching algorithm to match!
• Active illumination
  • Impressing a textured pattern (basically any one will do!) on the scene
  • Several groups (including ours!) have demonstrated that this is effective - increasing matching performance significantly
• Real benefit
  • Environmental ‘noise’ (ambient light patterns) does not interfere!!
  • In fact, it may provide the texture needed to assist matching
Thus multiple vehicles impressing ‘eye-safe’ (near IR) patterns onto the environment should only help each other
Metrics ( Introduce some science! )
• Compute the distribution of differences between depth maps derived for an algorithm and the ground truth
[Figure: histogram of the difference distribution; error bins from -44 to 41, frequency axis from -0.1 to 0.6]
From this distribution, we can derive several measures:
  % of good matches (error ≤ 0.5)
  Histogram mean (bias)
  Histogram Std Dev (spread)
Generally we have used the “% of good matches” metric
Mean and standard deviation are used as auxiliary metrics
Running time was also measured
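The three measures can be computed in a few lines. This is a minimal sketch, not the evaluation code used for the results here: it assumes the maps are lists of rows of disparity values, treats the 0.5 threshold as applying to the absolute error, and uses the population standard deviation. `matching_metrics` is a hypothetical name.

```python
import statistics


def matching_metrics(computed, truth, tol=0.5):
    """Return (% good matches, mean error, std dev of error).

    computed, truth: equal-sized lists of rows of disparity values.
    A match counts as 'good' when |computed - truth| <= tol.
    """
    errors = [c - t
              for row_c, row_t in zip(computed, truth)
              for c, t in zip(row_c, row_t)]
    good = sum(1 for e in errors if abs(e) <= tol)
    return (100.0 * good / len(errors),
            statistics.mean(errors),      # bias
            statistics.pstdev(errors))    # spread


# Tiny made-up example: 8 pixels, two of them mismatched.
truth = [[5, 5, 5, 5], [3, 3, 3, 3]]
computed = [[5, 5, 6, 5], [3, 1, 3, 3]]

good_pct, bias, spread = matching_metrics(computed, truth)
print(f"good matches {good_pct:.1f}%  mean {bias:+.3f}  std dev {spread:.3f}")
```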
Ray-Traced Images …
[Figure: ‘Corridor’ image at three noise levels, including SNR = +36 dB and SNR = 0 dB]
• The ‘Corridor’ set are synthetic (perfect) images
  • Generated by ray-tracing software
  • Possible to corrupt them with various levels of noise to test robustness
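One way to corrupt a clean image to a target SNR is sketched below. The slides do not say how SNR was defined or what noise was used, so this assumes additive zero-mean Gaussian noise and takes signal power as the mean squared intensity; the function names and the test signal are made up for illustration.

```python
import math
import random


def add_noise(pixels, snr_db, rng):
    """Add zero-mean Gaussian noise so that
    10*log10(signal power / noise power) is about snr_db.

    Signal power is the mean squared intensity (an assumption).
    Values are not clipped back to a valid intensity range.
    """
    signal_power = sum(v * v for v in pixels) / len(pixels)
    sigma = math.sqrt(signal_power / 10 ** (snr_db / 10))
    return [v + rng.gauss(0.0, sigma) for v in pixels]


def measured_snr_db(clean, noisy):
    """SNR actually achieved, from the clean and noisy pixel lists."""
    signal_power = sum(v * v for v in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


rng = random.Random(0)                                   # fixed seed for repeatability
clean = [float((7 * i) % 256) for i in range(20000)]     # stand-in for image pixels

for target in (36.0, 0.0):
    noisy = add_noise(clean, target, rng)
    print(f"target {target:5.1f} dB  measured {measured_snr_db(clean, noisy):5.1f} dB")
```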
Algorithms Taxonomy
• Area-based
  • Match regions in both images, eg 9×9 windows surrounding a pixel
  • Dense depth maps
    • A depth assigned to every pixel
  • Tend to have dataflow computation styles -> most suitable for hardware implementation
• Feature-based
  • Look for features first, then attempt matches
    • eg edge-detect, then match edges
  • Sparse depth maps
  • Less suitable for hardware implementation
    • More branches in the logic
We concentrated on area-based algorithms
Our original goal was hardware (FPGA) implementation
Some trials on simple feature-based matching showed no improvement over area-based algorithms
Algorithms
• Area-based
  • Correlation
    • A window is moved along a scanline in the R image until the best match with a similar-sized window in the L image is found
    • ‘Best match’ defined by various cost functions
      • Multiplicative correlation
      • Normalized squared differences
      • … (many other variations!)
      • Sum of absolute differences (SAD)
    • Ignore occlusions (pixels visible in one image only)
  • Dynamic
    • Attempt to find the best matching path through a region defined by the corresponding pixel in the R image and the maximum disparity
    • Can recognize occlusions
  • … and many more (graph cut, pyramidal, optical flow, … )
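The correlation search can be sketched on a single scanline with the SAD cost. This is an illustration under assumptions, not the implementation evaluated in this talk: the scanlines are plain lists, the window is one-dimensional, and the function names are made up.

```python
def sad(left, right, x, d, w):
    """Sum of absolute differences between the window of radius w centred
    at x in the left scanline and at x - d in the right scanline."""
    return sum(abs(left[x + k] - right[x - d + k]) for k in range(-w, w + 1))


def best_disparity(left, right, x, w, d_max):
    """Disparity in 0..d_max with the lowest SAD cost at left pixel x.

    Candidates whose right-hand window would fall off the start of the
    scanline are skipped; ties go to the smallest disparity.
    """
    candidates = [d for d in range(d_max + 1) if x - d - w >= 0]
    return min(candidates, key=lambda d: sad(left, right, x, d, w))


# Synthetic scanline pair with a known disparity of 3 pixels:
# the right scanline is the left one shifted by true_d.
n, true_d = 40, 3
left = [i * i for i in range(n)]
right = [left[i + true_d] if i + true_d < n else 0 for i in range(n)]

w, d_max = 2, 8
found = [best_disparity(left, right, x, w, d_max)
         for x in range(w + true_d, n - w)]
print(sorted(set(found)))
```

Each candidate disparity is costed independently of the others, which is what makes this search easy to evaluate in parallel.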
Algorithms Evaluated
• Area-based
  • Correlation
    • 3 different cost functions
      • Multiplicative correlation
      • Normalized squared differences
      • Sum of absolute differences (SAD)
  • Census
    • Reduces pixel intensity differences to a single bit
    • Counts bit differences
    • Claimed suitable for hardware implementation
  • Dynamic
    • Birchfield and Tomasi’s Pixel-to-Pixel chosen because it takes occlusions into account
• Most others are too computationally expensive for real-time implementation
  • Even taking potential parallelism in hardware into account!
  • eg graph-cut (best results, but slow: >100s per image!)
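Algorithm Details
• Correlation Algorithm Cost Functions
  • Corr1 – Normalized intensity difference
    $C(x,y,\delta) = \frac{\sum (I_L(x,y) - I_R(x,y-\delta))^2}{\sqrt{\sum I_L(x,y)^2 \, \sum I_R(x,y-\delta)^2}}$
  • Corr2 – Normalized multiplicative correlation
    $C(x,y,\delta) = \frac{\sum I_L(x,y)\, I_R(x,y-\delta)}{\sqrt{\sum I_L(x,y)^2 \, \sum I_R(x,y-\delta)^2}}$
  • SAD
    $C(x,y,\delta) = \sum |I_L(x,y) - I_R(x,y-\delta)|$
  where $\delta$ is the candidate disparity and each sum runs over the matching window
• Census
  • Rank ordering of pixel intensities over an inner window forms a ‘census vector’ (one bit / pixel in window)
  • Cost function is Hamming distance of these vectors
  • Summed over outer window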
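The Census cost can be sketched as follows. The sketch assumes the common form of the census transform, where each bit records whether a neighbour is darker than the centre pixel; the slides only say ‘rank ordering’, so the exact bit rule used in the experiments may differ. Images are lists of rows and the function names are made up.

```python
def census(img, x, y, radius):
    """Census vector at (x, y): one bit per neighbour in the inner window,
    set when the neighbour is darker than the centre pixel."""
    centre = img[y][x]
    bits = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx == 0 and dy == 0:
                continue
            bits = (bits << 1) | int(img[y + dy][x + dx] < centre)
    return bits


def hamming(a, b):
    """Number of bit positions in which two census vectors differ."""
    return bin(a ^ b).count("1")


def census_cost(left, right, x, y, d, inner, outer):
    """Hamming distance between census vectors, summed over the outer
    window, for left pixel (x, y) against right pixel (x - d, y)."""
    return sum(
        hamming(census(left, x + i, y + j, inner),
                census(right, x + i - d, y + j, inner))
        for j in range(-outer, outer + 1)
        for i in range(-outer, outer + 1)
    )


patch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
brighter = [[2 * v + 10 for v in row] for row in patch]

# The census vector depends only on intensity ordering, so a gain and
# offset change between the two cameras leaves it unchanged.
print(census(patch, 1, 1, 1) == census(brighter, 1, 1, 1))
```

Because only one comparison bit per pixel is kept, the matching cost reduces to XOR and bit counting.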
Typical set of experiments
• Census algorithm
  • Two operational parameters
    – length of the census vector, ie size of the window over which a rank (ordering) transform is performed
    – size of correlation window
  • Trials were run for all reasonable combinations of the two parameters on all 6 test images
    • + one additional aerial photograph pair from IGN, Paris
• These trials locate optimal values of the algorithm parameters
  • w, window ‘radius’ for simple correlation algorithms
  • (census window, correlation window) for Census
  • (match reward, occlusion penalty) for Pixel-to-pixel
Census Good Matches – Corridor
[Figure: surface plot of good match % (0% to 70%) for ‘Corridor’ against the two Census window parameters, one of which is labelled β; the marked point has the first parameter = 4 and β = 3]
Census Good Matches – All images
[Figure: six surface plots of good match % against the two Census window parameters, one panel each for Corridor, Madroom, Map, Sawtooth, Tsukuba and Venus]
First parameter = 4, β = 3 is close to best for all images
Census – Corridor - Metrics
[Figure: two surface plots for Census on ‘Corridor’ against the two window parameters: Std. Dev. (0 to 6) and Mean (-8 to 0)]
• Mean
  • Approaches 0 as expected for larger windows
• Std. Dev.
  • Becomes smaller for larger windows
  • Narrower error peak centred on zero
  • Matching really is improving!
Pixel-to-Pixel
• Birchfield and Tomasi
• ‘Dynamic’ algorithm
• Attempts to find the best matching ‘path’
• Cost function
  $\gamma(M) = N_{occ}\,\kappa_{occ} - N_m\,\kappa_r + \sum \text{dissimilarity}$
  where $N_m$ is the number of matches and $N_{occ}$ the number of occlusions
• Variable parameters
  • $\kappa_{occ}$ – Occlusion penalty
  • $\kappa_r$ – Matching reward
• Dissimilarity
  • Usually | IL – IR |
  • Other variations possible: sub-pixel matching, etc
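The cost of one candidate match sequence can be evaluated as below. This only scores a given sequence; it is not the dynamic programming search itself. The number of occlusions is passed in by the caller, the dissimilarity is the plain | IL – IR | mentioned above, and `p2p_cost` and the sample scanlines are made up for illustration.

```python
def p2p_cost(left, right, matches, n_occ, k_occ, k_r):
    """Cost of a match sequence on one scanline pair.

    matches: list of (xl, xr) index pairs that are matched.
    n_occ: number of occlusions in the sequence (supplied by the caller).
    k_occ, k_r: occlusion penalty and matching reward.
    Dissimilarity is |IL - IR| for each matched pair.
    """
    dissimilarity = sum(abs(left[xl] - right[xr]) for xl, xr in matches)
    return n_occ * k_occ - len(matches) * k_r + dissimilarity


# Made-up scanlines: the right one is the left one shifted by one pixel,
# with one intensity off by 1.
left = [10, 20, 30, 40, 50]
right = [20, 30, 41, 50, 60]
matches = [(1, 0), (2, 1), (3, 2), (4, 3)]

# One occlusion, four matches, total dissimilarity 1.
print(p2p_cost(left, right, matches, n_occ=1, k_occ=30, k_r=8))
```

A larger occlusion penalty makes the search reluctant to change disparity, while a larger matching reward favours sequences that match more pixels.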
Pixel-to-Pixel Results
[Figure: six surface plots of good match % (zero error) against matching reward (r) and occlusion penalty (occ), one panel each: P2P - Corridor, P2P - Madroom, P2P - Map, P2P - Sawtooth, P2P - Tsukuba, P2P - Venus]
occ = 30, r = 8 produces good results for all images
Correlation Results
[Figure: good match % (0% to 90%) on ‘Corridor’ against window radius, r, from 1 to 10, for Corr1, Corr2 and SAD; a radius r gives a (2r+1)×(2r+1) window]
Optimum, r ~ 4 (9×9 window)
Compare algorithms
• Measure the performance!
[Figure: bar chart of % correct matches (0% to 100%) for Corridor, Madroom, Map, Sawtooth, Tsukuba and Venus; series: Census - Zero Error, Census 2 - Zero Error, SAD, Correlation]
SAD performs as well as the others over a range of images!
Large matching windows are better ...
• 6 sets of images
[Figure: six plots of % correct match against window ‘radius’ (1 to 10), one each for Corridor, Madroom, Map, Sawtooth, Tsukuba and Venus; series: Correlation - Corr1, Correlation - SAD]
Comparisons – Good Matches
[Figure: bar chart of good match % (0% to 100%) for Corridor, Madroom, Map, Sawtooth, Tsukuba and Venus; series: Census (4, 3), Census (7, 5), Corr1 (4), Corr1 (10), Corr2 (4), Corr2 (10), SAD (4), SAD (10), P2P (5, 6); annotations mark the series with most good matches and with lowest Std. Dev.]
Using best parameters for ‘Corridor’
Running Time
[Figure: bar chart of running time on ‘Corridor’ (axis 0 to 400) for Census (4, 3), Census (7, 5), Corr1 (4), Corr1 (10), Corr2 (4), Corr2 (10), SAD (4), SAD (10) and P2P (5, 6); bar labels read 58.4, 360.7, 3.8, 19.4, 3.5, 17.2, 4.1, 21.0, 2.2, 2.4, 2.2]
Effect of Noise on Matching Quality
[Figure: good match % (0% to 70%) against SNR, from Inf dB and 60 dB down to -15 dB in 3 dB steps; series: Census (4, 3), Census (7, 5), Corr1 (4), Corr1 (10), Corr2 (4), Corr2 (10), SAD (4), SAD (10), P2P (5, 6); the P2P(5,5) and SAD(4) curves are labelled]
Which algorithm?
• Dynamic algorithms (Pixel-to-Pixel) perform best
  • Better matching in most tests
  • Detect occlusions
  • Run faster
• Sum of absolute differences (SAD) is almost as good
  • For hardware implementation, it’s
    • Simple
      • Mainly adders or subtractors
    • Regular
      • Space efficient
    • Parallel
      • Each possible disparity can be evaluated at the same time
• We have built VHDL models and demonstrated that practical systems will fit onto modern FPGAs and run at 30fps using the SAD algorithm
Conclusions
• Verging camera configurations provide
  • better accuracy
    • but change f also to get the best results!
  • potential for faster / better matching
• Active illumination solves a key matching problem and is not sensitive to environmental noise
• For hardware implementation, dynamic programming works well
[Photo: Iolanthe II waiting in Whangarei for the Whangarei-Vanuatu race start]
June, 2007