
MSc Project Report HSM 1805

Hantao LIU

Distance Determination from Pairs of Images

from Low Cost Cameras

August 2005

The University of Edinburgh

School of Engineering and Electronics

MSc in Signal Processing and Communication

The King’s Buildings

Edinburgh EH9 3JL


MSc Project Mission Statement

Student: Liu Hantao

Supervisor: Dr John Hannah

Project Title: “Distance Determination from pairs of images from low cost cameras”

Project Definition:

This project will involve working with pairs of images from stereo cameras and using

these to determine depth information. Initially the work would be based on using

existing image pairs. The project may extend to using and developing software for

stereo camera hardware we have recently acquired.

Preparatory Tasks:

Read BEng project report

Revision/learning of C or C++ syntax

Look at IceCam stereo vision system at

http://www.icerobotics.co.uk/technology.html

Read literature on depth estimation from stereo

Main Tasks:

Develop a suitable algorithm for depth estimation

Produce C or C++ code implementation

Test software with example images

Acquire new images using IceCam stereo camera hardware

Write project report

Scope for Extension:

Develop ‘real-time’ operation with IceCam system

http://www.icerobotics.co.uk/technology.html


Background Knowledge:

C(C++) Programming

Signal Processing

Image Processing

Resources:

Sample image pairs

‘Vision’ image processing library

Unix computer access (TLC)

Linux PC for hardware experiments

IceCam stereo vision hardware & software

Location:

TLC

Vision Lab for experimental work

Reference:

Umesh R. Dhond and J. K. Aggarwal. “Structure from stereo - a review.” IEEE Transactions on Systems, Man, and Cybernetics, 19(6): 1489-1510, 1989.

The supervisor and student are satisfied that this project is suitable for performance

and assessment in accordance with the guidelines of the course documentation.

Signed

Liu Hantao ____________________

Dr J M Hannah ____________________

Date: ____________________


Abstract

This thesis presents an implementation of a stereo vision system that uses image processing techniques to determine depth information from pairs of stereo images. The system, which is intended to aid vehicle reversal, works with stereo images of different vehicles and determines the distance between the stereo cameras and the target vehicle. For each pair of stereo images, the algorithm extracts feature points from each image, and a block matching technique is then used to find the corresponding points between the two images and calculate their displacement. Experimental results are presented, together with an introduction to stereo vision and to some widely used image processing techniques. Possible improvements and modifications to the system are also discussed. Conclusions are drawn regarding the success of the proposed algorithms, and possible future research is outlined.


Declaration of Originality

I declare that this thesis is my original work except where stated.

Hantao Liu

August 2005


Table of Contents

MSC PROJECT MISSION STATEMENT

ABSTRACT

DECLARATION OF ORIGINALITY

TABLE OF CONTENTS

ABBREVIATIONS

CHAPTER 1. INTRODUCTION

1.1. AIMS AND OBJECTIVES

1.2. IMPLEMENTATION

1.2.1. Preprocessing

1.2.2. Establishment of Correspondence

1.2.3. Depth Estimation

1.3. THESIS PLAN

CHAPTER 2. BACKGROUND AND THEORY

2.1. STEREO VISION

2.1.1. Extraction of Feature Characteristics

2.1.2. Stereo Correspondence Problem

2.1.3. Depth Information

2.2. IMAGE PROCESSING TECHNIQUES

2.2.1. Smoothing Filters

2.2.2. Thresholding

2.2.3. Edge Detection

2.2.4. Corner Detection

2.3. SOFTWARE DEVELOPMENT

CHAPTER 3. DESIGN AND EXPERIMENTS

3.1. EDGE DETECTION BASED ALGORITHM

3.1.1. Gaussian Smoothing

3.1.2. Vertical Edge Detection

3.1.3. Otsu’s Thresholding

3.1.4. Morphological Operation

3.1.5. Hough Transform

3.1.6. Feature Points Extraction

3.1.7. Summary

3.2. CORNER DETECTION BASED ALGORITHM

3.2.1. System Overview

3.2.2. Corner Detection

3.2.3. Scale Operation

3.2.4. Thresholding

3.2.5. Non-maximal Suppression

3.2.6. Feature Points Extraction

3.2.7. Summary

3.3. STEREO MATCHING

3.3.1. Block Matching Algorithm

3.3.2. Experiments

3.3.3. Summary

3.4. DISTANCE DETERMINATION

CHAPTER 4. RESULTS

4.1. EDGE DETECTION BASED ALGORITHM

4.1.1. Extraction of Feature Points

4.1.2. Block Matching

4.1.3. Matching Results of Initial Database

4.1.4. Matching Results of Additional Database

4.2. CORNER DETECTION BASED ALGORITHM

4.2.1. Extraction of Feature Points

4.2.2. Block Matching

4.2.3. Matching Results of Initial Database

4.2.4. Matching Results of Additional Database

CHAPTER 5. DISCUSSION

5.1. EXTRACTION OF EDGE-BASED FEATURE POINTS

5.2. EXTRACTION OF CORNER-BASED FEATURE POINTS

5.3. MATCHING ALGORITHM

5.4. ADDITIONAL PROBLEMS

CHAPTER 6. CONCLUSIONS

ACKNOWLEDGEMENTS

REFERENCES

APPENDIX 1. INITIAL PROJECT IMAGES

A.1.1. FRONT

A.1.2. REAR

APPENDIX 2. ADDITIONAL PROJECT IMAGES

A.2.1. STRAIGHT CAR

A.2.2. ANGLED CAR

A.2.3. TAXI

A.2.4. FOREIGN VAN

A.2.5. LANDROVER

A.2.6. WHITE CAR


Abbreviations

1-D One-Dimensional

2-D Two-Dimensional

3-D Three-Dimensional

LOG Laplacian of Gaussian

HT Hough Transform

MAD Mean Absolute Difference

MSD Mean Squared Distance

NCC Normalized Cross-Correlation

PGM Portable Grey Map

IEEE Institute of Electrical and Electronics Engineers


Chapter 1. Introduction

1.1. Aims and Objectives

Many image processing applications involve detecting a target object and estimating meaningful parameters of that object, such as its velocity or distance. The aim of this project is to work with pairs of images from stereo cameras and use them to determine depth information. In a stereo vision system, two cameras are separated by a fixed horizontal distance. Once correspondence has been established between the two stereo images, the depth information can be determined from the camera focal length and the imaging geometry. The main work of this project is the development of a suitable algorithm to estimate stereo depth.

In this project, the stereo image pairs are captured by fixed cameras located a known distance apart, and this stereo rig is mounted on a vehicle to aid reversal. Corresponding points are found in the two images, and from them the distance between the reversing vehicle and a stationary vehicle behind it is determined.

1.2. Implementation

To implement the system, the stereo image pairs in Appendix 1 and Appendix 2 are used. Appendix 1 contains front and rear images of a car, and Appendix 2 contains images of different vehicles, and of a vehicle in different situations such as straight or angled. These images are used to test the robustness of the system.

The implementation of this system can be divided into three major steps:

preprocessing, establishing correspondence, and estimating depth. This is shown in

Figure 1.1. In this section we briefly describe each of them.


Figure 1.1 Flow Diagram of System Implementation

1.2.1. Preprocessing

Preprocessing of images is an important step in stereo computation. In this stage, feature characteristics are extracted from each image; because they are used extensively in the subsequent matching process, they have to be chosen carefully. In this project, a set of feature points is obtained in each image to serve as the matching primitives. Two algorithms are proposed for extracting these feature points: one is based on edge detection combined with the Hough transform (HT), and the other is based on corner detection.

1.2.2. Establishment of Correspondence

Matching is perhaps the most important stage in stereo image processing. Once feature points have been extracted from the two stereo images, correspondence must be established between these homologous features; that is, we have to find the feature points in each image that are projections of the same physical point. In this project, block matching is applied to find the corresponding points in the stereo pair.


1.2.3. Depth Estimation

In this stage, corresponding points in the two stereo images are used to calculate the disparity, that is, the separation between matched pixels. The depth information is then determined from the imaging geometry and the camera focal length.

1.3. Thesis Plan

The goal of this thesis is to design a suitable algorithm for depth estimation and to produce software that implements it.

This thesis describes the techniques used and the algorithms proposed in the project, and analyses and compares the results achieved by the system. Finally, some ideas for future research are put forward.

Chapter two briefly describes the background of this project and some of the basic theory used in stereo image processing.

Chapter three contains the system design and experiments. Because two algorithms are proposed in this project for feature extraction, the chapter presents the edge detection based approach and the corner detection based approach in separate sections, followed by stereo matching and distance determination.

Chapter four presents experimental results, analysis and comparisons.

Chapter five discusses the problems encountered in this project, and Chapter six gives conclusions and proposes possible future work.


Chapter 2. Background and Theory

2.1. Stereo Vision

Analysis of video images in stereo has emerged as an important passive method for

extracting the three-dimensional (3-D) structure of a scene [4]. A simplified stereo

imaging system is shown in Figure 2.1.

Figure 2.1 A Simplified Stereo Imaging System [17] (world point $(x, y, z)$; left and right image points $(x'_l, y'_l)$ and $(x'_r, y'_r)$)

The two cameras have their optical axes parallel and are separated by a distance $d$ [17].

The line connecting the camera lens centers is called the baseline [17].

The focal length of both cameras is $f$ [17].

Let the origin $O$ of this system be midway between the lens centers [17].

Let the $x$ axis of the 3-D world coordinate system be parallel to the baseline [17].

Consider a point $(x, y, z)$ in 3-D world coordinates on an object [17].

Let the point $(x, y, z)$ have image coordinates $(x'_l, y'_l)$ and $(x'_r, y'_r)$ in the left and right image planes of the respective cameras [17].

The goal of stereo vision research is to estimate depth information from a pair of stereo images. Because the two cameras are separated by a fixed distance, each camera receives a slightly different image of the same real-world scene. If we can determine which features in the image from the left camera correspond to which features in the image from the right camera, and if the stereo imaging geometry and camera focal length are known, the depth information can be reconstructed.

Generally, the major stages of stereo vision are preprocessing the images to obtain matching features, recovering the disparity between the images with a suitable stereo algorithm, and using the geometry to recover the stereo depth.

2.1.1. Extraction of Feature Characteristics

Extracting feature characteristics from an image for the subsequent matching process is an important step in stereo vision. In this stage, we have to decide carefully which kind of features to use as the matching primitives, because this choice strongly influences the results of stereo matching. Matching primitives can be classified into two categories: area-based and feature-based.

Area-based matching primitives were used in some of the early stereo algorithms: area patches from the two images are matched to establish correspondence.

Feature-based matching schemes, which match features directly, have been increasingly used in practice. Because physical discontinuities in a scene are mapped to intensity changes in an image, edges are widely used as matching primitives.

2.1.2. Stereo Correspondence Problem

The stereo correspondence problem, that is, finding corresponding points between two images, is the crucial and most difficult stage in stereo vision. The main task is to compute an accurate disparity between the left and right images. In past decades a large number of stereo matching algorithms have been proposed, and these strategies differ according to the matching primitives and the stereo imaging geometry. In terms of matching primitives, area-based matching and feature-based matching are commonly used. There are also different imaging geometries, such as parallel-axis and nonparallel-axis, binocular and multi-ocular [4].

Area-based stereo techniques correlate brightness (intensity) patterns in the local neighborhood of a pixel in one image with brightness patterns in a corresponding neighborhood of a pixel in the other image [4]. This matching method is simple, but it is sensitive to changes in overall illumination and in perspective. In addition, the choice of interest points and of the similarity measure strongly affects the accuracy of the resulting depth information.
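As an illustration of the area-based approach, the sketch below compares a square window around a left-image pixel with windows along the same row of the right image, using the mean absolute difference (MAD) as the similarity measure. It assumes rectified images with parallel optical axes, so that a match lies on the same row at column $x'_l - \text{disparity}$. The `Image` alias, the window radius `r` and the search range `maxDisp` are hypothetical choices for illustration, not the project's code or the Vision Systems library's `VS_frame` interface.

```cpp
#include <cassert>
#include <cstdlib>
#include <limits>
#include <vector>

// Hypothetical grey-level image type: img[row][col].
using Image = std::vector<std::vector<int>>;

// Mean absolute difference between the (2r+1)x(2r+1) window centred at
// (row, colL) in the left image and the window centred at (row, colR) in the right image.
double windowMAD(const Image& left, const Image& right,
                 int row, int colL, int colR, int r) {
    long sum = 0;
    for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx)
            sum += std::abs(left[row + dy][colL + dx] - right[row + dy][colR + dx]);
    const int n = (2 * r + 1) * (2 * r + 1);
    return static_cast<double>(sum) / n;
}

// For the left-image pixel (row, col), search the same row of the right image
// over disparities 0..maxDisp and return the disparity with the lowest MAD.
int bestDisparity(const Image& left, const Image& right,
                  int row, int col, int r, int maxDisp) {
    const int rows = static_cast<int>(left.size());
    const int cols = static_cast<int>(left[0].size());
    assert(row - r >= 0 && row + r < rows && col - r >= 0 && col + r < cols);
    int best = -1;
    double bestCost = std::numeric_limits<double>::max();
    for (int d = 0; d <= maxDisp; ++d) {
        const int colR = col - d;
        if (colR - r < 0) break;  // the right-image window would leave the image
        const double cost = windowMAD(left, right, row, col, colR, r);
        if (cost < bestCost) {
            bestCost = cost;
            best = d;
        }
    }
    return best;
}
```

If the right image is the left image shifted three pixels to the left, `bestDisparity` returns 3 for an interior pixel, because only that shift gives a MAD of zero.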

Feature-based stereo techniques use symbolic features derived from intensity images rather than the image intensities themselves [4]. The advantage of this approach is that it is more robust to changes in contrast and illumination, because it does not use intensity values directly. In practice, edge points or edge segments are commonly used as the features.

Stereo matching paradigms are also characterized by the particular imaging geometry being used [4]. The conventional stereo imaging geometry uses two cameras with mutually parallel optical axes; factors that can be varied include, but are not limited to, the mutual orientation of the cameras' optical axes and the number of cameras used [4].

2.1.3. Depth Information

The terms depth and disparity are frequently used in the stereo vision literature. Although they are interchangeable in many cases, there is a subtle difference in meaning [6]. When a point in the left image and a point in the right image are matched, that is, they are considered to be projections of the same physical point in the 3-D world, the difference in their relative positions is recorded as the disparity. Equation 2.1 shows the relationship between disparity (in pixels) and depth.


$$\text{Depth} = \frac{\text{Baseline} \times \text{FocalLength}}{\text{Disparity}} \qquad [6] \quad (2.1)$$

Equation 2.1 can be obtained from the imaging geometry shown in Figure 2.1. By considering similar triangles:

$$\frac{x'_l}{f} = \frac{x + d/2}{z}, \qquad \frac{x'_r}{f} = \frac{x - d/2}{z}, \qquad \frac{y'_l}{f} = \frac{y'_r}{f} = \frac{y}{z} \qquad [17] \quad (2.2)$$

Solving for (x, y, z) gives:

$$x = \frac{d\,(x'_l + x'_r)}{2\,(x'_l - x'_r)}, \qquad y = \frac{d\,(y'_l + y'_r)}{2\,(x'_l - x'_r)}, \qquad z = \frac{d\,f}{x'_l - x'_r} \qquad [17] \quad (2.3)$$

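The quantity $x'_l - x'_r$, which appears in Equation 2.3, is called the disparity, and the quantity $z$ is called the depth. Since $d$ is the baseline and $f$ the focal length, the expression for $z$ is exactly Equation 2.1.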
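The triangulation step can be written directly from Equation 2.3. The sketch below is a minimal illustration, not the project's code; the function and struct names are hypothetical. It assumes the image coordinates are measured in the same units as $f$ (if they are in pixels, $f$ must be expressed in pixels too, as in Equation 2.1) and that the disparity is positive, which holds for any point in front of the cameras.

```cpp
#include <cassert>

struct Point3D {
    double x, y, z;
};

// Recover world coordinates (x, y, z) from matched image points using Equation 2.3.
// d: baseline, f: focal length; (xl, yl) and (xr, yr) are the left and right image coordinates.
Point3D triangulate(double xl, double yl, double xr, double yr, double d, double f) {
    const double disparity = xl - xr;  // x'_l - x'_r
    assert(disparity > 0.0);           // zero disparity would mean a point at infinity
    Point3D p;
    p.x = d * (xl + xr) / (2.0 * disparity);
    p.y = d * (yl + yr) / (2.0 * disparity);
    p.z = d * f / disparity;           // Equation 2.1: baseline x focal length / disparity
    return p;
}
```

For example, with $d = 0.4$ and $f = 0.01$, a point at $(0.5, 0.2, 10)$ projects to $x'_l = 0.0007$, $x'_r = 0.0003$ and $y'_l = y'_r = 0.0002$ by Equation 2.2, and `triangulate` recovers the original point from these values.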

2.2. Image Processing Techniques

There are many image processing techniques which have been widely used in stereo

vision to enhance and manipulate images. Some of the techniques used in this project

are described in this section.

2.2.1. Smoothing Filters

Smoothing filters aim to reduce noise in an image, and they are usually used in a preprocessing step to remove small details prior to object extraction. These filters are also called averaging filters, because the output of a smoothing filter is the average of the pixels contained in the filter mask.

The basic idea of smoothing is to replace the intensity value of every pixel in an image by the average intensity of all pixels within the filter mask, which reduces sharp changes in gray level. Because noise can be considered high-frequency information in an image, smoothing is essentially a low-pass filter. However, edges, which are desirable high-frequency elements, can also be removed or blurred by a smoothing filter. The size and properties of the filter therefore have to be chosen carefully, trading off the removal of unwanted information against the retention of enough of the desired image features.

Figure 2.2 shows two 3×3 smoothing filters.

Figure 2.2 3×3 Smoothing Filters (normalization factors 1/9 and 1/16)
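A minimal sketch of the equal-weight 3×3 averaging filter (the mask normalized by 1/9) is given below. The `Image` alias is a hypothetical stand-in for an image class, and copying the border pixels unchanged is an assumed policy; zero-padding or mirroring would be equally valid choices.

```cpp
#include <cassert>
#include <vector>

using Image = std::vector<std::vector<int>>;  // hypothetical grey-level image, img[row][col]

// 3x3 averaging (box) filter with equal weights of 1/9: each interior pixel is
// replaced by the rounded mean of its 3x3 neighbourhood. Border pixels are copied unchanged.
Image boxSmooth3x3(const Image& in) {
    assert(!in.empty());
    const int rows = static_cast<int>(in.size());
    const int cols = static_cast<int>(in[0].size());
    Image out = in;
    for (int r = 1; r + 1 < rows; ++r)
        for (int c = 1; c + 1 < cols; ++c) {
            int sum = 0;
            for (int dr = -1; dr <= 1; ++dr)
                for (int dc = -1; dc <= 1; ++dc)
                    sum += in[r + dr][c + dc];
            out[r][c] = (sum + 4) / 9;  // round to nearest for non-negative grey levels
        }
    return out;
}
```

A single bright pixel of value 90 on a black background is spread to a value of 10, which shows both the noise-reducing and the blurring effect described above.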

2.2.2. Thresholding

Image thresholding is one of the most commonly used techniques in image processing applications because it is simple to implement. This operation highlights pixels that have particular intensity values, or intensity values within a specified range [3].

Choosing a suitable threshold level is the most difficult part of thresholding, and the choice depends on the application requirements. In uniform thresholding, pixels above the chosen brightness level are set to white, and those below it are set to black. Adaptive thresholding, by contrast, divides the original image into subimages and applies a different threshold to each subimage [1]. More advanced techniques, such as Otsu’s method, select an optimal threshold level automatically based on the image histogram [3].
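The two simpler schemes described above can be sketched as follows. The `Image` alias is hypothetical, and using each subimage's mean grey level as its threshold is an assumed rule chosen for illustration; the text only states that a different threshold is applied to each subimage.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Image = std::vector<std::vector<int>>;  // hypothetical grey-level image, img[row][col]

// Uniform thresholding: pixels above level t become white (255), the rest black (0).
Image thresholdUniform(const Image& in, int t) {
    Image out = in;
    for (auto& row : out)
        for (int& v : row) v = (v > t) ? 255 : 0;
    return out;
}

// Adaptive thresholding: split the image into blockSize x blockSize subimages and
// threshold each at its own mean grey level (an assumed per-block rule).
Image thresholdAdaptive(const Image& in, int blockSize) {
    assert(blockSize > 0 && !in.empty());
    const int rows = static_cast<int>(in.size());
    const int cols = static_cast<int>(in[0].size());
    Image out = in;
    for (int r0 = 0; r0 < rows; r0 += blockSize)
        for (int c0 = 0; c0 < cols; c0 += blockSize) {
            const int r1 = std::min(r0 + blockSize, rows);
            const int c1 = std::min(c0 + blockSize, cols);
            long sum = 0;
            for (int r = r0; r < r1; ++r)
                for (int c = c0; c < c1; ++c) sum += in[r][c];
            const double mean = static_cast<double>(sum) / ((r1 - r0) * (c1 - c0));
            for (int r = r0; r < r1; ++r)
                for (int c = c0; c < c1; ++c) out[r][c] = (in[r][c] > mean) ? 255 : 0;
        }
    return out;
}
```

On an image whose left half is dark and right half bright, a single uniform threshold simply splits the two halves, while the adaptive version still separates the detail within each half.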

2.2.3. Edge Detection

Since feature-based stereo algorithms have been increasingly applied in many systems, edge detection has become one of the most commonly used stereo vision techniques. An edge is a set of connected pixels that lie on the boundary between two regions [1]. Because an edge point lies at a step change in grey level, that is, it is a high-frequency element in the frequency domain, edge detection highlights contrast and is robust against brightness changes in an image. Many edge detection operators are available, such as Prewitt, Sobel, Canny and Marr-Hildreth. These operators can be classified into two categories: first-order and second-order edge detection.

Sobel edge detection is a first-order method and is among the most widely used in practice. The Sobel operator performs better than other contemporaneous edge detection operators, and it has superior noise-suppression characteristics [1]. The Sobel operator consists of a vertical template, Mx, and a horizontal template, My, which are given in Figure 2.3 (a) and (b) respectively.

Figure 2.3 Sobel Templates
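Since the template values of Figure 2.3 are not reproduced in this text, the sketch below assumes the standard Sobel kernels with one common sign convention and combines the two responses into the gradient magnitude $\sqrt{G_x^2 + G_y^2}$. The `Image` alias and the zero border are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Image = std::vector<std::vector<int>>;  // hypothetical grey-level image, img[row][col]

// Standard Sobel kernels (sign convention assumed): MX responds to vertical edges
// (horizontal intensity change) and MY to horizontal edges.
const int MX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
const int MY[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

// Gradient magnitude sqrt(Gx^2 + Gy^2) at every interior pixel; border values are left at 0.
std::vector<std::vector<double>> sobelMagnitude(const Image& in) {
    assert(!in.empty());
    const int rows = static_cast<int>(in.size());
    const int cols = static_cast<int>(in[0].size());
    std::vector<std::vector<double>> mag(rows, std::vector<double>(cols, 0.0));
    for (int r = 1; r + 1 < rows; ++r)
        for (int c = 1; c + 1 < cols; ++c) {
            int gx = 0, gy = 0;
            for (int i = -1; i <= 1; ++i)
                for (int j = -1; j <= 1; ++j) {
                    gx += MX[i + 1][j + 1] * in[r + i][c + j];
                    gy += MY[i + 1][j + 1] * in[r + i][c + j];
                }
            mag[r][c] = std::sqrt(static_cast<double>(gx * gx + gy * gy));
        }
    return mag;
}
```

For a 3×3 patch with a vertical step from 0 to 100, $G_x = 400$ and $G_y = 0$, so the magnitude at the centre is 400; a uniform patch gives 0.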

Marr-Hildreth edge detection is one of the best-known second-order methods. Marr and Hildreth proposed that a Laplacian of Gaussian (LOG) operator gives a near-optimal edge detection operator [6]. The basic idea of the Marr-Hildreth operator is to combine Gaussian smoothing with second-order differentiation, and then to detect edges via zero-crossings.
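The zero-crossing idea can be sketched in two steps: apply a discrete Laplacian to an image that has already been Gaussian-smoothed (together approximating the LOG), then mark pixels where the Laplacian changes sign. This is a simplified illustration under stated assumptions, not Marr and Hildreth's exact formulation: the 4-neighbour Laplacian kernel, the right/lower-neighbour sign test and the `Image` alias are all choices made here for brevity.

```cpp
#include <cassert>
#include <vector>

using Image = std::vector<std::vector<int>>;  // hypothetical grey-level image, img[row][col]

// 4-neighbour discrete Laplacian. Applied to an already Gaussian-smoothed image,
// the two steps together approximate the LOG operator.
std::vector<std::vector<long>> laplacian(const Image& in) {
    const int rows = static_cast<int>(in.size());
    const int cols = static_cast<int>(in[0].size());
    std::vector<std::vector<long>> lap(rows, std::vector<long>(cols, 0));
    for (int r = 1; r + 1 < rows; ++r)
        for (int c = 1; c + 1 < cols; ++c)
            lap[r][c] = static_cast<long>(in[r - 1][c]) + in[r + 1][c] + in[r][c - 1] +
                        in[r][c + 1] - 4L * in[r][c];
    return lap;
}

// Mark a pixel as an edge (255) where the Laplacian changes sign between it and its
// right or lower neighbour, i.e. at a zero-crossing. Only interior Laplacian values are compared.
Image zeroCrossings(const Image& smoothed) {
    assert(!smoothed.empty());
    const auto lap = laplacian(smoothed);
    const int rows = static_cast<int>(smoothed.size());
    const int cols = static_cast<int>(smoothed[0].size());
    Image edges(rows, std::vector<int>(cols, 0));
    for (int r = 1; r + 2 < rows; ++r)
        for (int c = 1; c + 2 < cols; ++c) {
            const long v = lap[r][c];
            if (v * lap[r][c + 1] < 0 || v * lap[r + 1][c] < 0) edges[r][c] = 255;
        }
    return edges;
}
```

On a blurred step such as the row profile 0, 0, 10, 90, 100, 100, the Laplacian is positive on the dark side and negative on the bright side, so the edge is marked where the sign flips, in the middle of the ramp.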

Edge detection detects intensity changes and acts as a high-pass filter in the frequency domain, so it also responds to noise. In practical applications a trade-off has to be considered, because some edge operators detect more edges but respond to noise, while others are noise-tolerant but remove significant edge information.

2.2.4. Corner Detection

Corners, which can be considered junctions of edges, are another type of low-level feature that can be extracted automatically from an image. Corners are points of interest derived from the edge information that defines the boundaries of different objects, or of different parts of the same object. A large number of corner detection algorithms have been developed in past decades, but there are three main approaches to detecting corners in grey-scale images: edge-relation methods, topology methods and autocorrelation methods [18].
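As one hedged example of the autocorrelation family, the sketch below computes a Harris-style corner response at a single pixel. It is not necessarily the detector used in this project. The central-difference gradients, the window radius `w` and the constant `k = 0.04` (a commonly quoted value) are assumptions, as is the `Image` alias.

```cpp
#include <cassert>
#include <vector>

using Image = std::vector<std::vector<int>>;  // hypothetical grey-level image, img[row][col]

// Harris-style corner response at (r, c) over a (2w+1)x(2w+1) window.
// Gradients use central differences; k = 0.04 is an assumed, commonly used value.
// Sign of the result: > 0 suggests a corner, < 0 an edge, about 0 a flat region.
double harrisResponse(const Image& in, int r, int c, int w, double k = 0.04) {
    const int rows = static_cast<int>(in.size());
    const int cols = static_cast<int>(in[0].size());
    // The window plus a one-pixel margin for the differences must lie inside the image.
    assert(r - w - 1 >= 0 && r + w + 1 < rows && c - w - 1 >= 0 && c + w + 1 < cols);
    double sxx = 0.0, syy = 0.0, sxy = 0.0;  // entries of the autocorrelation matrix
    for (int i = r - w; i <= r + w; ++i)
        for (int j = c - w; j <= c + w; ++j) {
            const double ix = in[i][j + 1] - in[i][j - 1];
            const double iy = in[i + 1][j] - in[i - 1][j];
            sxx += ix * ix;
            syy += iy * iy;
            sxy += ix * iy;
        }
    const double det = sxx * syy - sxy * sxy;
    const double trace = sxx + syy;
    return det - k * trace * trace;
}
```

The response is positive at the corner of a bright quadrant, negative on a straight vertical edge (where only the horizontal gradient is non-zero) and zero in a uniform region.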

2.3. Software Development

This project is implemented on a UNIX system using the C++ programming language and the CMACS compiler. There are two main reasons for using C++ in this project: to meet the real-time requirement of the system, and to use the ‘Vision Systems library’. This library [10] was developed by the Vision Systems group at the University of Edinburgh, and it contains the most commonly used classes in the Vision Systems code.

Some classes in the ‘Vision Systems library’ are designed specifically for image processing, and those most used in this project are VS_frame and VS_frame_io. The VS_frame class is used to store an image, with pixels stored as integer (int) values [10]. It provides operations for returning image attributes, getting intensity values, and setting new intensity values for pixels. The VS_frame_io class is used for reading images from and writing images to files [10].


Chapter 3. Design and Experiments

3.1. Edge Detection Based Algorithm

Edge detection, which highlights meaningful discontinuities in grey levels, is one of the most popular approaches for extracting image features. Since edges often occur at the boundaries of features within an image, edge detection is used to separate the object from its surroundings. Interpreting an image in terms of its edges reduces the amount of data while retaining most of the image information. Moreover, edge detection is largely insensitive to overall illumination changes, which makes it an important component of the preprocessing of stereo images. The algorithm proposed in this project combines edge detection with the Hough transform (HT) to extract reference points from stereo pairs of images for establishing correspondence.

The basic idea is shown in the flow diagram of Figure 3.1.

Figure 3.1 Flow Diagram of Edge Detection Based Algorithm (each original image, left and right, passes through Gaussian Smoothing, Vertical Edge Detection, Otsu’s Thresholding, Thinning & Erosion, Hough Transform, Vertical Lines Extraction and Feature Points Extraction; the two sets of feature points then go to Block Matching, followed by Distance Determination)


3.1.1. Gaussian Smoothing

Averaging is used to reduce noise before edge detection. Gaussian averaging is considered to be the optimal smoothing for an image [3]. The values of the Gaussian template are set by the 2-D Gaussian relationship given in Equation 3.1.

$$G(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} \qquad [16] \quad (3.1)$$

where $\sigma$ is the standard deviation of the Gaussian distribution.

In this project, the 5×5 Gaussian template in Figure 3.2, with $\sigma$ = 1.0, is used. Gaussian smoothing offers better performance than direct averaging: more image features are retained while the noise is removed.

Figure 3.2 5×5 Gaussian Template [16]
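A template of this kind can be generated directly from Equation 3.1. The sketch below samples $G(x, y)$ on a 5×5 grid centred on the middle element and normalises the weights to sum to 1, so that smoothing preserves overall brightness (normalisation also makes the $1/(2\pi\sigma^2)$ factor irrelevant). The published template in Figure 3.2 may use rounded or integer-scaled values, so these numbers are not claimed to match it exactly; the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Build a size x size Gaussian template from Equation 3.1, centred on the middle
// element, then normalise it so that its weights sum to 1.
std::vector<std::vector<double>> gaussianTemplate(int size, double sigma) {
    assert(size % 2 == 1 && sigma > 0.0);
    const double pi = 3.14159265358979323846;
    const int half = size / 2;
    std::vector<std::vector<double>> g(size, std::vector<double>(size));
    double sum = 0.0;
    for (int y = -half; y <= half; ++y)
        for (int x = -half; x <= half; ++x) {
            const double v = std::exp(-(x * x + y * y) / (2.0 * sigma * sigma)) /
                             (2.0 * pi * sigma * sigma);
            g[y + half][x + half] = v;
            sum += v;
        }
    for (auto& row : g)
        for (double& v : row) v /= sum;
    return g;
}
```

With `size = 5` and `sigma = 1.0`, as in this project, the centre weight is about 0.162 and the weights fall off symmetrically towards the corners.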

3.1.2. Vertical Edge Detection

In this project, it was observed that vehicles have more horizontal edges than vertical edges. However, according to our experimental results, the vertical edges in the edge images are very strong, and vertical edge detection suppresses noise better than horizontal edge detection [15]. It is therefore reasonable to apply vertical edge detection to the vehicle stereo images for feature extraction.

Among the many edge detection techniques, Sobel edge detection is one of the most widely used in practice, so it is applied in this project. The template used for vertical edge detection is given in Figure 3.3.


$$\begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}$$

Figure 3.3 Template of Sobel Vertical Edge Detection
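A sketch of vertical edge detection with this kind of template follows, assuming the Figure 3.3 template is laid out as rows (1, 0, -1), (2, 0, -2), (1, 0, -1), so that it responds to horizontal changes in grey level. Clipping the absolute response to 0..255 and leaving the border at 0 are assumptions made so that the output is again an 8-bit grey image; the `Image` alias is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

using Image = std::vector<std::vector<int>>;  // hypothetical grey-level image, img[row][col]

// Vertical edge detection with the assumed 3x3 Sobel template
//    1  0 -1
//    2  0 -2
//    1  0 -1
// The absolute response is clipped to 0..255; border pixels stay 0.
Image sobelVertical(const Image& in) {
    static const int T[3][3] = {{1, 0, -1}, {2, 0, -2}, {1, 0, -1}};
    assert(!in.empty());
    const int rows = static_cast<int>(in.size());
    const int cols = static_cast<int>(in[0].size());
    Image out(rows, std::vector<int>(cols, 0));
    for (int r = 1; r + 1 < rows; ++r)
        for (int c = 1; c + 1 < cols; ++c) {
            int g = 0;
            for (int i = -1; i <= 1; ++i)
                for (int j = -1; j <= 1; ++j)
                    g += T[i + 1][j + 1] * in[r + i][c + j];
            out[r][c] = std::min(std::abs(g), 255);
        }
    return out;
}
```

A vertical step saturates the output, whereas a horizontal step produces no response at all, which is the selectivity this section relies on.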

3.1.3. Otsu’s Thresholding

Thresholding is a simple feature extraction technique, and here it is used to convert the edge images to binary images. Choosing the threshold level is difficult because it requires knowledge of the grey-level distribution. In this project we use Otsu’s method, an optimal thresholding technique that automatically selects the threshold level giving the best separation of an object from its background. The method is based on the normalized histogram, which represents a probability distribution of the intensity levels [3]:

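$$p(l) = \frac{N(l)}{N^2} \qquad [3] \quad (3.2)$$

where $N(l)$ is the number of pixels with grey level $l$ in an $N \times N$ image.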

The zero-order and first-order cumulative moments of the normalized histogram are [3]:

$$\omega(k) = \sum_{l=1}^{k} p(l) \qquad \text{and} \qquad \mu(k) = \sum_{l=1}^{k} l\,p(l) \qquad [3] \quad (3.3)$$

The total mean level of the image is [3]:

$$\mu_T = \sum_{l=1}^{N_{\max}} l\,p(l) \qquad [3] \quad (3.4)$$

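The variance of the class separability (the between-class variance) is given by the ratio [3]:

$$\sigma_B^2(k) = \frac{\left(\mu_T\,\omega(k) - \mu(k)\right)^2}{\omega(k)\left(1 - \omega(k)\right)}, \qquad k \in [1, N_{\max}] \qquad [3] \quad (3.5)$$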

14

The optimal threshold optT is the level for which the variance is at its maximum [3]:

kT BNkoptB2

1

2

max

max

[3] (3.6)

Because Otsu's method selects the threshold level automatically rather than manually, it is well suited to an automated stereo vision system.
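Equations 3.2 to 3.6 translate directly into a search over all candidate levels. This C++ sketch is illustrative only: `otsu_threshold` and `apply_threshold` are hypothetical names, levels are indexed from 0 rather than 1, and class sizes are kept as integer counts so that levels leaving one class empty can be skipped exactly.

```cpp
#include <vector>

// Otsu's threshold (Equations 3.2 to 3.6) from a grey-level histogram.
// Levels are indexed from 0 here rather than from 1. Returns the level k
// that maximizes the between-class variance; one class holds levels <= k.
int otsu_threshold(const std::vector<long> &hist) {
    const int levels = static_cast<int>(hist.size());
    long long total = 0;
    double level_sum = 0.0;
    for (int l = 0; l < levels; ++l) {
        total += hist[l];
        level_sum += static_cast<double>(l) * hist[l];
    }
    if (total == 0) return 0;
    const double mu_T = level_sum / total;               // total mean level (3.4)

    long long cum = 0;           // number of pixels with level <= k
    double cum_levels = 0.0;     // sum of their levels
    double best_var = -1.0;
    int best_k = 0;
    for (int k = 0; k < levels; ++k) {
        cum += hist[k];
        cum_levels += static_cast<double>(k) * hist[k];
        if (cum == 0 || cum == total) continue;          // one class is empty
        const double omega = static_cast<double>(cum) / total;   // zero-order moment (3.3)
        const double mu = cum_levels / total;                    // first-order moment (3.3)
        const double num = mu_T * omega - mu;
        const double var_B = num * num / (omega * (1.0 - omega));  // (3.5)
        if (var_B > best_var) { best_var = var_B; best_k = k; }    // maximum (3.6)
    }
    return best_k;
}

// Binarize an image: 255 where the value exceeds the threshold, 0 elsewhere.
std::vector<int> apply_threshold(const std::vector<int> &img, int t) {
    std::vector<int> out(img.size());
    for (std::size_t i = 0; i < img.size(); ++i) out[i] = img[i] > t ? 255 : 0;
    return out;
}
```

For an 8-bit edge image the histogram would have 256 bins; the returned level is then passed to `apply_threshold` to produce the binary edge image.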

3.1.4. Morphological Operation

(1) Thinning

The edges in the output of the edge detector are usually several pixels thick. Before the Hough Transform (HT) is used to detect lines, a thinning technique reduces all lines to a single pixel thickness. Thinning is based on a structuring element: the origin of the structuring element is translated to each possible position in the image, and the element is compared with the underlying image pixels [16]. If the pixels in the structuring element exactly match the pixels in the image, the image pixel underneath the origin of the structuring element is set to zero; otherwise it is left unchanged [16]. In each iteration, each structuring element is applied in each of its four 90° rotations.

In this project, two structuring elements are used, which are given in Figure 3.4 (a)

and (b) respectively.

Figure 3.4 Structuring Elements of Thinning [1]
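The thinning iteration described above can be sketched as a hit-or-miss test that is repeated over all four rotations of each structuring element until the image stops changing. The structuring elements of Figure 3.4 are not reproduced in this text, so the element is a parameter here. Encoding its entries as 1 (foreground), 0 (background) or -1 (don't care) is an assumption, and `SE`, `rotate90`, `thin_pass` and `thin` are hypothetical names.

```cpp
#include <array>
#include <vector>

// 3x3 structuring element: 1 = foreground, 0 = background, -1 = don't care.
using SE = std::array<std::array<int, 3>, 3>;

// Rotate a structuring element by 90 degrees clockwise.
SE rotate90(const SE &s) {
    SE r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) r[i][j] = s[2 - j][i];
    return r;
}

// One hit-or-miss thinning pass with element s on a binary (0/1) image.
// All matches are found on the unchanged image first, then removed together.
// Returns the number of pixels set to zero.
int thin_pass(std::vector<int> &img, int w, int h, const SE &s) {
    std::vector<int> hits;
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            bool match = true;
            for (int j = -1; j <= 1 && match; ++j)
                for (int i = -1; i <= 1 && match; ++i) {
                    const int want = s[j + 1][i + 1];
                    if (want != -1 && img[(y + j) * w + (x + i)] != want) match = false;
                }
            if (match) hits.push_back(y * w + x);
        }
    for (int idx : hits) img[idx] = 0;
    return static_cast<int>(hits.size());
}

// Apply every element in all four 90-degree rotations per iteration, and
// repeat until an iteration removes no pixels.
void thin(std::vector<int> &img, int w, int h, const std::vector<SE> &elements) {
    int removed;
    do {
        removed = 0;
        for (SE s : elements)
            for (int r = 0; r < 4; ++r) {
                removed += thin_pass(img, w, h, s);
                s = rotate90(s);
            }
    } while (removed > 0);
}
```

A call such as `thin(img, w, h, {s1, s2})` with two hypothetical 3×3 elements `s1` and `s2` mirrors the use of two structuring elements in this project. A line that is already one pixel wide is left unchanged.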


(2) Erosion

The erosion operation is applied to an edge image to erode away the boundaries of edges, so that the edges shrink in size. In the vertical edge image, erosion removes short vertical edges, which are considered to be noise, and retains the strong vertical edges. It also reduces the computational cost of the subsequent Hough transform, because fewer edge pixels remain to vote. In this project, a 1×5 structuring element is used to erode the edge image. The structuring element is superimposed on the input image, with its origin centered on each possible position in the image [16]. If all the pixels underneath the structuring element are 255 (white), the output pixel at the origin is set to 255 (white); if any of the corresponding image pixels is not 255, the output pixel is set to 0 (black) [16].

The 1×5 erosion structuring element is given in Figure 3.5.

Figure 3.5 Structuring Element of Erosion
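The erosion rule above can be sketched as follows. The text does not state whether the 1×5 element is oriented vertically or horizontally. The sketch assumes a vertical element (one column, five rows) by default, because a vertical line element removes vertical edges shorter than five pixels while keeping longer ones; orientation is therefore a parameter. `erode_line` is a hypothetical name.

```cpp
#include <vector>

// Binary erosion with a straight-line structuring element of len pixels
// centered on its origin (len = 5 for the 1x5 element; len is assumed odd).
// vertical = true uses a single-column element, vertical = false a single-row
// element. Pixels are 0 or 255. A position where the element would extend
// past the image is treated as "not all 255" and set to 0.
std::vector<int> erode_line(const std::vector<int> &img, int w, int h, int len, bool vertical) {
    std::vector<int> out(w * h, 0);
    const int half = len / 2;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            bool all_white = true;
            for (int k = -half; k <= half && all_white; ++k) {
                const int xx = vertical ? x : x + k;
                const int yy = vertical ? y + k : y;
                if (xx < 0 || xx >= w || yy < 0 || yy >= h || img[yy * w + xx] != 255)
                    all_white = false;
            }
            out[y * w + x] = all_white ? 255 : 0;
        }
    return out;
}
```

With `len = 5` and `vertical = true`, a vertical edge of seven pixels keeps its middle three pixels, while an edge of three pixels disappears completely.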

3.1.5. Hough Transform

Many vertical edges remain in the output image after the above processing, and the vehicle number plate contains more vertical edges than other regions of the image. It is reasonable to extract an equal number of strong vertical edges from each of the two stereo images, and then to take points on these candidate lines as matching primitives. The Hough transform is used in this project to extract these candidate vertical lines; its advantage is that it is relatively unaffected by noise. A line in the x-y plane, described in polar form, is shown in Figure 3.6.

Figure 3.6 Polar Consideration of a Line [3]

We can describe a set of lines in the form:

$$\rho = x\cos\theta + y\sin\theta \qquad [3]\ (3.7)$$

where θ is the angle of the normal to the line and ρ is the length of the normal from the origin to the line.

The accumulator array has 180 bins in θ, covering 0° to 180°, and ρ ranges from 0 to $\sqrt{N^{2}+M^{2}}$, where N×M is the image size. Peaks in the accumulator array correspond to straight lines in the edge image. In this project only vertical edges are of interest, so only the row of the accumulator array with θ = 0 is retained; for θ = 0, Equation 3.7 reduces to ρ = x, so each bin counts the edge pixels in one image column. The ten highest peaks in this row are then extracted, giving ten vertical lines in each image, and edge points are retained only at the positions of the detected lines.
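With θ restricted to 0, the accumulator row reduces to a count of edge pixels per image column, as sketched below under that reading of the text. `hough_theta0` and `top_vertical_lines` are hypothetical names, and peaks are taken simply as the highest bins, without merging neighbouring columns.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hough accumulator restricted to theta = 0: Equation 3.7 then gives
// rho = x, so each edge pixel votes for the bin of its own column.
std::vector<int> hough_theta0(const std::vector<int> &edges, int w, int h) {
    std::vector<int> acc(w, 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (edges[y * w + x] != 0) ++acc[x];
    return acc;
}

// Columns (rho values) of the n highest bins, strongest first; ties keep
// the smaller column first. Bins with no votes are never returned.
std::vector<int> top_vertical_lines(const std::vector<int> &acc, int n) {
    std::vector<int> idx(acc.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::stable_sort(idx.begin(), idx.end(),
                     [&acc](int a, int b) { return acc[a] > acc[b]; });
    std::vector<int> lines;
    for (int i : idx) {
        if (static_cast<int>(lines.size()) == n || acc[i] == 0) break;
        lines.push_back(i);
    }
    return lines;
}
```

`top_vertical_lines(acc, 10)` corresponds to the ten vertical lines extracted per image in this project.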

3.1.6. Feature Points Extraction

So far, we have extracted the candidate lines, which are strong vertical edges in an

image. It is proposed in the project that the top and bottom points of each candidate

vertical edge are chosen as the feature points. There are also two other important

issues to be considered.


(1) Post-processing of the Candidate Vertical Lines

There may be more than one vertical edge at the same detected vertical position; in that case the longest one is chosen as the real candidate for extracting the feature points. There may also be gaps or discontinuous points within a candidate vertical edge, and these can be closed using morphological operations.

(2) Points Extraction

For each candidate vertical edge, its top and bottom points are extracted as the feature points for establishing correspondence, and their coordinates are recorded. Twenty points are therefore obtained from the ten candidate vertical edges. To make the results more accurate, more vertical edges can be detected by the Hough transform, so that more feature points are acquired from the input image.
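Choosing the longest run in each detected column and taking its end points can be sketched as follows. `longest_vertical_run`, `line_end_points` and the minimum-length parameter `min_len` are hypothetical; the text rejects lines that are "not long enough" but does not give a length.

```cpp
#include <vector>

struct Segment { int top; int bottom; };   // inclusive row range, -1 if empty
struct Point { int x; int y; };

// Longest run of consecutive edge pixels in column x of a binary image.
Segment longest_vertical_run(const std::vector<int> &edges, int w, int h, int x) {
    Segment best{-1, -1};
    int start = -1;
    for (int y = 0; y <= h; ++y) {
        const bool on = (y < h) && edges[y * w + x] != 0;
        if (on && start < 0) start = y;               // a run begins
        if (!on && start >= 0) {                      // a run ends at y - 1
            if (best.top < 0 || (y - 1 - start) > (best.bottom - best.top))
                best = {start, y - 1};
            start = -1;
        }
    }
    return best;
}

// Top and bottom points of the longest run in each detected column.
// Runs shorter than min_len pixels are treated as false features.
std::vector<Point> line_end_points(const std::vector<int> &edges, int w, int h,
                                   const std::vector<int> &columns, int min_len) {
    std::vector<Point> pts;
    for (int x : columns) {
        const Segment s = longest_vertical_run(edges, w, h, x);
        if (s.top < 0 || s.bottom - s.top + 1 < min_len) continue;
        pts.push_back({x, s.top});
        pts.push_back({x, s.bottom});
    }
    return pts;
}
```

Given the ten columns returned by the Hough step, this yields at most twenty feature points per image, fewer if some lines are rejected as too short.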

3.1.7. Summary

In the proposed method, we first detect the vertical edges in an image and then extract the strong vertical edges by the Hough transform, so that an equal number of vertical edges is picked out from each image of a stereo pair. The top and bottom points of each candidate vertical edge are taken as the feature points for finding correspondence. Vertical edges are used because they are relatively strong in the car images. The Hough transform ensures that only strong vertical edges are selected, which increases the likelihood that the edges extracted from the two images are corresponding features.

Choosing a relatively small number of points for correspondence matching can reduce

the computational cost. In the project, we detect only ten candidate vertical edges and

twenty feature points are extracted. More feature points can be obtained to increase

the accuracy, but consequently this requires more computational effort.


3.2. Corner Detection Based Algorithm

Using interest points to find the correspondence between two stereo images can drastically reduce the computation time compared with processing every pixel of the two images, or every pixel within certain regions. The method proposed above, based on vertical edge detection combined with the Hough transform, is one example of the use of interest points. Corner detection is an alternative way to extract interest points for finding correspondence.

3.2.1. System Overview

Corners are essentially the points where the edge direction changes rapidly in an image. A vehicle image contains many strong horizontal and vertical edges, so it is reasonable to extract corners as the interest points. No single corner detector is optimal, and the choice of detector depends on the particular application (e.g. whether real-time operation is required). The proposed stereo vision system based on corner detection is shown in Figure 3.7.

Figure 3.7 Flow Diagram of Corner Detection Based Algorithm


3.2.2. Corner Detection

Corner detection is applied to an image to obtain a cornerness map. For each pixel in the image, the corner operator computes a measure indicating the degree to which the pixel is considered to be a corner. Different corner detection approaches use different measures, but all of them are computed over a window centered on the input pixel. In this project, the Harris/Plessey corner detector is used.

(1) Harris/Plessey Corner Detection

The Harris/Plessey corner detector was developed by Chris Harris and Mike Stephens in 1988 [5]. It is a combined corner and edge detector that measures the variation of the autocorrelation over all orientations [18]. The method is as follows [5, 18]:

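For each pixel (x, y) in the image, calculate the autocorrelation matrix M:

$$M = \begin{bmatrix} A & C \\ C & B \end{bmatrix} \qquad (3.8)$$

where $A = I_x^{2} \otimes w$, $B = I_y^{2} \otimes w$ and $C = (I_x I_y) \otimes w$; $\otimes$ is the convolution operator and w is the Gaussian window.

Construct the cornerness map by calculating the cornerness measure C(x, y) for each pixel (x, y):

$$C(x, y) = \det(M) - k\,\left(\operatorname{trace}(M)\right)^{2} \qquad (3.9)$$

$$\det(M) = \lambda_1 \lambda_2 = AB - C^{2} \qquad (3.10)$$

$$\operatorname{trace}(M) = \lambda_1 + \lambda_2 = A + B \qquad (3.11)$$

where $\lambda_1$ and $\lambda_2$ are the eigenvalues of M and k is a constant.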
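Equations 3.9 to 3.11 can be evaluated per pixel as below. `harris_response` is a hypothetical name, and since the text only states that k is a constant, k is left as a parameter; 0.04 is merely a commonly quoted choice.

```cpp
// Cornerness measure C(x, y) for one pixel from the smoothed products
// A = Ix^2 * w, B = Iy^2 * w and C = (Ix Iy) * w of Equation 3.8.
// Large positive values indicate a corner, negative values an edge and
// values near zero a flat region.
double harris_response(double A, double B, double C, double k) {
    double det = A * B - C * C;       // Equation 3.10
    double trace = A + B;             // Equation 3.11
    return det - k * trace * trace;   // Equation 3.9
}
```

For example, with k = 0.04, A = B = 10 and C = 0 (two large eigenvalues) the measure is positive, whereas A = 10 and B = C = 0 (one large eigenvalue, i.e. an edge) gives a negative value.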


(2) Algorithm Design

To implement the Harris/Plessey detector and obtain the cornerness map, the following steps are used.

Differentiation [3, 18]

The Prewitt operator is commonly used to approximate the first-order derivative of an image. The values of $I_x$ and $I_y$ are approximated by the simple templates below, using the pixel labelling of Figure 3.8.

A1 A2 A3
A4 A5 A6
A7 A8 A9

Figure 3.8 Template labeling

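$$I_x = \begin{bmatrix} -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} I_{A4} & I_{A5} & I_{A6} \end{bmatrix}^{T} = I_{A6} - I_{A4}$$

$$I_y = \begin{bmatrix} -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} I_{A2} & I_{A5} & I_{A8} \end{bmatrix}^{T} = I_{A8} - I_{A2}$$

Figure 3.9 Horizontal Gradient and Vertical Gradient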

Therefore, for each input pixel we obtain its horizontal and vertical gradients $I_x$ and $I_y$, from which $I_x^{2}$, $I_y^{2}$ and $I_x I_y$ can be calculated.
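Computing the gradients with the Figure 3.9 templates and forming the three products can be sketched as follows. `GradientProducts` and `gradient_products` are hypothetical names, images are assumed to be row-major with y increasing downwards (so A8 lies below A5), and border pixels are left at zero.

```cpp
#include <vector>

// Per-pixel products Ix^2, Iy^2 and Ix*Iy (row-major, width w, height h).
struct GradientProducts {
    std::vector<double> Ixx, Iyy, Ixy;
};

// Ix = I(A6) - I(A4) and Iy = I(A8) - I(A2) with the labelling of
// Figure 3.8, i.e. differences across the center pixel A5.
GradientProducts gradient_products(const std::vector<double> &img, int w, int h) {
    GradientProducts g;
    g.Ixx.assign(w * h, 0.0);
    g.Iyy.assign(w * h, 0.0);
    g.Ixy.assign(w * h, 0.0);
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            const double ix = img[y * w + (x + 1)] - img[y * w + (x - 1)];   // A6 - A4
            const double iy = img[(y + 1) * w + x] - img[(y - 1) * w + x];   // A8 - A2
            g.Ixx[y * w + x] = ix * ix;
            g.Iyy[y * w + x] = iy * iy;
            g.Ixy[y * w + x] = ix * iy;
        }
    return g;
}
```

For an image whose intensity rises linearly as 2x + 3y, every interior pixel has $I_x = 4$ and $I_y = 6$, so the three products are 16, 36 and 24.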

Gaussian Window [3, 18]

The use of a Gaussian window in the Harris/Plessey corner detector reduces the noise response. A 5×5 Gaussian window with σ = 1.4 is given in Figure 3.10.

Figure 3.10 Gaussian Window

According to the algorithm, the Gaussian window is convolved with $I_x^{2}$, $I_y^{2}$ and $I_x I_y$ respectively, giving the entries A, B and C of the autocorrelation matrix M.

Construction of Cornerness Map

Given the autocorrelation matrix M, the cornerness of each pixel is measured by calculating trace(M) and det(M) as in Equation 3.9. For each input image, the output of the Harris/Plessey corner detection is a cornerness map.
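Combining the window and the cornerness measure gives the map itself, sketched below. `cornerness_map` is a hypothetical name. The 5×5 window with σ = 1.4 is rebuilt from the Gaussian formula and normalized, so its values may differ slightly from Figure 3.10. The default k = 0.04 is an assumed value.

```cpp
#include <cmath>
#include <vector>

// Cornerness map (Equations 3.8 to 3.11). The product images Ixx = Ix^2,
// Iyy = Iy^2 and Ixy = Ix*Iy are each convolved with a normalized Gaussian
// window to give A, B and C, and C(x, y) = det(M) - k * trace(M)^2 is
// evaluated. Pixels whose window would leave the image are set to 0.
std::vector<double> cornerness_map(const std::vector<double> &Ixx, const std::vector<double> &Iyy,
                                   const std::vector<double> &Ixy, int w, int h,
                                   int size = 5, double sigma = 1.4, double k = 0.04) {
    const int half = size / 2;
    std::vector<double> win(size * size);
    double sum = 0.0;
    for (int j = -half; j <= half; ++j)
        for (int i = -half; i <= half; ++i) {
            double v = std::exp(-(i * i + j * j) / (2.0 * sigma * sigma));
            win[(j + half) * size + (i + half)] = v;
            sum += v;
        }
    for (double &v : win) v /= sum;

    std::vector<double> map(w * h, 0.0);
    for (int y = half; y < h - half; ++y)
        for (int x = half; x < w - half; ++x) {
            double A = 0.0, B = 0.0, C = 0.0;
            for (int j = -half; j <= half; ++j)
                for (int i = -half; i <= half; ++i) {
                    const double wt = win[(j + half) * size + (i + half)];
                    const int p = (y + j) * w + (x + i);
                    A += wt * Ixx[p];
                    B += wt * Iyy[p];
                    C += wt * Ixy[p];
                }
            const double trace = A + B;
            map[y * w + x] = (A * B - C * C) - k * trace * trace;
        }
    return map;
}
```

Because the window is normalized, a region where $I_x^{2} = I_y^{2} = 1$ and $I_x I_y = 0$ gives A = B = 1 and C = 0, and hence a cornerness of 1 − 4k.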

3.2.3. Scale Operation

The cornerness measure produces values with a very large range, so the corners in the cornerness map have very large intensity values. To process the cornerness map efficiently, its pixel values are mapped into the range 0 to 255 by a scale operation, which replaces each intensity value with the one computed by Equation 3.12.

$$N_{x,y} = \frac{O_{x,y} - O_{min}}{O_{max} - O_{min}} \times 255 \qquad (3.12)$$

The brightness of the old image O extends from $O_{min}$ to $O_{max}$, and the image is scaled so that the pixel values of the new image N lie in the range 0 to 255 [3]. Since the scale operation is a linear brightness transformation, the overall shape of the image histogram is not changed [3].
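Equation 3.12 can be written as a small function. `scale_to_255` is a hypothetical name, and mapping a constant image to zeros (where $O_{max} = O_{min}$ would divide by zero) is an assumption the text does not cover.

```cpp
#include <algorithm>
#include <vector>

// Equation 3.12: map the range [Omin, Omax] of the old image O linearly
// onto [0, 255]. A constant image (Omax == Omin) is mapped to all zeros
// to avoid dividing by zero.
std::vector<double> scale_to_255(const std::vector<double> &O) {
    std::vector<double> N(O.size(), 0.0);
    if (O.empty()) return N;
    auto range = std::minmax_element(O.begin(), O.end());
    const double omin = *range.first, omax = *range.second;
    if (omax == omin) return N;
    for (std::size_t i = 0; i < O.size(); ++i)
        N[i] = (O[i] - omin) / (omax - omin) * 255.0;
    return N;
}
```

Since the mapping is linear, the ordering of cornerness values, and therefore the later choice of strongest corners, is unchanged.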

3.2.4. Thresholding

Thresholding of the cornerness map is one of the most important steps in corner detection. Corners are defined as local maxima in the cornerness map [18]. Because the corner operator produces a measure for every pixel of the input image, not all pixels of the cornerness map correspond to corners, and local maxima with relatively small cornerness measures are considered to be false corners. To avoid reporting these points as corners, the cornerness map is thresholded by setting all values below a certain threshold level to zero [18]. Choosing this threshold level is difficult, as it depends on the requirements of the application: the level should be high enough to remove the false corners, but low enough to retain most of the true corners, so in practice a trade-off must be made. In this project, a relatively low threshold level is chosen to retain enough corners while removing obvious noise, and non-maximal suppression (Section 3.2.5) is then used to extract the local maxima. Thresholding the cornerness map also reduces the computational load of the following steps.

3.2.5. Non-maximal Suppression

Non-maximal suppression is applied to the thresholded cornerness map to locate the

local maxima. For each pixel in the thresholded cornerness map, a square window is

centered on it. If the cornerness measure of this pixel is the largest within this window,

the pixel is retained with its cornerness measure. Otherwise, the cornerness measure

of this pixel is set to zero. A 3×3 square window is given in Figure 3.11.

After non-maximal suppression, the corners are simply the non-zero points remaining in the thresholded cornerness map [18].

A1 A2 A3
A4 A5 A6
A7 A8 A9

Figure 3.11 Non-maximal Suppression
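Thresholding (Section 3.2.4) and 3×3 non-maximal suppression can be combined in one pass, as sketched below. `threshold_and_nms` is a hypothetical name. Keeping a pixel that ties with a neighbour, and discarding the one-pixel border, are assumptions the text does not specify.

```cpp
#include <vector>

// Set values below thresh to zero, then keep only pixels that are the
// maximum of their 3x3 window (A5 compared with A1..A9 in Figure 3.11).
// A pixel equal to a neighbour is kept; border pixels are discarded.
std::vector<double> threshold_and_nms(const std::vector<double> &map, int w, int h, double thresh) {
    std::vector<double> t(map);
    for (double &v : t) if (v < thresh) v = 0.0;
    std::vector<double> out(w * h, 0.0);
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            const double c = t[y * w + x];
            if (c <= 0.0) continue;                   // not a corner candidate
            bool is_max = true;
            for (int j = -1; j <= 1 && is_max; ++j)
                for (int i = -1; i <= 1; ++i)
                    if (t[(y + j) * w + (x + i)] > c) { is_max = false; break; }
            if (is_max) out[y * w + x] = c;
        }
    return out;
}
```

The non-zero entries of the result are the corners; a low threshold keeps more candidates and leaves the suppression step to remove non-maxima.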

3.2.6. Feature Points Extraction

It is proposed that the number of feature points extracted from each of the two stereo images should be equal. The corner image contains a large number of corners with different intensity values. In the Harris/Plessey detector, the larger the intensity (cornerness) value of a pixel, the stronger the corner, so the feature points can be chosen according to their intensity values. In this project, the twenty corners with the largest intensity values are extracted from each corner map as the feature points.
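Selecting the strongest corners as feature points can be sketched as follows. `Corner` and `strongest_corners` are hypothetical names, and ties in cornerness keep raster-scan order.

```cpp
#include <algorithm>
#include <vector>

struct Corner { int x; int y; double strength; };

// Collect the non-zero entries of the suppressed cornerness map and return
// the n strongest, largest cornerness value first.
std::vector<Corner> strongest_corners(const std::vector<double> &map, int w, int h, int n) {
    std::vector<Corner> all;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (map[y * w + x] > 0.0) all.push_back({x, y, map[y * w + x]});
    std::stable_sort(all.begin(), all.end(),
                     [](const Corner &a, const Corner &b) { return a.strength > b.strength; });
    if (static_cast<int>(all.size()) > n) all.resize(n);
    return all;
}
```

Calling `strongest_corners(map, w, h, 20)` on each image of a stereo pair gives the equal-sized point sets used for matching.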

3.2.7. Summary

The proposed method extracts feature points for stereo matching by corner detection, using the Harris/Plessey detector. The cornerness map is obtained from the input image by the Harris/Plessey operator, and a thresholding operation is then applied to remove the false corners while retaining most of the true corners. Finally, non-maximal suppression finds the local maxima in the thresholded cornerness map, and these corners are extracted as the feature points. An advantage of the Harris/Plessey detector is that the matching points can be chosen according to the cornerness measure: the larger the measure, the stronger the corner. Selecting the strongest corners in an image in this way increases the chance of finding real matches. In this project only twenty points are extracted from each of the two stereo images; more feature points could be extracted, but there is a trade-off between accuracy and computational cost.

3.3. Stereo Matching

Matching is the most important stage in stereo image processing. Given two stereo images, correspondence must be established between homologous features, that is, features that are projections of the same physical feature in the real world. In this project, the primitives used for matching are the feature points extracted by the above algorithms.

3.3.1. Block Matching Algorithm

Block matching is widely used for finding corresponding points in stereo vision. It groups pixels into blocks and then matches these blocks [6]. In stereo vision, each block from the left image is compared with blocks in the right image, and the difference between the intensity values of the two blocks is measured by a chosen criterion. The pairs of blocks that give relatively small values of this metric are considered to be the real matches.

In the ideal case, two matching blocks have exactly the same corresponding pixel values [7]. Unfortunately this is rare, because many factors cause differences: for example, the target object can appear distorted because of the different angles of view, the overall illumination of the two images may differ, and there is always noise in the real world. Despite these problems, block matching remains a simple and popular stereo matching algorithm.

3.3.2. Experiments

Since feature points have already been extracted from the stereo images, a block is centered on each of these interest points, and the similarity measure between blocks determines which points in the two images match. Compared with the full search method, in which block matching is applied over the entire image, using the feature points as matching primitives dramatically reduces the computational complexity.

(1) Block Size

Selecting a proper block size for stereo matching is not an easy task. Generally, a large block is less sensitive to image distortions, while a small block is computationally more efficient, so a trade-off must be made that also depends on the requirements of the application. In practice, if a feature point is close to the image border, a large block centered on it may extend partly outside the image. The main factor in choosing the block size in this project is the distance between the object and the stereo cameras. It is proposed that an adaptive block size is used that follows changes in this distance: if the object appears large in the image because it was captured at a short distance, a large block should be chosen to avoid false matches, while if the object appears small in the image, a small block has to be used.

(2) Search Region

The search region plays an important part in finding the match. In this project the feature points are used as the matching primitives, and matches are searched for only among these extracted points. If the number of extracted feature points is large, more real matches can be found, but the computational load grows with the number of feature points. If the number of feature points is small, the computational cost is reduced, but false matches become more likely. Therefore, a trade-off must be made based on the system requirements.


(3) Matching Criteria

Commonly used matching criteria include the mean absolute difference (MAD), the mean squared difference (MSD) and normalized cross-correlation (NCC) [7]. In this project, the mean absolute difference (MAD) is used to measure the similarity [7]:

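$$MAD = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left| A(i, j) - B(i, j) \right| \qquad (3.13)$$

where the block size is m×n, and A(i, j) and B(i, j) are the pixel intensities of the two blocks being compared.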
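Equation 3.13 for two blocks centered on feature points can be sketched as below. `mad` is a hypothetical name; both images are assumed to share the same row-major width, and the caller is assumed to have checked that both blocks lie inside their images. Even block sizes such as the 30×30 used in Chapter 4 are handled by placing one more pixel before the center than after it.

```cpp
#include <cstdlib>
#include <vector>

// Equation 3.13 for an m x n block (m rows, n columns) of the left image
// centered on (xl, yl) and one of the right image centered on (xr, yr).
// For even sizes the block covers rows yl - m/2 .. yl - m/2 + m - 1, so a
// 30 x 30 block contains exactly 900 pixels.
double mad(const std::vector<int> &left, const std::vector<int> &right, int w,
           int xl, int yl, int xr, int yr, int m, int n) {
    const int i0 = -m / 2, j0 = -n / 2;
    long sum = 0;
    for (int i = i0; i < i0 + m; ++i)          // rows
        for (int j = j0; j < j0 + n; ++j)      // columns
            sum += std::abs(left[(yl + i) * w + (xl + j)] - right[(yr + i) * w + (xr + j)]);
    return static_cast<double>(sum) / (m * n);
}
```

A MAD of zero means the two blocks are identical, and larger values mean less similar blocks.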

(4) Matching Process

In this project, twenty feature points are extracted from each of the two stereo images, as shown in Figure 3.12.

Figure 3.12 Feature Points in a Pair of Stereo Images

For each point in the left image, all the points in the right image are searched to find the best matching point, that is, the one with minimal MAD. Each feature point in the left image is thus mapped to a feature point in the right image according to the MAD. The resulting list of matching points is outlined in Figure 3.13.

(A0, B?)   MAD?
(A1, B?)   MAD?
(A2, B?)   MAD?
......
(A19, B?)  MAD?

Figure 3.13 Outline of Matching Points
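Building the list of Figure 3.13, one minimum-MAD partner for each left point, can be sketched as follows. `Pt`, `Match`, `block_inside`, `block_mad` and `best_matches` are hypothetical names. Skipping points whose block would not fit inside the image follows the border rule described in Section 4.1.2.

```cpp
#include <cstdlib>
#include <limits>
#include <vector>

struct Pt { int x; int y; };
struct Match { int left; int right; double mad; };   // indices into the point lists

// True if an m x n block centered on p lies completely inside a w x h image.
static bool block_inside(Pt p, int w, int h, int m, int n) {
    return p.y - m / 2 >= 0 && p.y - m / 2 + m <= h &&
           p.x - n / 2 >= 0 && p.x - n / 2 + n <= w;
}

// Mean absolute difference (Equation 3.13) between the blocks centered on a and b.
static double block_mad(const std::vector<int> &L, const std::vector<int> &R, int w,
                        Pt a, Pt b, int m, int n) {
    long sum = 0;
    for (int i = -m / 2; i < -m / 2 + m; ++i)
        for (int j = -n / 2; j < -n / 2 + n; ++j)
            sum += std::abs(L[(a.y + i) * w + (a.x + j)] - R[(b.y + i) * w + (b.x + j)]);
    return static_cast<double>(sum) / (m * n);
}

// For every usable left point, search all usable right points and keep the
// one with minimum MAD. Points too close to the border are skipped.
std::vector<Match> best_matches(const std::vector<int> &L, const std::vector<int> &R, int w, int h,
                                const std::vector<Pt> &lp, const std::vector<Pt> &rp, int m, int n) {
    std::vector<Match> out;
    for (std::size_t a = 0; a < lp.size(); ++a) {
        if (!block_inside(lp[a], w, h, m, n)) continue;
        Match best{static_cast<int>(a), -1, std::numeric_limits<double>::max()};
        for (std::size_t b = 0; b < rp.size(); ++b) {
            if (!block_inside(rp[b], w, h, m, n)) continue;
            const double d = block_mad(L, R, w, lp[a], rp[b], m, n);
            if (d < best.mad) { best.right = static_cast<int>(b); best.mad = d; }
        }
        if (best.right >= 0) out.push_back(best);
    }
    return out;
}
```

Each entry of the result corresponds to one line of Figure 3.13, with the question marks filled in by the index of the best right-image point and its MAD.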

Deciding which of the pairs above are real matches is not a trivial task. One of the simplest solutions is to choose the pair with the minimal MAD in the list as the best match. Although this works well for extracting the best match in our experiments, it can produce errors depending on the number of feature points and the block size, and it finds only one real match between the two stereo images. To make the matching more robust and accurate, a threshold can be set to retain the pairs with a MAD below it. Alternatively, the MADs in the list can be sorted from minimum to maximum and the pairs with relatively small MADs chosen (e.g. the top five pairs in the sorted list). These selected pairs are called candidate matches. To decide the real matches, the horizontal offset (x offset) is calculated for each pair of corresponding points; the real matches should have approximately the same x offset. In this way more than one real match can be extracted. Figure 3.14 shows the matching method used in the project.
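The candidate selection described above can be sketched as follows. `Pair` and `real_matches` are hypothetical names. The requirement that real matches have "approximately the same x offset" is interpreted here, as an assumption, as lying within a fixed tolerance `tol` of the median offset of the candidates. Setting `top` to 5 corresponds to the example of the top five pairs.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

struct Pair { int xl, yl, xr, yr; double mad; };   // one candidate correspondence

// Keep the `top` pairs with the smallest MAD as candidate matches, then
// accept those whose x offset (xl - xr) is within `tol` pixels of the median
// x offset of the candidates.
std::vector<Pair> real_matches(std::vector<Pair> pairs, int top, int tol) {
    std::stable_sort(pairs.begin(), pairs.end(),
                     [](const Pair &a, const Pair &b) { return a.mad < b.mad; });
    if (static_cast<int>(pairs.size()) > top) pairs.resize(top);
    if (pairs.empty()) return pairs;

    std::vector<int> offsets;
    for (const Pair &p : pairs) offsets.push_back(p.xl - p.xr);
    std::vector<int> sorted = offsets;
    std::nth_element(sorted.begin(), sorted.begin() + sorted.size() / 2, sorted.end());
    const int median = sorted[sorted.size() / 2];

    std::vector<Pair> out;
    for (std::size_t i = 0; i < pairs.size(); ++i)
        if (std::abs(offsets[i] - median) <= tol) out.push_back(pairs[i]);
    return out;
}
```

The median is used as the reference because it is not pulled away by a single wrong candidate, unlike the mean. Any accepted pair then supplies a disparity for Section 3.4.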

3.3.3. Summary

In this project, block matching is applied to establish correspondence between stereo image pairs, and the mean absolute difference (MAD) is used to measure similarity. Since a fixed number of feature points has been extracted from each of the two stereo images, matches are searched for among these interest points by centering a block on each of them. Because the number of feature points is relatively small, the block matching algorithm runs very fast even when a large block size is used. Each point in the left image is mapped to its matching point in the right image by calculating the MAD, all candidate matches are outlined, and the real matches are determined from the MAD values and the horizontal offsets of the matching points.

Figure 3.14 Flow Diagram of Matching Algorithm (steps: load reference images; create a block from the left image; create a block from the right image; if MAD < Low, set Low = MAD and update the coordinate; repeat until all points in the right image are used; save the matching points and x-offset; repeat until all points in the left image are used; outline all the matches; determine the real matches)

3.4. Distance Determination

The distance between the stereo pair of cameras and the target object can be

determined by the imaging geometry shown in Figure 3.15.

The imaging geometry of a conventional stereo imaging system consists of a pair of cameras with mutually parallel optical axes, separated by a horizontal distance called the stereo baseline [4]. The optical axes are perpendicular to the stereo baseline, and the image scan lines are parallel to the (horizontal) baseline [4]. Since the displacement between the optical centers of the two cameras is purely horizontal, the positions of corresponding points in the two images can differ only in the horizontal component [4].

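Figure 3.15 Stereo Geometry (left and right cameras with origins $O_L$, $O_R$ and axes $X_L, Y_L, Z_L$ and $X_R, Y_R, Z_R$; image planes $I_L$, $I_R$; focal length f; baseline b; the point P(x, y, z) projects to $P_L$ and $P_R$)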

In Figure 3.15, the origin of the world coordinate system is $O_L$, the effective focal length of each camera is f, and the stereo baseline is b. $P_L(x_L, y_L, z_L)$ and $P_R(x_R, y_R, z_R)$ are the projections of the point P(x, y, z). The disparity d is defined as the x offset between the matched points $P_L$ and $P_R$, that is, $d = x_L - x_R$. The world coordinates of the point P(x, y, z) can be obtained by considering similar triangles [4]:

$$x = \frac{b\, x_L}{d}, \qquad y = \frac{b\, y_L}{d} \qquad \text{and} \qquad z = \frac{b f}{d} \qquad [4]\ (3.14)$$

The required distance between the stereo cameras and the target object is then the value of z in Equation 3.14, which can be written as Equation 3.15:

$$\text{Distance} = \frac{\text{Baseline} \times \text{FocalLength}}{\text{Disparity}} \qquad (3.15)$$
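Equation 3.14 can be applied to a matched pair as in the sketch below. `Point3` and `triangulate` are hypothetical names. Measuring $x_L$ and $y_L$ from the optical center is an assumption: the coordinates reported in Chapter 4 are pixel positions in the image, so they would first have to be shifted to the optical center for x and y. The distance z depends only on the disparity.

```cpp
// Equation 3.14 for one matched pair. xl and yl are the left-image
// coordinates measured from the optical center, d = xl - xr is the
// disparity, b the baseline and f the focal length. f, xl, yl and d must be
// in the same unit (e.g. pixels); x, y and z are returned in the unit of b.
// d must be positive: zero disparity corresponds to a point at infinity.
struct Point3 { double x, y, z; };

Point3 triangulate(double xl, double yl, double d, double b, double f) {
    return {b * xl / d, b * yl / d, b * f / d};
}
```

For instance, with b = 0.1 m, f = 500 pixels and d = 25 pixels, z = 0.1 × 500 / 25 = 2 m. Halving the disparity doubles the estimated distance, so a one-pixel disparity error matters most for distant objects.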


Chapter 4. Results

To evaluate the performance of the proposed algorithms, we apply the edge detection based algorithm and the corner detection based algorithm to our database. The initial stereo images, shown in Appendix 1, are images of a car taken under different conditions. The additional stereo images in Appendix 2 were taken of different kinds of vehicles. The experimental results and analysis are presented in this chapter.

4.1. Edge Detection Based Algorithm

The proposed algorithm is applied to our database of car stereo images, and the experimental process can be divided into two parts: first, twenty feature points are extracted from each of the two stereo images; second, the best match between the two images is found. The initial database of Appendix 1 is used first, and the algorithm is then applied to the additional database in Appendix 2.

4.1.1. Extraction of Feature Points

The results show that the vertical edge image extracted by the system consists of very clear vertical edges, with dense vertical edges in the area of the car number plate. Most of the noise has been removed, and the edge image contains only the information we are interested in. In the image of vertical lines, as expected, the ten vertical lines extracted by the Hough Transform tend to appear in the region of the number plate. Since the top and bottom points of each vertical line are chosen as the feature points, there should be twenty points in the ideal case. In practice, however, a vertical line that is not long enough is treated as a false feature component, and no points are extracted from it.


4.1.2. Block Matching

For block matching, a block size of 30×30 is chosen. Because the block is centered on the feature point, there is a border problem: the border width is 15 pixels, so feature points lying within 15 pixels of the image border cannot be used for matching. In our experiments, for each feature point in the left image, all the feature points in the right image are searched, and the mean absolute difference (MAD) is used to measure the similarity between the two blocks. The best match is defined as the pair of corresponding points with the minimal MAD found during block matching.

4.1.3. Matching Results of Initial Database

The initial database of Appendix 1 contains car stereo images in two different situations, which we call ‘front car’ and ‘rear car’. Figures 4.6 to 4.9 show the results of processing the images of Appendix 1. Figures 4.6 and 4.8 display the results for the image pairs named ‘front 58’ and ‘rear 43’, using the layout shown in Figure 4.1. For the remaining data, only the image pairs showing the best match are presented.

Figure 4.1 Layout of Results for Initial Database (panels: Original Image (Left), Original Image (Right); Vertical Edge Image (Left), Vertical Edge Image (Right); Vertical Lines (Left), Vertical Lines (Right); Feature Points (Left), Feature Points (Right); Best Match (Left), Best Match (Right))


(1) Mean Absolute Difference (MAD)

Image        MAD   Best Match (Left)   Best Match (Right)
Front 265    28    (107, 60)           (91, 67)
Front 165    24    (97, 97)            (75, 106)
Front 94     9     (273, 156)          (236, 167)
Front 58     9     (214, 196)          (152, 203)
Rear 205     7     (219, 88)           (205, 98)
Rear 152     10    (185, 66)           (165, 76)
Rear 113     9     (267, 53)           (240, 65)
Rear 67      21    (289, 31)           (241, 43)
Rear 43      5     (244, 50)           (167, 61)

Figure 4.2 Mean Absolute Difference (MAD)

(2) Disparity (or X-offset)

Car Position   Actual Distance (cm)   Disparity (pixels)
Front          265                    16
Rear           205                    14
Front          165                    22
Rear           152                    20
Rear           113                    27
Front          94                     37
Rear           67                     48
Front          58                     62
Rear           43                     77

Figure 4.3 Disparity (or X-offset)


(3) Distance Determination Results

The distance can be determined using Equation 3.15, where the baseline is 93 mm. As the camera focal length is unknown, it has to be estimated from the available data. Since the stereo images used in the project were taken at measured distances, the focal length can be calculated by Equation 4.1. In our experiments, the focal length was calculated individually for each pair of images, and the camera focal length was approximated by averaging these values.

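$$\text{FocalLength} = \frac{\text{Distance} \times \text{Disparity}}{\text{Baseline}} \qquad (4.1)$$

In order to reduce the computational complexity, we simply calculate the Factor defined in Equation 4.2 instead of the focal length.

$$\text{Factor} = \text{Baseline} \times \text{FocalLength} = \text{ActualDistance} \times \text{Disparity} \qquad (4.2)$$

$$\text{Calculated Distance} = \frac{\text{AverageFactor}}{\text{Disparity}} \qquad (4.3)$$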
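The calibration procedure of Equations 4.2 and 4.3 can be sketched as three small functions. The names are hypothetical. Distances are assumed to be in metres, the unit that reproduces the Factor values listed in Figure 4.4 (for example 2.65 m × 16 pixels = 42.40).

```cpp
#include <vector>

// Equation 4.2 for each reference pair: Factor = ActualDistance x Disparity
// (= Baseline x FocalLength). The factors are averaged over all pairs;
// at least one pair must be supplied.
double average_factor(const std::vector<double> &distance_m,
                      const std::vector<double> &disparity_px) {
    double sum = 0.0;
    for (std::size_t i = 0; i < distance_m.size(); ++i)
        sum += distance_m[i] * disparity_px[i];
    return sum / distance_m.size();
}

// Equation 4.3: estimated distance, in the same unit as the distances above.
double calculated_distance(double avg_factor, double disparity_px) {
    return avg_factor / disparity_px;
}

// Percentage error of a calculated distance relative to the measured one.
double percentage_error(double calculated, double actual) {
    return (calculated - actual) / actual * 100.0;
}
```

Passing the nine distance and disparity pairs of Figure 4.4 to `average_factor` gives the Average Factor used in Equation 4.3. Because the factor equals Baseline × FocalLength, dividing it by the 93 mm baseline would give the focal length in pixels if required.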

Actual Distance (cm)   Disparity (pixels)   Factor (m·pixels)
265                    16                   42.40
205                    14                   28.70
165                    22                   36.30
152                    20                   30.40
113                    27                   30.51
94                     37                   34.78
67                     48                   32.16
58                     62                   35.96
43                     77                   33.11
Average Factor                              33.80

Figure 4.4 Average Factor


Averaging the factors calculated with Equation 4.2 gives an ‘Average Factor’ of 33.80. The distance can then be estimated using Equation 4.3, and the results are shown in Figure 4.5.

Figure 4.5 Calculated Distance and Percentage Error

Because only a small number of stereo images is available in the project, the calculated ‘Average Factor’ is inaccurate, and hence the camera focal length cannot be precisely determined. The errors of the system could be reduced by using more data to calculate the ‘Average Factor’.


Figure 4.6 Front 58


(a) Front 94

(b) Front 165

(c) Front 265

Figure 4.7 Front Car


Figure 4.8 Rear 43


(a) Rear 67

(b) Rear 113

(c) Rear 152


(d) Rear 205

Figure 4.9 Rear Car

4.1.4. Matching Results of Additional Database

The additional database in Appendix 2 consists of stereo images taken of different kinds of vehicles. The proposed algorithm is applied to these images, and only the images of the best match are displayed.

(1) Straight Car

This set of car stereo images is similar to the images of the initial database. The results of finding the best match are shown in Figure 4.10. In Figure 4.10 (c), the correct best match is found by increasing the block size from 30×30 to 60×60, because the image was captured at a short distance. In practice, an adaptive block size could be set that varies with the distance between the stereo cameras and the target object, with larger blocks for closer objects.


(a) Straight Car 196

(b) Straight Car 81

(c) Straight Car 38 (Block Size = 60×60)

Figure 4.10 Straight Car


(2) Angled Car

This set of images was taken of a car viewed at an angle. The matching results are shown in Figure 4.11; the system correctly finds the best match between the two stereo images.

(a) Angled Car 137

(b) Angled Car 110

Figure 4.11 Angled Car


(3) Taxi

This set of images was taken of a taxi from different angles of view and at different distances. The proposed algorithm correctly finds the best match between the stereo image pairs. However, in Figure 4.12 (a) the best corresponding points were found on windows in the images rather than on the taxi. Although this is a real match with minimal MAD, the match should be found in the region of the taxi. A simple solution is to outline all possible real matches and choose the one with minimal MAD within the region of the taxi.

(a) Taxi 340

(b) Taxi 222


(c) Taxi 122

(d) Taxi 82

Figure 4.12 Taxi

(4) Foreign Van

The stereo images of the foreign van are tested by the system, and the results in Figure 4.13 show that the best match is found correctly. In Figure 4.13 (a) and (b), the matching points in the left and right images have a relatively large vertical displacement; since only the horizontal offset is considered, this does not affect the system results.


(a) Foreign Van 210

(b) Foreign Van 148

(c) Foreign Van 94

Figure 4.13 Foreign Van


(5) Landrover

This set of stereo images was taken from the rear of a Landrover. The system extracts the best match between the two stereo images well, and the results are shown in Figure 4.14.

(a) Landrover 380

(b) Landrover 260

(c) Landrover 176

(d) Landrover 141

Figure 4.14 Landrover

(6) White Car

The set of stereo images taken of a white car is the most demanding situation. The two vertical edges of the number plate are significant features for the matching process, but this information may be lost because the number plate has intensity values similar to those of the surrounding paintwork. However, the proposed algorithm successfully finds the best match in the remaining regions of the car. The results are shown in Figure 4.15.


White Car 186

White Car 130

White Car 68

Figure 4.15 White Car




Image              MAD   Best Match (Left)   Best Match (Right)
Straight Car 196   9     (244, 106)          (226, 110)
Straight Car 81    16    (87, 105)           (42, 105)
Straight Car 38    26    (183, 108)          (82, 109)
Angled Car 137     7     (63, 110)           (41, 110)
Angled Car 110     9     (189, 101)          (156, 103)
Taxi 340           16    (143, 42)           (139, 46)
Taxi 222           17    (162, 119)          (139, 122)
Taxi 122           16    (113, 72)           (82, 72)
Taxi 82            12    (77, 174)           (26, 167)
Foreign Van 210    10    (286, 60)           (261, 46)
Foreign Van 148    22    (326, 30)           (298, 44)
Foreign Van 94     8     (117, 37)           (76, 38)
Landrover 380      8     (287, 174)          (268, 183)
Landrover 260      10    (299, 35)           (278, 48)
Landrover 176      16    (193, 28)           (167, 33)
Landrover 141      12    (57, 230)           (25, 224)
White Car 186      11    (313, 98)           (285, 112)
White Car 130      8     (146, 107)          (111, 124)
White Car 68       10    (187, 26)           (113, 30)




Vehicle        Actual Distance (cm)   Disparity (pixels)
Landrover      380                    19
Taxi           340                    xx
Landrover      260                    20
Taxi           222                    23
Foreign Van    210                    25
Straight Car   196                    18
White Car      186                    28
Landrover      176                    29
Foreign Van    148                    28
Landrover      141                    32
Angled Car     137                    22
White Car      130                    35
Taxi           122                    31
Angled Car     110                    33
Foreign Van    94                     41
Taxi           82                     51
Straight Car   81                     45
White Car      68                     74
Straight Car   38                     101




Actual Distance (cm)   Disparity (pixels)   Factor (m·pixels)
380                    19                   72.20
340                    xx                   xx
260                    20                   52.00
222                    23                   51.06
210                    25                   52.50
196                    18                   35.28
186                    28                   52.08
176                    29                   51.04
148                    28                   41.44
141                    32                   45.12
137                    22                   30.14
130                    35                   45.50
122                    31                   37.82
110                    33                   36.30
94                     41                   38.54
82                     51                   41.82
81                     45                   36.45
68                     74                   50.32
38                     101                  38.38
Average Factor                              44.89


51

Actual Distance (cm)   Disparity (pixels)   Calculated Distance (cm)   Error (%)
380                    19                   236.26                     -37.83
340                    xx                   xx                         xx
260                    20                   224.45                     -13.67
222                    23                   195.17                     -12.09
210                    25                   179.56                     -14.50
196                    18                   249.39                     27.24
186                    28                   160.32                     -13.81
176                    29                   154.79                     -12.05
148                    28                   160.32                     8.32
141                    32                   140.28                     -0.51
137                    22                   204.05                     48.94
130                    35                   128.26                     -1.34
122                    31                   144.81                     18.70
110                    33                   136.03                     23.66
94                     41                   109.49                     16.48
82                     51                   88.02                      7.34
81                     45                   99.76                      23.16
68                     74                   60.66                      -10.79
38                     101                  44.45                      16.97
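The columns of the edge detection based results can be checked row by row with simple arithmetic. Every tabulated Factor equals the actual distance Z_i (cm) multiplied by the disparity d_i (pixels) and divided by 100; the Average Factor is the mean over the N rows with a valid disparity (rows marked xx are excluded, so N = 18 here); the calculated distance is 100 times the Average Factor divided by the disparity; and the error is the signed relative error. These relations are inferred from the tabulated values rather than stated in this section. They are consistent with the usual stereo relation Z = fB/d, in which 100 times the Factor plays the role of the focal length f (in pixels) times the baseline B (in cm).

```latex
\begin{aligned}
F_i &= \frac{Z_i\,d_i}{100}, \qquad \bar{F} = \frac{1}{N}\sum_{i=1}^{N} F_i, \qquad
\hat{Z}_i = \frac{100\,\bar{F}}{d_i}, \qquad E_i = \frac{\hat{Z}_i - Z_i}{Z_i}\times 100,\\[4pt]
\text{Landrover 380:}\quad F &= \frac{380 \times 19}{100} = 72.20, \quad
\hat{Z} = \frac{100 \times 44.89}{19} = 236.26\ \text{cm}, \quad
E = \frac{236.26 - 380}{380}\times 100 = -37.83\%.
\end{aligned}
```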

4.2. Corner Detection Based Algorithm

We apply the proposed corner detection based algorithm to the database of car stereo images. Twenty feature points are extracted from each image by Harris/Plessey corner detection, and we then search for real matches among these points. First, the system is tested on the initial database of Appendix 1; the experiments are then extended to the additional database of Appendix 2.

4.2.1. Extraction of Feature Points

The results show that the cornerness map produced by Harris/Plessey corner detection consists of very clear cornerness components, with sufficient contrast between each component and its surroundings. As expected, most of the cornerness components are detected in the area of the number plate. In fact, the cornerness map also contains a lot of noise, which is not visible in the displayed map because of its large range of intensity values. Thresholding the cornerness map removes most of this noise while retaining the required cornerness components. The proposed non-maximal suppression method is then used to obtain the local maxima, and the twenty points with the highest intensity are extracted by the system.
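As an illustration of how a cornerness map is produced, the following is a minimal C sketch of the standard Harris/Plessey response R = det(M) - k·trace(M)^2, where M is the structure tensor of the image gradients. It is not the modified version used in this project: the central-difference gradients, the 3×3 box window (in place of a Gaussian), the constant k and the function name harris_cornerness are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Pixel access for a row-major 8-bit grey image (hypothetical helper). */
static double px(const unsigned char *img, int w, int x, int y)
{
    return (double)img[(size_t)y * w + x];
}

/* Harris/Plessey cornerness R = det(M) - k * trace(M)^2, with M the structure
 * tensor summed over a 3x3 box window. Pixels within 2 px of the border get 0.
 * Corners give R > 0, straight edges R < 0, flat regions R close to 0. */
void harris_cornerness(const unsigned char *img, int w, int h, double k, double *out)
{
    assert(img && out && w > 4 && h > 4);
    for (int i = 0; i < w * h; i++)
        out[i] = 0.0;

    for (int y = 2; y < h - 2; y++) {
        for (int x = 2; x < w - 2; x++) {
            double sxx = 0.0, syy = 0.0, sxy = 0.0;
            /* Accumulate gradient products over the window centred on (x, y). */
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int u = x + dx, v = y + dy;
                    /* Central-difference gradients. */
                    double ix = 0.5 * (px(img, w, u + 1, v) - px(img, w, u - 1, v));
                    double iy = 0.5 * (px(img, w, u, v + 1) - px(img, w, u, v - 1));
                    sxx += ix * ix;
                    syy += iy * iy;
                    sxy += ix * iy;
                }
            }
            double det = sxx * syy - sxy * sxy;
            double trace = sxx + syy;
            out[(size_t)y * w + x] = det - k * trace * trace;
        }
    }
}
```

A common choice is k = 0.04. On a synthetic image containing a bright square, the response is positive at the square's corner, negative along its straight sides and zero inside it.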

4.2.2. Block Matching

The block matching process is slightly different from the one used in the edge detection based algorithm. Because a fixed number of points is obtained from each of the two stereo images, we aim to extract all real matches. Each point in the left image is mapped to its best matching point in the right image, and the five pairs with the lowest MADs are chosen as candidate matches. The pair with the minimal MAD is taken as the benchmark, and the x offset of each candidate match is compared with the x offset of this best match. If the difference is within an acceptable range, the candidate is considered a real match.
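The candidate filtering step can be sketched in C as below, assuming each left-image feature point has already been paired with its best right-image point and the corresponding MAD. The Match struct, the function name select_real_matches and the tolerance parameter max_dx_diff are hypothetical; the report does not specify the acceptable range.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record: a left-image point, its best right-image match, and the MAD. */
typedef struct {
    int xl, yl;   /* point in the left image */
    int xr, yr;   /* best matching point in the right image */
    double mad;   /* mean absolute difference of the two blocks */
} Match;

static int by_mad(const void *a, const void *b)
{
    double da = ((const Match *)a)->mad, db = ((const Match *)b)->mad;
    return (da > db) - (da < db);
}

/* Sorts the matches by MAD, keeps the n_candidates lowest, and accepts a candidate
 * as a real match if its x offset is within max_dx_diff pixels of the x offset of
 * the minimal-MAD pair (the benchmark). Accepted matches are copied to out in
 * increasing MAD order; the number accepted is returned. */
int select_real_matches(Match *m, int n, int n_candidates, int max_dx_diff, Match *out)
{
    assert(m && out && n > 0 && n_candidates > 0);
    qsort(m, (size_t)n, sizeof *m, by_mad);
    if (n_candidates > n)
        n_candidates = n;

    int best_dx = m[0].xl - m[0].xr;   /* benchmark offset */
    int count = 0;
    for (int i = 0; i < n_candidates; i++) {
        int dx = m[i].xl - m[i].xr;
        if (abs(dx - best_dx) <= max_dx_diff)
            out[count++] = m[i];
    }
    return count;
}
```

For example, with five candidates whose x offsets are 20, 2, 19, 22 and 22 pixels (in increasing MAD order) and a tolerance of 3 pixels, the sketch rejects only the pair with an offset of 2.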


4.2.3. Matching Results of Initial Database

Figures 4.25 to 4.28 show the results of processing the initial database of Appendix 1 with the system. Figures 4.25 and 4.27 display the results for the image pairs named ‘Front 58’ and ‘Rear 67’, using the layout shown in Figure 4.20. For the remaining data, only the image pairs showing the best match are presented. In our experiments, a relatively small threshold level of 10 is selected to guarantee that enough local maxima are extracted, and twenty matching points are then picked out according to their MAD values. The block size is set to 50×50 in this project.

Original Image (Left)

Original Image (Right)

Cornerness Map (Left)

Cornerness Map (Right)

Corners (Left)

Feature Points (Left)

Corners (Right)

Feature Points (Right)

Best Match (Left)

Best Match (Right)

Real Matches (Left)

Real Matches (Right)

Figure 4.20 Layout of the Results for Initial Database

Image       Left point (x, y)   Right point (x, y)
Front 265   (194, 64)           (178, 74)
Front 165   (107, 97)           (85, 105)
Front 94    (260, 125)          (223, 136)
Front 58    (189, 206)          (127, 215)
Rear 205    (114, 43)           (102, 51)
Rear 152    (202, 85)           (182, 95)
Rear 113    (239, 83)           (212, 93)
Rear 67     (190, 37)           (141, 47)
Rear 43     (257, 81)           (180, 91)




Image    Actual Distance (cm)   Disparity (pixels)
Front    265                    16
Rear     205                    12
Front    165                    22
Rear     152                    20
Rear     113                    27
Front    94                     37
Rear     67                     49
Front    58                     62
Rear     43                     77



Actual Distance (cm)   Disparity (pixels)   Factor
265                    16                   42.40
205                    12                   24.60
165                    22                   36.30
152                    20                   30.40
113                    27                   30.51
94                     37                   34.78
67                     49                   32.83
58                     62                   35.96
43                     77                   33.11

Average Factor 33.43

Figure 4.25 Front 58


(a) Front 94

(b) Front 165

(c) Front 265

Figure 4.26 Front Car


Figure 4.27 Rear 67


(a) Rear 43

(b) Rear 113

(c) Rear 152


(d) Rear 205

Figure 4.28 Rear Car

4.2.4. Matching Results of Additional Database

The proposed algorithm is applied to the additional database in Appendix 2, and the results are shown in Figures 4.29 to 4.34. Only the images showing the best match are displayed in each of these figures.

(1) Straight Car

(a) Straight Car 196


(b) Straight Car 81

(c) Straight Car 38

Figure 4.29 Straight Car

(2) Angled Car

(a) Angled Car 137


(b) Angled Car 110

Figure 4.30 Angled Car

(3) Taxi

(a) Taxi 340

(b) Taxi 222


(c) Taxi 122

(d) Taxi 82

Figure 4.31 Taxi

(4) Foreign Van

(a) Foreign Van 210


(b) Foreign Van 148

(c) Foreign Van 94

Figure 4.32 Foreign Van

(5) Landrover

(a) Landrover 380


(b) Landrover 260

(c) Landrover 176

(d) Landrover 141

Figure 4.33 Landrover


(6) White Car

(a) White Car 186

(b) White Car 130

(c) White Car 68

Figure 4.34 White Car

Image              Left point (x, y)   Right point (x, y)
Straight Car 196   (146, 125)          (128, 127)
Straight Car 81    (173, 151)          (127, 152)
Straight Car 38    (242, 216)          (139, 217)
Angled Car 137     (130, 107)          (105, 108)
Angled Car 110     (250, 38)           (215, 42)
Taxi 340           (172, 41)           (168, 47)
Taxi 222           (133, 116)          (110, 118)
Taxi 122           (74, 150)           (38, 147)
Taxi 82            (75, 145)           (27, 140)
Foreign Van 210    (317, 64)           (310, 79)
Foreign Van 148    (201, 69)           (170, 75)
Foreign Van 94     (216, 29)           (174, 36)
Landrover 380      (326, 136)          (313, 151)
Landrover 260      (248, 35)           (228, 46)
Landrover 176      (84, 173)           (56, 170)
Landrover 141      (91, 87)            (62, 86)
White Car 186      (287, 100)          (259, 111)
White Car 130      (190, 66)           (158, 71)
White Car 68       (189, 19)           (135, 23)


Image           Actual Distance (cm)   Disparity (pixels)
Landrover       380                    xx
Taxi            340                    xx
Landrover       260                    20
Taxi            222                    23
Foreign Van     210                    xx
Straight Car    196                    18
White Car       186                    28
Landrover       176                    28
Foreign Van     148                    31
Landrover       141                    29
Angled Car      137                    25
White Car       130                    32
Taxi            122                    36
Angled Car      110                    35
Foreign Van     94                     42
Taxi            82                     48
Straight Car    81                     46
White Car       68                     54
Straight Car    38                     103


Actual Distance (cm)   Disparity (pixels)   Factor
380                    xx                   xx
340                    xx                   xx
260                    20                   52.00
222                    23                   51.06
210                    xx                   xx
196                    18                   35.28
186                    28                   52.08
176                    28                   49.28
148                    31                   45.88
141                    29                   40.89
137                    25                   34.25
130                    32                   41.60
122                    36                   43.92
110                    35                   38.50
94                     42                   39.48
82                     48                   39.36
81                     46                   37.26
68                     54                   31.32
38                     103                  39.14

Average Factor 39.49

Actual Distance (cm)   Disparity (pixels)   Calculated Distance (cm)   Error (%)
380                    xx                   xx                         xx
340                    xx                   xx                         xx
260                    20                   197.45                     -24.06
222                    23                   171.70                     -22.66
210                    xx                   xx                         xx
196                    18                   219.39                     11.93
186                    28                   141.04                     -24.17
176                    28                   141.04                     -19.86
148                    31                   127.39                     -13.93
141                    29                   136.17                     -3.43
137                    25                   157.96                     15.30
130                    32                   123.41                     -5.07
122                    36                   109.69                     -10.09
110                    35                   112.83                     2.57
94                     42                   94.02                      0.02
82                     48                   82.27                      0.33
81                     46                   85.85                      5.99
68                     54                   73.13                      7.54
38                     103                  38.34                      0.89

Chapter 5. Discussion

The algorithms proposed in this project for depth estimation satisfactorily meet the system requirements. To make these algorithms robust enough to handle more demanding situations and to adapt them to different applications, this chapter discusses some problems that still need to be solved and possible modifications to the project.

5.1. Extraction of Edge-based Feature Points

The edge detection based algorithm combines vertical edge detection with the Hough transform (HT) to extract the feature points. Because edge detection inevitably responds to noise, the template of the edge detector has to be chosen carefully; in practice, a trade-off must be made based on the application requirements. The edge image contains thick vertical edges that must be processed before further use. First, a morphological filter is applied to thin the vertical edges, so that the Hough transform does not detect redundant vertical lines at the same edge position. Second, once the edges have been reduced to single-pixel lines, they may contain discontinuities; the edge points should be connected, because the top and bottom points of each edge have to be extracted. The Hough transform is then applied to help choose the strong vertical edge features. In this project, we are interested in the vertical edges, particularly in the area of the car number plate, and we assume these edges are perfectly vertical. In practice, real vertical edges may deviate slightly from the vertical because of the viewing angle; this can be handled by allowing a range of deviation angles for the vertical lines in the Hough transform.
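Restricting the Hough transform to near-vertical lines can be sketched in C as follows. To avoid trigonometry, this sketch swaps in a slope-intercept parametrization x = x0 + (d·y)/h instead of the more common (ρ, θ) normal form: d is the horizontal drift in pixels over the image height h, so limiting |d| to max_drift limits the deviation angle to roughly atan(max_drift/h). The function name hough_near_vertical, the integer parametrization and the binary edge-image input are assumptions, not the report's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hough transform for near-vertical lines x = x0 + (d * y) / h (integer division).
 * Edge pixels are the non-zero entries of edge[] (w x h, row-major).
 * acc must hold (2 * max_drift + 1) * (w + 2 * max_drift) ints.
 * Returns the vote count of the strongest line; its x0 and d go to *best_x0, *best_d. */
int hough_near_vertical(const unsigned char *edge, int w, int h, int max_drift,
                        int *acc, int *best_x0, int *best_d)
{
    assert(edge && acc && best_x0 && best_d && w > 0 && h > 0 && max_drift >= 0);
    int n_d = 2 * max_drift + 1, n_x0 = w + 2 * max_drift;
    for (int i = 0; i < n_d * n_x0; i++)
        acc[i] = 0;

    /* Each edge pixel casts one vote for every allowed drift value d. */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            if (!edge[(size_t)y * w + x])
                continue;
            for (int k = 0; k < n_d; k++) {
                int d = k - max_drift;
                int x0 = x - (d * y) / h;   /* intercept at y = 0; |shift| < max_drift */
                acc[k * n_x0 + x0 + max_drift]++;
            }
        }

    /* The accumulator peak is the strongest near-vertical line. */
    int best = 0;
    for (int i = 1; i < n_d * n_x0; i++)
        if (acc[i] > acc[best])
            best = i;
    *best_d = best / n_x0 - max_drift;
    *best_x0 = best % n_x0 - max_drift;
    return acc[best];
}
```

A slightly tilted 20-pixel edge in a 20×20 image that drifts 2 pixels from top to bottom collects all 20 votes in the bin d = 2, while an isolated noise pixel adds only one vote per bin.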

5.2. Extraction of Corner-based Feature Points

The corner detection based algorithm extracts the feature points by corner detection. The major advantage of corner detection is that it directly detects the points of interest. Compared with the proposed edge detection based algorithm, this approach is therefore less sensitive to distortions caused by changes in perspective. A modified Harris/Plessey method is used in this project, and this corner detector produces a cornerness map containing pixels with large intensity values. A scaling operation must be applied to the cornerness map to bring its intensity range to 0 to 255. Clearly, not all points in the cornerness map are true corners, and thresholding is typically used to keep the true corners, which have large brightness values. In this project, we proposed a method that combines thresholding with the selection of max-intensity points. In this scheme, thresholding is used only to remove obvious false corners and reduce the computational cost, so the threshold level has to be relatively small to retain enough candidate corners. Selecting the max-intensity points is the core operation for extracting the feature points. Because of the scaling operation, many strong corners may have the same intensity value (such as 255) in the new cornerness map, even though they differ in the original cornerness map. For the selection operation to choose the real corners in order of intensity from maximum to minimum, we must therefore extract enough points from the new cornerness map to guarantee that all the strong corners are selected.
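The scaling, thresholding, non-maximal suppression and max-intensity selection can be combined in one C sketch, under stated assumptions. The cornerness is rescaled linearly over its [min, max] range, 3×3 non-maximal suppression is used, and ties in the scaled value (for example several corners at 255) are broken by the original cornerness. That tie-break is an alternative to the report's approach of extracting extra points, not the report's procedure. The names Corner and select_strongest are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int x, y;
    unsigned char scaled;  /* cornerness rescaled to 0..255 */
    double raw;            /* original cornerness, used only to break ties */
} Corner;

/* Strongest first: by scaled value, then by raw cornerness. */
static int by_strength_desc(const void *a, const void *b)
{
    const Corner *p = a, *q = b;
    if (p->scaled != q->scaled)
        return (int)q->scaled - (int)p->scaled;
    return (q->raw > p->raw) - (q->raw < p->raw);
}

/* Rescales R (w x h) linearly to 0..255, keeps interior pixels whose scaled value
 * exceeds `threshold` and that are 3x3 local maxima of R, and writes up to n_max
 * of them to out[], strongest first. Returns the number of corners written. */
int select_strongest(const double *R, int w, int h, int threshold, int n_max, Corner *out)
{
    assert(R && out && w > 2 && h > 2 && n_max > 0);
    double lo = R[0], hi = R[0];
    for (int i = 1; i < w * h; i++) {
        if (R[i] < lo) lo = R[i];
        if (R[i] > hi) hi = R[i];
    }
    double span = (hi > lo) ? hi - lo : 1.0;

    Corner *cand = malloc(sizeof *cand * (size_t)(w * h));
    if (!cand)
        return 0;
    int n = 0;
    for (int y = 1; y < h - 1; y++)
        for (int x = 1; x < w - 1; x++) {
            double r = R[y * w + x];
            int s = (int)(255.0 * (r - lo) / span + 0.5);
            if (s <= threshold)
                continue;                    /* removes obvious false corners */
            int is_max = 1;                  /* 3x3 non-maximal suppression */
            for (int dy = -1; dy <= 1 && is_max; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    if ((dx || dy) && R[(y + dy) * w + x + dx] > r) { is_max = 0; break; }
            if (is_max)
                cand[n++] = (Corner){x, y, (unsigned char)s, r};
        }

    qsort(cand, (size_t)n, sizeof *cand, by_strength_desc);
    int k = n < n_max ? n : n_max;
    for (int i = 0; i < k; i++)
        out[i] = cand[i];
    free(cand);
    return k;
}
```

With the project's settings this would be called with threshold = 10 and n_max = 20. In the test data, two corners with raw values 1000 and 999 both scale to 255, and the raw value decides their order.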

5.3. Matching Algorithm

Block matching is considered a reasonable technique because of its computational simplicity, and it operates successfully in the system. One of the major difficulties in block matching is the selection of block size, which has a large influence on the stereo matching results: a large block increases the computational cost, while a small block is sensitive to noise. Furthermore, the border problem must be considered: when feature points are located in the border area of an image, a smaller block size should be chosen so that the MAD can be calculated correctly. The border width is defined as half the side length of a block, and if a feature point lies within the border, no block is set on it. An adaptive block size could be used in practice.
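The MAD between two blocks, together with the border rule above, can be sketched in C as follows. The sketch assumes an odd block side of 2·half+1 so that the block is centred exactly on the feature point (the project's 50×50 blocks are even-sized), and it returns -1.0 for points inside the border. The function name block_mad and that return convention are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Mean absolute difference between the (2*half+1) x (2*half+1) block centred on
 * (xl, yl) in the left image and the block centred on (xr, yr) in the right image.
 * Both images are w x h, 8-bit, row-major. Following the border rule, no block is
 * placed on a point closer than `half` pixels to the image edge: -1.0 is returned. */
double block_mad(const unsigned char *left, const unsigned char *right, int w, int h,
                 int xl, int yl, int xr, int yr, int half)
{
    assert(left && right && half >= 0);
    if (xl < half || yl < half || xl >= w - half || yl >= h - half ||
        xr < half || yr < half || xr >= w - half || yr >= h - half)
        return -1.0;                      /* feature point lies in the border */

    long sum = 0;
    for (int dy = -half; dy <= half; dy++)
        for (int dx = -half; dx <= half; dx++) {
            int a = left[(yl + dy) * w + (xl + dx)];
            int b = right[(yr + dy) * w + (xr + dx)];
            sum += abs(a - b);
        }
    int side = 2 * half + 1;
    return (double)sum / (side * side);
}
```

When the right image is the left image shifted by two pixels, the block at the shifted position gives a MAD of 0, while an unshifted comparison gives a non-zero MAD.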

5.4. Additional Problems

As can be seen from the images in this project, some image pairs are misaligned. Misalignment between corresponding images has a large influence on the block matching results, especially when the block size is relatively large. In our experiments, the images can be rotated manually to align each pair.

Another major problem in depth estimation is that the camera focal length is unknown. In this project, we approximate the focal length from the available data: the images were taken at measured distances, so the actual distance is known. Because the available database is small, the estimate of the camera focal length is not accurate.
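The approximation can be illustrated with a short C sketch that reproduces the Factor procedure used in the results tables: for each pair with a valid disparity, Factor = actual distance × disparity / 100 (a scaling inferred from the tabulated values), the Factors are averaged, and distance is then estimated as 100 × average Factor / disparity. Under the pinhole stereo model Z = fB/d, this calibrates the product of focal length and baseline, so the baseline must be known to recover the focal length itself. The function names are hypothetical, and a disparity of 0 is assumed to mark a failed match (xx in the tables).

```c
#include <assert.h>
#include <stddef.h>

/* Average calibration factor over pairs with a valid disparity (d > 0).
 * Factor_i = Z_i * d_i / 100, with Z in cm and d in pixels. */
double average_factor(const double *dist_cm, const int *disp, size_t n)
{
    assert(dist_cm && disp);
    double sum = 0.0;
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        if (disp[i] <= 0)
            continue;                 /* failed match: excluded from the average */
        sum += dist_cm[i] * disp[i] / 100.0;
        used++;
    }
    return used ? sum / (double)used : 0.0;
}

/* Estimated distance in cm for a disparity d (pixels): Z = 100 * F / d. */
double estimate_distance(double factor, int disp)
{
    assert(disp > 0);
    return 100.0 * factor / disp;
}

/* Signed relative error in percent: (estimated - actual) / actual * 100. */
double error_percent(double estimated, double actual)
{
    assert(actual != 0.0);
    return (estimated - actual) / actual * 100.0;
}
```

With the nine pairs of the initial database, average_factor gives about 33.43, matching the Average Factor reported for that database. estimate_distance(44.89, 19) gives about 236.26 cm, the calculated distance reported for Landrover 380 with the edge detection based algorithm.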


Chapter 6. Conclusions

In this thesis, we have proposed two algorithms for stereo depth estimation, the edge detection based algorithm and the corner detection based algorithm, both of which operate satisfactorily in the system.

In the edge detection based algorithm, the Sobel edge operator is used to detect the vertical edges in an image. Each of the available stereo images contains strong vertical edges on the vehicle, especially in the region of the number plate. Detecting vertical edges is preferred to detecting horizontal edges because of its noise-suppression characteristics. Since the edge image contains many vertical edges, a limited number of edges has to be chosen for feature extraction, and the edges are selected according to their strength. Because the Hough transform extracts lines by an evidence-gathering approach, in which the evidence is the votes cast in an accumulator space [3], the vertical edges are thinned to single-pixel lines and a Hough transform is then used to extract strong edge features. Finally, the top and bottom points of each extracted edge are taken as the required feature points. This approach is computationally efficient and can correctly extract points of interest from corresponding features in the two stereo images.
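For reference, the vertical-edge response of the Sobel operator can be sketched in C as below. It applies only the horizontal-gradient kernel Gx, which responds to vertical edges, and clamps |Gx| to 0..255. The function name sobel_vertical and the clamping choice are assumptions; the project's own edge template and any subsequent thresholding may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* |Gx| Sobel response, which is large on vertical edges. Kernel:
 *   -1 0 +1
 *   -2 0 +2
 *   -1 0 +1
 * Output is clamped to 0..255; the one-pixel border is set to 0. */
void sobel_vertical(const unsigned char *img, int w, int h, unsigned char *out)
{
    assert(img && out && w > 2 && h > 2);
    for (int i = 0; i < w * h; i++)
        out[i] = 0;
    for (int y = 1; y < h - 1; y++)
        for (int x = 1; x < w - 1; x++) {
            const unsigned char *up = img + (y - 1) * w;
            const unsigned char *mid = img + y * w;
            const unsigned char *dn = img + (y + 1) * w;
            int gx = (up[x + 1] + 2 * mid[x + 1] + dn[x + 1])
                   - (up[x - 1] + 2 * mid[x - 1] + dn[x - 1]);
            int mag = abs(gx);
            out[y * w + x] = (unsigned char)(mag > 255 ? 255 : mag);
        }
}
```

A step between a dark left half and a bright right half produces a strong response on the two columns either side of the step, while a horizontal step produces no response at all.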

In the corner detection based algorithm, feature points are extracted directly by corner detection. The Harris/Plessey approach, a combined corner and edge detector, is used in this project. This corner detection method is simple and successfully detects corners that contrast sharply with the background. The strong corners are located particularly in the vehicle number plate, because this region contains many junctions of edges. Corner detection finds many corners in an image, but we are only interested in corners with sufficiently large brightness values; the remaining corners are not considered true corners. Thresholding is applied as a preprocessing step to remove obvious false corners with very small intensity values, and the remaining non-zero points are considered to be true corners. These corners are sorted by intensity from maximum to minimum, and the first twenty (or more) are chosen as the feature points. This algorithm is simple and robust enough to handle more demanding situations.

The block matching algorithm is implemented in this project to establish correspondence between the two stereo images. In block matching, a block of fixed size is centred on each extracted feature point in an image, and the intensity difference is then calculated between every block from the left image and every block from the right image. The best match is determined by a similarity measure such as the MAD. However, the pair of matching points with the minimal MAD does not always give the best match. For this reason, we retain enough pairs of matching points with relatively small MADs, and then determine the real matches from the horizontal disparity of the point positions.

According to our experiments, both algorithms operate successfully in the system to find correspondences and estimate depth information. Even in more demanding situations, such as misalignment of corresponding images, the system can overcome these difficulties and give correct results.

Future work may involve modifying the proposed algorithms to improve the reliability and robustness of the depth estimation approach. More stereo images need to be acquired for further experiments. To implement these algorithms successfully in practice, real-time operation of the system needs to be developed.


Acknowledgements

I would like to thank my supervisor Dr John Hannah for his choice of subject,

photography and assistance throughout the last three months. My thanks also go to

Paul Kuo and Dr Peter Hillman for their kind assistance with my project.


References

[1] R. C. Gonzalez and R. E. Woods, Digital Image Processing, Second Edition, Prentice Hall, 2002.

[2] D. H. Ballard and C. M. Brown, Computer Vision, Prentice Hall, 1982.

[3] M. S. Nixon and A. S. Aguado, Feature Extraction and Image Processing, Newnes, 2002.

[4] U. R. Dhond and J. K. Aggarwal, “Structure from Stereo: A Review,” IEEE Transactions on Systems, Man and Cybernetics, vol. 19, no. 6, pp. 1489-1510, Dec. 1989.

[5] C. Harris and M. Stephens, “A Combined Corner and Edge Detector,” Proc. Alvey Vision Conf., Univ. Manchester, pp. 147-151, 1988.

[6] N. W. Walton, “Generating Depth Maps from Stereo Image Pairs,” Ph.D. thesis, University of Edinburgh, UK, 2002.

[7] A. Gyaourova, C. Kamath, and S. Cheung, “Block Matching for Object Tracking,” Lawrence Livermore National Laboratory, 2003.

[8] M. Yu and Y. D. Kim, “An Approach to Korean License Plate Recognition Based on Vertical Edge Matching,” IEEE Int. Conf. SMC, vol. 4, pp. 2975-2980, 2000.

[9] F. Candocia and M. Adjouadi, “A Similarity Measure for Stereo Feature Matching,” IEEE Transactions on Image Processing, vol. 6, no. 10, Oct. 1997.

[10] A. M. Peacock, “Vision Systems Code Documentation,” University of Edinburgh, UK, 2000.

[11] T. D. Duan, D. A. Duc, and T. L. H. Du, “Combining Hough Transform and Contour Algorithm for Detecting Vehicles,” Proc. 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, Oct. 2004.

[12] Y. Chen, Y. Hung, and C. Fuh, “Fast Block Matching Algorithm Based on the Winner-Update Strategy,” IEEE Transactions on Image Processing, vol. 10, pp. 1212-1222, 2001.

[13] C. R. Jung and R. Schramm, “Rectangle Detection Based on a Windowed Hough Transform,” Proc. SIBGRAPI 2004, pp. 113-120, 2004.

[14] Y. Yanamura, M. Goto, and D. Nishiyama, “Extraction and Tracking of the License Plate Using Hough Transform and Voted Block Matching,” IEEE IV2003 Intelligent Vehicles Symposium, 2003.

[15] H. Bai, J. Zhu, and C. Liu, “A Fast License Plate Extraction Method on Complex Background,” IEEE 2003 International Conference on Intelligent Transportation Systems, vol. 2, Oct. 2003.

[16] B. Fisher, HIPR2, University of Edinburgh, UK, http://www.inf.ed.ac.uk/people/staff/Robert_Fisher.html

[17] B. Fisher, Introduction to Stereo Imaging: Theory, University of Edinburgh, UK, http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/MARSHALL/node11.html

[18] D. Parks and J. Gravel, Corner Detection, McGill University, Canada, http://www.cim.mcgill.ca/~dparks/index.htm



Appendix 1. Initial Project Images

A.1.1. Front

265

165

94

58


A.1.2. Rear

205

152

113

67

43


Appendix 2. Additional Project Images

A.2.1. Straight Car

196

81

38

A.2.2. Angled Car

137


110

A.2.3. Taxi

340

222

122

82


A.2.4. Foreign Van

210

148

94

A.2.5. Landrover

380

260


176

141

A.2.6. White Car

186

130

68
