
Comparison of Approaches to Feature Detection

R.W. Series, C.J. Radford, M.J. Varga, P. Fretwell, A.C. Sleigh.

RSRE, St Andrews Rd, Malvern, Worcs. WR14 3PS. UK.

This paper contrasts three different object identification approaches applied to feature detection. The first approach uses the conventional Hough transform, while the other two, namely the Edge List Search (ELS) and Full Image Search (FIS) algorithms, are dynamic programming based techniques.

The Hough transform approach accumulates evidence from the whole image in support of a particular feature or shape, which in this implementation is restricted to circles. In contrast, the FIS and ELS algorithms are based on local search techniques and are able to search for more complex shapes. The principal difference between the FIS and ELS methods is that the ELS is based on a strongly edge-based philosophy, while the FIS algorithm retains the concept of an edge only as a line which defines the shape of the sought feature.

Feature detection is a key component of most image processing schemes and yet, despite its importance, it is still the Achilles heel of many systems.

Selection of an optimum feature detector for some target application requires a compromise to be sought between several incompatible factors. The detector should be sensitive and capable of detecting the feature in the presence of noise, and yet it should minimise the number of false alarms. Equally, a tolerance to the distortions brought about by changes in illumination, view angle and camera artefacts is desirable to minimise the number of models which must be retained in the feature library.

While there is a considerable literature on the design of optimal feature detectors of one form or another, a system may only be considered optimal within the constraints of the formulation of the problem. One serious problem in many areas of image processing is that the major sources of noise are far from random: they are coherent, taking the form of camera artefacts, geometrical distortions, obscuration and illumination effects. These many and varied effects may only be tested by critical examination of the performance of the feature detectors on actual images.

A further problem in determining the overall performance of a system concerns the manner in which errors propagate. It is often found that gross failure of a system may be attributed to poor performance at a low level, and that the severity of an error increases as its consequences propagate upwards.

In this paper we compare the performance of three feature detection schemes. The domain of interest is detection of wheels (e.g. car wheels) in urban and semi-urban settings. The three methods selected adopt very different styles of tackling the problem.

EXPERIMENTAL

Hough Transform

The first approach is based on the use of a Hough transform and is described more fully in [1].

A line-edge description of the scene is first extracted using an operator based upon the Sobel edge convolution. The output of the horizontal and vertical Sobel convolutions yields both magnitude and direction of edge gradient. This information enables a non-maximum gradient suppression operation to thin the edge image to a single-pixel-wide edge map, and also allows a simple heuristic, which limits the maximum turn between neighbouring pixels, to be applied to suppress noise. Thinning is then followed by a hysteresis threshold similar to that employed by Canny [2] to produce a skeletal edge map.
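The first step of this chain can be illustrated with a minimal sketch, which is not the operator of [1] but a plain 3x3 Sobel pair. The function name `sobel_gradient` and the list-of-lists image layout are assumptions made for the example; thinning and hysteresis thresholding are not shown.

```python
import math

# Standard 3x3 Sobel kernels for the horizontal and vertical derivatives.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_gradient(image, row, col):
    """Return (magnitude, direction in radians) at an interior pixel.

    `image` is a list of rows of grey levels (an assumed layout).
    """
    gx = gy = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            value = image[row + dr][col + dc]
            gx += SOBEL_X[dr + 1][dc + 1] * value
            gy += SOBEL_Y[dr + 1][dc + 1] * value
    return math.hypot(gx, gy), math.atan2(gy, gx)


# A vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 10, 10],
        [0, 0, 10, 10],
        [0, 0, 10, 10]]
magnitude, direction = sobel_gradient(step, 1, 1)
```

For the step edge above the gradient points along the positive column axis, so the direction is zero and the edge itself runs perpendicular to it.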

With many semi-urban scenes it is found that the edge map contains excessive line detail in regions of high texture, such as that produced by foliage and other vegetation. A discrimination between man-made and natural edge features is made using a simple measure of the fractal dimension of line segments. The algorithm determines the total change in line orientation within each eight-connected line segment. This is normalised to give a line roughness factor which is used as a measure of the fractal dimension of the line segment and is thresholded to remove all 'rough' lines.
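A sketch of such a roughness measure is given below. The text does not state how the total orientation change is normalised, so dividing by the number of steps along the segment is an assumption, as are the names `roughness` and `keep_smooth` and the threshold value used in the example.

```python
import math


def roughness(orientations):
    """Mean absolute change in orientation per step along one segment.

    `orientations` holds the edge direction (radians) at each pixel in
    order. Normalising by the number of steps is an assumed choice.
    """
    if len(orientations) < 2:
        return 0.0
    total = 0.0
    for a, b in zip(orientations, orientations[1:]):
        # Wrap the difference into [-pi, pi) before taking its size.
        diff = (b - a + math.pi) % (2 * math.pi) - math.pi
        total += abs(diff)
    return total / (len(orientations) - 1)


def keep_smooth(segments, threshold):
    """Discard every segment whose roughness exceeds the threshold."""
    return [s for s in segments if roughness(s) <= threshold]


smooth = [0.0, 0.1, 0.2, 0.3]   # gently curving line, e.g. a wheel arc
rough = [0.0, 1.5, 0.0, 1.5]    # erratic line, e.g. foliage
retained = keep_smooth([smooth, rough], threshold=0.5)
```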


AVC 1989 doi:10.5244/C.3.25


Figure 1: Performance of Hough transform method on a typical image. (a) Input image, (b) Edge map after fractal discriminant, (c) Hough space, (d) extracted peaks labelled in order of intensity. Only peaks 6, 2 and 7 comply with the peak shape requirement.

Wheel-like features are found from the retained line edge image using a version of the circle Hough transform. The normal circle Hough transform maps to a three-dimensional parameter space, but with knowledge of edgel direction a two-dimensional transform, formed by projecting normals to the edgels in the image, will yield peaks at the centres of all arcs and circles in the image. An advantage of this technique is that all concentric arcs and circles will contribute to the same peak, allowing in this instance wheel arches and wheel arc segments to contribute to a peak at the nominal wheel centre. Partial evidence for the existence of a feature is therefore accumulated into global evidence for that feature.
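The projection of normals can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of [1]: the name `vote_centres`, the radius limits and the choice to vote on both sides of each edgel (because the contrast polarity may not be known) are all assumptions.

```python
import math
from collections import Counter


def vote_centres(edgels, r_min, r_max):
    """Accumulate votes for circle centres along each edgel's normal.

    `edgels` is an iterable of (x, y, theta), where theta is the gradient
    direction in radians, i.e. the normal to the edge at that point.
    """
    votes = Counter()
    for x, y, theta in edgels:
        dx, dy = math.cos(theta), math.sin(theta)
        for r in range(r_min, r_max + 1):
            for sign in (1, -1):  # assumed: vote on both sides of the edge
                cx = round(x + sign * r * dx)
                cy = round(y + sign * r * dy)
                votes[(cx, cy)] += 1
    return votes


# Three edgels on a circle of radius 5 and one on a concentric circle of
# radius 8, all centred on (20, 20).
edgels = [(25, 20, 0.0),
          (20, 25, math.pi / 2),
          (15, 20, math.pi),
          (20, 12, 3 * math.pi / 2)]
votes = vote_centres(edgels, r_min=3, r_max=10)
centre, count = votes.most_common(1)[0]
```

In this small example the arc of radius 8 reinforces the same accumulator cell as the arcs of radius 5, which is the property exploited for wheel arches.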

Since many shapes such as rectangles may also create peaks in the transform space, it was found necessary to make measurements of the peak profiles, accepting only 'sharp' peaks as attributable to centres of circles and arcs.
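The text does not specify how the peak profile is measured. Purely as an illustration, one possible sharpness measure is the ratio of the peak value to the mean of its surrounding window; the function `peak_sharpness`, the window radius and the example values are all hypothetical.

```python
def peak_sharpness(acc, row, col, radius=2):
    """Ratio of a peak to the mean of its neighbours in a square window.

    A hypothetical measure: large values indicate an isolated, sharp peak,
    values near one indicate a diffuse plateau.
    """
    total = 0.0
    count = 0
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            inside = 0 <= r < len(acc) and 0 <= c < len(acc[0])
            if inside and (r, c) != (row, col):
                total += acc[r][c]
                count += 1
    mean = total / count if count else 0.0
    return acc[row][col] / mean if mean > 0 else float("inf")


# A sharp peak on a low background and a diffuse peak on a high one.
sharp = [[1] * 5 for _ in range(5)]
sharp[2][2] = 10
diffuse = [[8] * 5 for _ in range(5)]
diffuse[2][2] = 10
```

A threshold on this ratio would then accept the first accumulator and reject the second.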

Figure 1 demonstrates the application of the method to a typical image. Figure 1(a) shows the original image and (b) the edge extracted image after application of the fractal discriminant; (c) shows the Hough transform output and in (d) the peaks in Hough space have been labelled in order of intensity. Peaks 1, 3, 4 and 5 are rejected as being too diffuse. Only peaks 2, 6 and 7 satisfy the peak shape criterion, and in a rather complex image only one false centre is located. The method is relatively fast and is suited to parallel systems.

Edge List Search

Edge List Search (ELS) is an object identification approach to feature extraction that uses dynamic programming. It is more general than the Hough transform approach because it can extract shapes that are complex. It is used here to identify an object, e.g. a wheel, embedded in the large number of edges which generally result from edge extraction applied to a complex natural scene.

Dynamic programming can be used as an optimal technique of non-linear matching and is readily applicable to one-dimensional data. The principle of its use in pattern processing is that it is used to obtain a match between observed data and reference templates. Recognition or identification is made on the basis of finding the template that best matches the observed data on a global basis. This approach has been used with great success in one-dimensional applications such as speech recognition and digital communications; however, it is not easily extensible to two-dimensional data. The difference between the dynamic programming approach used here [3] and that normally used is that here the process is reference or template data driven, rather than the more normal situation where it is observed data driven. Additional improvements to the efficiency and storage requirements of the algorithm have been made through the use of segment linkage records, rather than a full back linkage matrix, and the "pre-generation" of allowed connectivities between observed segments, in order to reduce the search space in a way analogous to the use of a grammar. Both these techniques have been used in speech recognition but have not been applied in image processing.

In this system a reference model is created consisting of a sequence of segments; outlines of circles of different sizes are used as the reference segments corresponding to large and small wheels. The observation data is generated as follows. As in the Hough transform approach the first step is to produce an "edgel" image; this is a raster organised edge map containing edge direction and intensity information. The raster organised edge map is then decomposed into a list format. This comprises two stages. First, the node points (points with more than two edge point neighbours) are deleted in order to break the edge map into isolated lines or closed loops. Second, a list format data file is created based on an 8-adjacent neighbour connectivity. Closed shapes are arbitrarily broken. The resulting segments are therefore edges, each segment being made up of a list of connected pixel data, each element of which has associated with it information about its horizontal and vertical positions and its orientation. Each segment is also recorded in reverse order so as to allow matching to be performed in either direction. The segments are unique with no cross junctions so as to eliminate ambiguities.
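The two-stage decomposition into edge lists can be sketched as below. This is an illustration only: the function names and the representation of the edge map as a set of (row, column) pairs are assumptions, and the per-pixel orientation that accompanies each list element in the text is omitted for brevity.

```python
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def neighbours(pixel, pixels):
    """Edge pixels that are 8-adjacent to `pixel`."""
    r, c = pixel
    return [(r + dr, c + dc) for dr, dc in OFFSETS if (r + dr, c + dc) in pixels]


def remove_nodes(pixels):
    """Stage 1: delete node points (more than two edge neighbours)."""
    return {p for p in pixels if len(neighbours(p, pixels)) <= 2}


def trace_segments(pixels):
    """Stage 2: order the remaining pixels into connected lists."""
    remaining = set(pixels)
    segments = []
    while remaining:
        # Start at a line end if one exists; a closed loop has none and
        # is therefore broken at an arbitrary pixel.
        start = next((p for p in sorted(remaining)
                      if len(neighbours(p, remaining)) <= 1), min(remaining))
        chain = [start]
        remaining.discard(start)
        while True:
            onward = neighbours(chain[-1], remaining)
            if not onward:
                break
            chain.append(onward[0])
            remaining.discard(onward[0])
        segments.append(chain)
    return segments


def with_reversals(segments):
    """Record each segment in both directions."""
    return segments + [list(reversed(s)) for s in segments]


# A 'T' shaped edge map: a horizontal bar with a vertical stem at column 3.
edge_map = {(0, c) for c in range(7)} | {(r, 3) for r in range(1, 5)}
segments = trace_segments(remove_nodes(edge_map))
```

With 8-adjacency the pixels either side of the junction also have more than two neighbours, so the example leaves three short, disconnected segments around the deleted junction.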

Long unbroken segments are used for the reference templates, whereas the large number of segments extracted from the natural images are short and discontinuous. In general there will be a large number of edges extracted from the whole image, only a few of which will be associated with the object to be identified, e.g. a car wheel. The system is designed to find any set or sequence of observed segments that could, when associated together, correspond to a complete reference segment. The technique is reference data driven in that the search is carried out by looking for a sub-set of the observed segments to match the whole of the reference rather than the other way round.

A one-pass dynamic programming technique is used. It gives the optimal path in terms of cumulative cost, where the cost is a function of the connectivity, shape and orientation. In order to speed up the program a pruning cost threshold is set on the accumulated cost so as to "kill" bad paths at the earliest possible instant. At the end of the forward pass the traceback identifies the segment sequence with the lowest cost. The algorithm can provide a complete set of graded sub-optimal paths which can be used as alternative choices, i.e. multiple object hypotheses can be extracted. In this application it means that more than one circular object can be identified if it exists.
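The forward pass, pruning and traceback can be illustrated with a heavily simplified sketch. It is not the algorithm of [3]: here each reference stage is matched by exactly one observed segment, and the cost tables `local_cost` and `link_cost`, the function name `best_sequence` and the numbers in the example are hypothetical.

```python
INF = float("inf")


def best_sequence(local_cost, link_cost, prune_above):
    """One-pass dynamic programme with a pruning threshold.

    local_cost[k][s]: cost of matching observed segment s at stage k.
    link_cost[(s, t)]: cost of following segment s by segment t; pairs
    that are absent are not permitted.
    Partial paths whose accumulated cost exceeds `prune_above` are killed.
    """
    n_seg = len(local_cost[0])
    cost = [c if c <= prune_above else INF for c in local_cost[0]]
    back = []
    for stage in local_cost[1:]:
        new = [INF] * n_seg
        ptr = [None] * n_seg
        for s in range(n_seg):
            if cost[s] == INF:
                continue  # this path was already pruned
            for t in range(n_seg):
                if (s, t) not in link_cost:
                    continue
                cand = cost[s] + link_cost[(s, t)] + stage[t]
                if cand <= prune_above and cand < new[t]:
                    new[t], ptr[t] = cand, s
        cost = new
        back.append(ptr)
    end = min(range(n_seg), key=lambda s: cost[s])
    if cost[end] == INF:
        return INF, []
    sequence = [end]
    for ptr in reversed(back):  # traceback through the back pointers
        sequence.append(ptr[sequence[-1]])
    sequence.reverse()
    return cost[end], sequence


local_cost = [[1, 5, 9], [4, 1, 9], [9, 6, 1]]
link_cost = {(0, 1): 2, (1, 2): 2, (0, 2): 10}
total, sequence = best_sequence(local_cost, link_cost, prune_above=100)
```

Lowering `prune_above` below the cost of the only permitted sequence removes it during the forward pass, which shows the trade-off between speed and completeness that the threshold introduces.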

Since the observed segments are not connected, the dynamic programming match has to be built up on the basis of the possible connectivity between the short, disconnected, observed segments as well as their shape and orientation. Here connectivity is based on the Euclidean distance between observed segments. A fairly low cost penalty is imposed on connecting ends of segments together (by comparison, a higher penalty is imposed on missing pixels within a segment). However, a list of permitted connectivities between segments is pre-generated, based on a thresholded Euclidean distance; this list is used to constrain the dynamic programming search in a way analogous to a syntax. Other pre-processing carried out includes the elimination of all one or two pixel segments from the search.
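Pre-generation of the permitted connectivities might look like the following sketch. Measuring the gap from the last pixel of one segment to the first pixel of the next is an assumption, as are the names `permitted_links` and `max_gap`.

```python
import math


def permitted_links(segments, max_gap):
    """Map each segment index to the indices that may follow it.

    A link i -> j is permitted when the last pixel of segment i lies
    within `max_gap` (Euclidean distance) of the first pixel of segment j.
    """
    links = {i: [] for i in range(len(segments))}
    for i, first in enumerate(segments):
        for j, second in enumerate(segments):
            if i != j and math.dist(first[-1], second[0]) <= max_gap:
                links[i].append(j)
    return links


segments = [[(0, 0), (0, 3)],      # ends two pixels before the next begins
            [(0, 5), (0, 8)],
            [(20, 20), (20, 25)]]  # far from the others
links = permitted_links(segments, max_gap=3)
```

Because each segment is also stored in reverse order, a table of this kind covers end-to-end joins in either direction without special cases.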

Figures 2, 3 and 4 demonstrate some results obtained using the ELS algorithm. In figure 3 the second and third matches traverse the same segments in the image, but in different directions. When the scale between reference and image feature is well matched (as in figure 4) the algorithm performs well and finds the three wheels present in the starting edge map as its first three paths. However, as with all methods based on a chain code representation in orientation space, the method can make matches to features which, when compared in Euclidean space, have a very poor fit. As with the HT method this may be treated by an additional stage of post-processing, or possibly more elegantly by a better choice of cost function during the dynamic programming stage.

Figure 2: Reference shapes, labelled (1) and (2), and test image for ELS algorithm.

Figure 3: Best 4 matches against large wheel reference (1) using ELS algorithm. Costs: 1535, 1679, 1716, 1721. (Note that low cost denotes better match.)

Figure 4: Best 4 matches against small wheel reference (2) using ELS algorithm. Costs: 1662, 1689, 1694, 1730.

Full Image Search

The full image search (FIS) algorithm is an attempt to avoid some of the precommitment inherent in the use of edge based approaches. The method is best described by comparison to the ELS algorithm.

In the ELS algorithm the possible paths which the search may take are constrained to fall only on the pixel points which have been designated as 'edge' by the edge finding stage, and may only depart from following a sequence of 'edge' points which have nearest neighbour adjacency at the discretion of the list forming algorithm, an algorithm which has no domain specific knowledge. In contrast, the FIS algorithm lifts this restriction and the path may move over any pixel locations in the image, limited only by the choice of productions, which embodies domain specific knowledge derived from the reference shape, and the choice of production costs. In effect all pixels are considered to have some degree of 'edginess', and the algorithm searches for the path which maximises the total 'edginess' consistent with the shape constraints imposed by the reference.

The quality of the match between reference model and image path is composed of the sum of two sets of terms. The first set relates to the distortions suffered by the path in the image; these terms are referred to as the production penalties. The second set of terms measures the local cost of the association between a point in the reference and that in the image. We compute the local cost as the dot product of the intensity gradient in image and reference.
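For a single candidate path the two sets of terms can be combined as in the sketch below. Treating the dot products as merit to be maximised and subtracting the production penalties is an assumed sign convention, and the names `local_merit` and `match_quality` are introduced only for this example.

```python
def local_merit(image_grad, ref_grad):
    """Dot product of the image and reference intensity gradients."""
    return image_grad[0] * ref_grad[0] + image_grad[1] * ref_grad[1]


def match_quality(image_grads, ref_grads, penalties):
    """Sum of the local terms along a path, less the production penalties.

    image_grads[k] and ref_grads[k] are the (gx, gy) gradients at the k-th
    associated point of the image path and the reference; penalties[k] is
    the production penalty incurred in reaching that point.
    """
    local = sum(local_merit(g, h) for g, h in zip(image_grads, ref_grads))
    return local - sum(penalties)


quality = match_quality(
    image_grads=[(1, 0), (0, 2), (3, 4)],
    ref_grads=[(1, 0), (0, 1), (0, 1)],
    penalties=[0, 1, 0],
)
```

A gradient in the image that is orthogonal to the reference gradient contributes nothing, so only edges of the expected orientation add to the match.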

Figure 5 demonstrates the scheme of productions chosen in more detail. As an example the tessellation path taken by a four stage reference shape is shown. At stage 3 the reference path moves in a North East direction. The corresponding production in the paths being tested in the image is unpenalised as it corresponds to no shape change between reference and image paths. To permit some elasticity in the match, and to enable the algorithm to deal with noise (and to a certain extent obscuration), four other productions are allowed as shown. These productions correspond to Euclidean shape distortions between reference and image paths and are penalised. Depending on the path taken by the reference at each stage, the choice of productions and penalties is changed. The five chosen productions give approximately the same degree of flexibility as is present with the three productions employed in the ELS algorithm.
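A search of this kind can be sketched as a dynamic programme over the whole image. The exact five productions are defined by Figure 5 and are not reproduced here; as a hypothetical stand-in the sketch allows the reference move plus turns of one or two compass steps to either side, penalised in proportion to the turn. The local term is also simplified to a scalar 'edginess' per pixel, and all names are assumptions.

```python
# Eight compass moves in clockwise order as (d_row, d_col), starting at North.
COMPASS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def productions(ref_move, penalty_per_turn):
    """Hypothetical production set: five moves centred on the reference move."""
    k = COMPASS.index(ref_move)
    return [(COMPASS[(k + t) % 8], abs(t) * penalty_per_turn)
            for t in (-2, -1, 0, 1, 2)]


def search(edginess, ref_moves, penalty_per_turn):
    """Return (best accumulated merit, path) over all start pixels."""
    rows, cols = len(edginess), len(edginess[0])
    merit = [row[:] for row in edginess]  # every pixel may start a path
    back = []                             # one back-pointer grid per stage
    for ref_move in ref_moves:
        new = [[float("-inf")] * cols for _ in range(rows)]
        ptr = [[None] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                for (dr, dc), pen in productions(ref_move, penalty_per_turn):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        cand = merit[r][c] + edginess[nr][nc] - pen
                        if cand > new[nr][nc]:
                            new[nr][nc], ptr[nr][nc] = cand, (r, c)
        merit = new
        back.append(ptr)
    best, end = max((merit[r][c], (r, c))
                    for r in range(rows) for c in range(cols))
    path = [end]
    for ptr in reversed(back):  # trace the best path back to its start
        r, c = path[-1]
        path.append(ptr[r][c])
    path.reverse()
    return best, path


# The reference moves East twice; the image ridge runs East then North East.
edginess = [[0, 0, 0],
            [0, 0, 5],
            [5, 5, 0]]
best, path = search(edginess, ref_moves=[(0, 1), (0, 1)], penalty_per_turn=1)
```

In the example the best path departs from the straight reference to follow the ridge, paying one unit of penalty for the turn, which is the elasticity the productions are intended to provide.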

Figure 5: Relationship between productions used in FIS and the tessellation pattern of the reference. Panels: path in reference; productions in image at stage 3. At stage 3, productions from one marked pixel lead to no shape change and are unpenalised, while productions from the other marked pixel positions lead to a distortion of the path and are penalised.

Figure 6: Original image and the corresponding accumulated merit image (black denotes low merit) for the FIS method when searching for a wheel-shaped feature.

Figure 7: Accumulated merit image with a selection of paths superimposed.

Figure 6 demonstrates the method as applied to a search for a wheel in a car. As may be seen, the accumulated merit is low in regions of uniform texture in the image. In figure 7 a selection of the paths has been traced back. It may be seen that in regions of low intensity gradient in the image (e.g. the road) the paths are mainly determined by the tessellation pattern of the reference. Under these circumstances the accumulated merit then represents the integral of the local cost function along the line. This approximates to zero. In contrast, in regions of high image detail the paths become distorted to flow along the high gradient regions in the image. This may be seen in the paths near the bonnet and, though not visible in the figure, in most of the paths over the car. In these paths the accumulated merit is now a combination of the integral of the local cost function and the path distortion penalties.

In comparing the underlying mathematics of the ELS and FIS algorithms, it should be noted that in the former we are maximising line integrals whose path is constrained by the edge finder output, while in the FIS algorithm we compute the line integral along lines which are less heavily constrained. We would therefore expect that, other things being equal, the FIS algorithm should yield better results.

DISCUSSION

Each method has its own strengths and weaknesses.

The Hough transform approach requires the image to be reasonably free of extraneous detail for successful operation, hence the need for the fractal discriminant. The discriminant brings with it a number of problems since it groups edge segments with no use of high level knowledge. Hence a contorted line may abut a smooth line and the discriminant will then threshold both smooth and contorted edges together. (To a certain extent it is possible to minimise this problem by selection of the intensity hysteresis thresholds, but this makes the algorithm more sensitive to image illumination and exposure.) A particular strength of the method is that wheel features rarely give rise to a single isolated circle, but rather a set of concentric arcs. The Hough transform combines the evidence from all these features into one point and is insensitive to fragmentation of the image. In this respect the Hough transform may be considered as a form of integral over an area. The most serious disadvantage of the method is that it is very specific to circular features; extension to more generalised shapes, though possible in theory, tends to fail in practice.

In contrast, the ELS algorithm is capable of searching for a wide variety of shapes and may be tailored to accept a range of distortions. An important additional factor is that the reference shape and cost functions may be determined from appropriate training data. One of the major weaknesses of the ELS method is its sensitivity to erroneous edge linkage at the list forming stage. Whilst the algorithm may detect a reference shape embedded in an image edge list, or detect a reference shape from a concatenation of image edge lists, the algorithm often fails if there is erroneous edge linkage within the image lists. To minimise these problems it is generally desirable to operate the algorithm with an asymmetry between the degree of segmentation of image and reference shape. If the image is over-segmented, an under-segmented reference may be used to find the best concatenation of edge lists in the image. In the extreme the image may be treated as a collection of isolated edge elements and the algorithm may then be used to find the correct concatenation. This is, however, achieved at the expense of greater computational burden. At the other extreme the reference may be under-segmented and the algorithm may then be used to search for a larger number of sub-features suitable for processing by a later operation.

The third algorithm, FIS, takes to the extreme the concept of over-segmentation of the image. No thresholds are set and any pixel in the image may play a role in locating the feature of interest. In this manner the pre-commitment which is inherent in the edge extraction is avoided and all the domain specific knowledge becomes incorporated into the feature locating algorithm.

In all three methods, location of a single point in the image to represent where the desired feature is most likely to be found poses little problem. However, it is often required to find more than one candidate position. In the HT method a clustering algorithm may be used to find any desired number of peaks in transform space. With the two DP methods, however, the representation is more complex and standard clustering algorithms, if applied to the accumulated cost maps, would not make use of the information available in the back pointer matrices. If the full paths are traced back, then it is generally found that the second best path differs from the best only by one pixel. While this is formally the correct answer, it is often of more benefit to know where the next best cluster of good paths begins. In the ELS algorithm we approach this problem by looking for paths which differ in the sequence of segments which they include. This approach has been adopted in preparation of the results presented in figures 3 and 4. It may be seen that although one or two paths fall around each wheel, the best few paths embrace most of the wheel-like objects in the picture.

In the FIS algorithm the concept of edge segments is abandoned and a different technique must be tried to find the clusters of interest. One approach which may be followed is to note that strong features in the image act as valleys into which the paths are drawn. Thus, while the end points of the paths show a diffuse profile, the mid-points are strongly clustered. We demonstrate this in figure 8. Here all paths have been traced back to their mid-points and the merit of the best path is noted at the location of the mid-path. Unlike the accumulated cost matrix, not all locations are filled. Indeed, as figure 8 demonstrates, the structure is very sparse with a few intense peaks which are often localised at a single pixel location. This provides a 'focussing' action and a very tightly clustered image is seen. The degree of clustering is dependent on the production penalties employed and the image content. If the production penalties are so large as to prevent any distortion of the paths, then the mid-path array will appear exactly as the accumulated cost matrix and no focussing will be seen. However, under these conditions the algorithm behaves as a linear method and is unlikely to be of interest.

Figure 8: Accumulated merit plotted at the mid-path position for all paths in the image. Same data as for figure 6.

One problem which cuts across many pattern recognition algorithms is the generation of appropriate reference models and the setting of thresholds in the system. Here the dynamic programming algorithms at the core of the ELS and FIS algorithms have an important advantage over the Hough transform approach because, in addition to being able to deal with complex reference shapes, it is possible to use the Viterbi or Baum-Welch algorithm to refine initial estimates of the cost function and so improve detection efficiency.

References

[1] Radford, C.J. Vehicle Detection in Open-World Scenes using a Hough Transform Technique. IEE Proc. 3rd Int. Conf. on Image Processing and Applications, Warwick, U.K., July 1989.

[2] Canny, J. A Computational Approach to Edge Detection. IEEE Trans. P.A.M.I. Vol 8, (1986) pp 679-696.

[3] Sleigh, A.C. Segmentation and Concatenation of Edgel Lists by Dynamic Programming and Stochastic Models. Alvey Vision Conference 1988.

[4] Montanari, U. On the Optimal Detection of Curves in Noisy Pictures. Comm ACM Vol 14. May 1971.
