
Journal of Mathematical Imaging and Vision 19: 113–131, 2003. © 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.

Morphological Scale Spaces and Associative Morphological Memories: Results on Robustness and Practical Applications

BOGDAN RADUCANU, MANUEL GRAÑA AND F. XABIER ALBIZURI
Universidad Pais Vasco

[email protected]

[email protected]

[email protected]

Received April 4, 2002; Revised January 31, 2003; Accepted February 3, 2003

Abstract. Associative Morphological Memories are the analogous construct to Linear Associative Memories defined on the lattice algebra $(\mathbb{R}, +, \vee, \wedge)$. They have excellent recall properties for noiseless patterns. However, they suffer from sensitivity to specific noise models, which can be characterized as erosive and dilative noise. To improve their robustness to general noise we propose a construction method based on the extrema point preservation of the Erosion/Dilation Morphological Scale Spaces. Here we report on their application to the tasks of face localization in grayscale images and appearance based visual self-localization of a mobile robot.

Keywords: heteroassociative morphological memories, face localization, self-localization, morphological scale spaces

1. Introduction

Mathematical Morphology is the most successful nonlinear approach to Image Processing [6, 7, 24, 25]. There have been several attempts to produce Neural Network architectures based on morphological foundations. The Morphological Shared-Weight Neural Network (MSNN) [31] proposes a gradient descent learning procedure for the design of the morphological filters that perform the feature extraction phase of the network. However, the convergence of the learning algorithm is very slow, making it suitable only for the design of small structural elements. Another approach to the definition of Morphological Neural Networks follows the analogy with the Linear Associative Memories (LAMs) [11], which are the antecedent of the well known Autoassociative Hopfield Memories [9]. Associative Morphological Memories [18, 19, 27] are based on the notation and ideas of Image Algebra [21]. They are constructed like the LAM, exchanging the conventional matrix product for the Min/Max matrix product.

This approach does not have the learning convergence difficulties of the MSNN, and possesses excellent properties in terms of the perfect recall of stored patterns. While the CAM requires the orthogonality of the patterns to achieve perfect recall, the Autoassociative Morphological Memory (AMM) does not impose any condition for the perfect recall of as many patterns as can be represented by the network. The Heteroassociative Morphological Memory (HMM) requires some mild conditions on the patterns, which can be interpreted as relaxed morphological orthogonality conditions, to obtain perfect recall of the stored patterns.

Depending on the construction, the morphological memory is analogous to a morphological dilation or erosion, and a duality property analogous to the duality between morphological erosion and dilation holds for the morphological memories. However, both HMMs and AMMs are highly sensitive to specific types of noise. We say that a pattern is affected by erosive (dilative) noise if the values of the corrupted pattern are below (above) the original ones. Dilative memories are sensitive to erosive noise, whilst they are robust to dilative noise. The dual applies to the erosive memories.

To obtain general robustness of the AMM to erosive and dilative noise, the kernel method has been proposed [18, 19, 27]. In this method the construction of a robust AMM is decomposed into two steps. In the first step each input pattern is characterized by a specific kernel pattern. Kernels must comply with some orthogonality conditions and be minimal. A dilative AMM is built up from the kernel patterns. It is thus naturally robust against dilative noise, but sensitive to erosive noise. As the kernels are minimal eroded versions of the original patterns, it is very unlikely that erosive noise would corrupt the pattern up to the point of deleting the kernel; therefore this construct is also robust against erosive noise: the kernel will be recovered from erosive and dilative perturbations of the original pattern. The second step is an erosive HMM that recalls the output associated to the original input pattern. The main drawback of this approach is the lack of a simple procedure for the definition of the kernels, despite the advances made in this direction [20, 27], and the dependence on a specific kernel. Also, the construction of the AMM imposes extremely high storage/memory and processor demands for computer vision and image analysis applications. The size of the AMM matrix grows quadratically with the size of the patterns to be stored. In image applications this size becomes of the order of $10^9$ for small size images ($320 \times 240$).

Because we are interested in applications that may involve conventional size images, this computational restriction is important for us. To define a procedure that can be applied to moderate size images (i.e., $300 \times 200$ pixels or greater) we focus on enhancing the HMM robustness without resorting to an intermediate AMM. The size of the HMM matrix is the product of the sizes of the input and output vectors. In the applications discussed in this paper, the size of the output vector will be the number of classes to be distinguished, which is much smaller than the size of the input pattern. The approach taken is that of constructing the HMM with the eroded/dilated versions of the input at one or several scales in an Erosion-Dilation Scale-Space framework [10]. The morphological Erosion and Dilation operators, using structural elements that comply with specific conditions, preserve the local extrema of the image intensity function. These local extrema may play the role of pseudo-kernels for robust recall. The images in the Scale-Space preserve the recall properties of the original HMM construct while adding some degree of general noise robustness. We study procedures based on HMM and Erosion/Dilation Scale-Spaces for two tasks: face localization in gray scale images and visual landmark detection for self-localization in mobile robots.

Face detection can be defined as the problem of deciding the presence of a face in the image. Face localization gives the actual position of the hypothetical face(s) in the image. Face localization is usually transformed into a sequence of face detection tests performed over a sliding window that moves over the image. Holistic or appearance based approaches to both face detection and localization are either based on Principal Components Analysis (PCA) [28] or Neural Networks [22, 26] and [12]. Geometrical approaches try to fit an ellipse to the face contour [30] or to detect some face elements and verify their relative distances. Approaches based on color processing [32] are very easy to realize, although prone to give high false positive rates. A sensible approach to more robust face localization is the combination of several methods into a multi-cue system [4] and [29]. In this spirit, we propose detection with the HMM as a complementary detection tool. In the design of face detection algorithms it is customary to train the system with positive and negative face cases, often applying some bootstrap procedure [22] to decrease the false positive rate. In our approach, training consists in the selection of a set of face patterns and the construction of the HMM.

Self-Localization is the ability to determine the spatial position and orientation of a robot using the information provided by its sensors [1–3, 5, 13–15, 17, 23]. Visual self-localization based on the images provided by on-board cameras usually relies on the detection of some predetermined landmarks [2, 13, 17] specifically designed to be easily recognized in real time. The goal of our work is the appearance based recognition, with some degree of robustness, of landmark images obtained from different robot placements and orientations in an indoor environment. The recognition must cope with some variations in illumination and with small rotations and translations of the images due to the uncertainty of the robot position, which, in turn, is due to the uncertainties in the motion of the robot. The landmark images are identified using orthogonal binary codes. Thus, the HMM takes as input an image of an indoor scene and gives as output a binary vector that identifies the view. Here we present an automated scheme based on HMM to detect the most salient views in an image sequence. These images are assumed to be landmarks that may be robustly recognized from physical robot positions near the position that produced the landmark.

The paper is structured as follows. In Section 2 we review the definition and properties of a Morphological Erosion-Dilation Scale-Space. In Section 3 we review the formal definition of the HMM together with its main properties. Section 4 introduces the notion of pseudo-kernels and defines their role in obtaining robust HMMs. We present the results that justify the use of the scale-space approach in the construction of robust HMMs. In Sections 5 and 6 we present the proposed procedures and results on face localization and self-localization. Finally, in Section 7, we present our conclusions and future work.

2. Morphological Scale-Space

Scale-space theory deals with the formal definition of the concept of 'scale' in terms of signals/images, i.e., how we represent the data at a given scale and how we relate image features from one scale to another. A basic requisite for a particular collection of increasingly smooth images to be a scale-space is the causality property: every feature at a coarse scale must have a corresponding feature at a fine scale. Causality means that image features are not generated as a by-product of the Scale-Space generation process.

We will focus on the Morphological Erosion-Dilation Scale-Space proposed by Jackway [10], because the natural features in this Scale-Space are of interest for the HMM. In the following definitions, we assume that $f$ is the original function (i.e., a grayscale image) and $g$ is the structuring function, namely $f: D \subseteq \mathbb{R}^2 \to \mathbb{R}$ and $g: E \subseteq \mathbb{R}^2 \to \mathbb{R}$. The morphological grayscale dilation operator is defined as:

$$(f \oplus g)(x) = \sup_{t \in E} \{f(x - t) + g(t)\} \quad (1)$$

and the erosion operator as:

$$(f \ominus g)(x) = \inf_{t \in E} \{f(x + t) - g(t)\}. \quad (2)$$

In order to avoid domain and image shifts of the filtered function, the structuring functions must comply with the following two conditions:

$$\sup_{t \in E} \{g(t)\} = 0, \qquad g(0) = 0. \quad (3)$$

We will call zero-shift structural elements the structuring functions that comply with conditions (3). The structuring functions are scale dependent: $g_\sigma: E_\sigma \subseteq \mathbb{R}^2 \to \mathbb{R}$ is of the form

$$g_\sigma(x) = |\sigma| \, g(|\sigma|^{-1} x).$$

A suitable scale-space structuring function is the sphere of radius $\sigma$ defined by the following equation:

$$g_\sigma(x) = |\sigma|\left(\left(1 - \|x/\sigma\|^2\right)^{1/2} - 1\right), \quad \|x\| \le |\sigma|. \quad (4)$$

The support of the sphere is $B_\sigma$: a 2D circle of radius $\sigma$. The multiscale dilation-erosion operator is defined as:

$$(f \circledast g_\sigma)(x) = \begin{cases} (f \oplus g_\sigma)(x) & \text{if } \sigma > 0, \\ f(x) & \text{if } \sigma = 0, \\ (f \ominus g_{-\sigma})(x) & \text{if } \sigma < 0. \end{cases} \quad (5)$$
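As a concrete illustration, the multiscale operator of Eq. (5) can be sketched for a 1-D discrete signal with a sampled version of the sphere structuring function of Eq. (4). This is our own sketch, not the authors' code; unit grid spacing and domain-restricted suprema at the signal borders are assumptions of the sketch.

```python
import numpy as np

def g_sphere(sigma):
    """Sampled sphere structuring function of Eq. (4), support |t| <= |sigma|."""
    s = abs(sigma)
    t = np.arange(-int(s), int(s) + 1)
    return t, s * (np.sqrt(1.0 - (t / s) ** 2) - 1.0)

def multiscale(f, sigma):
    """(f * g_sigma)(x) of Eq. (5): dilation if sigma > 0, erosion if sigma < 0."""
    if sigma == 0:
        return f.copy()
    t, g = g_sphere(sigma)
    n, out = len(f), np.empty(len(f))
    for x in range(n):
        idx = x - t if sigma > 0 else x + t        # f(x - t) vs. f(x + t)
        ok = (idx >= 0) & (idx < n)                # stay inside the domain
        vals = f[idx[ok]] + g[ok] if sigma > 0 else f[idx[ok]] - g[ok]
        out[x] = vals.max() if sigma > 0 else vals.min()
    return out

f = np.array([3.0, 7.0, 2.0, 9.0, 4.0, 1.0, 6.0])
d, e = multiscale(f, 2), multiscale(f, -2)
assert np.all(d >= f) and np.all(e <= f)   # zero-shift: g(0) = 0
assert d[3] == f[3] and e[5] == f[5]       # local extrema values preserved
```

Because $g(0) = 0$ and $g \le 0$, the dilated signal dominates $f$ and the eroded one lies below it, while the values at the local maximum (index 3) and local minimum (index 5) are left untouched; this is the behavior exploited in the rest of the section.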

For positive scales ($\sigma > 0$) the operation corresponds to a dilation, and for negative scales ($\sigma < 0$) to an erosion. As $|\sigma|$ increases, the filtered image retains fewer details. As $|\sigma| \to 0$, the filtered image converges to the original one. The erosion-dilation morphological scale-space can then be defined as $F: D \subseteq \mathbb{R}^2 \times \mathbb{R} \to \mathbb{R}$, where $F$ is given by:

$$F(x, \sigma) = (f \circledast g_\sigma)(x). \quad (6)$$

The image features of interest in the morphological erosion-dilation scale-space are the local extrema of the intensity function. Let us define the following point sets:

$$E_{\max}(f) = \{x : f(x) \text{ is a local maximum}\} \quad (7)$$

and

$$E_{\min}(f) = \{x : f(x) \text{ is a local minimum}\}. \quad (8)$$

The reason to consider the local extrema as the image features is that they are preserved by the multiscale erosion-dilation operator with zero-shift structural elements, i.e., those that comply with (3). The causality property of morphological Erosion/Dilation Scale-Spaces takes the shape of the following theorem, proven by Jackway and Deriche [10]:


Theorem 1 (Erosion/Dilation Scale-Space Monotonic Property). For any scale sequence $\sigma_1 < \sigma_2 < 0 < \sigma_3 < \sigma_4$, the following relations hold:

$$E_{\min}(f \circledast g_{\sigma_1}) \subseteq E_{\min}(f \circledast g_{\sigma_2}) \subseteq E_{\min}(f) \quad (9)$$

and

$$E_{\max}(f \circledast g_{\sigma_4}) \subseteq E_{\max}(f \circledast g_{\sigma_3}) \subseteq E_{\max}(f). \quad (10)$$

This theorem implies that the image fingerprints disappear as the absolute value of the scale parameter increases. Together with condition (3), it implies that the image morphological smoothing preserves the locus of each image intensity extremum up to a scale that we may call its intrinsic scale. In practice, the image is characterized by the set of reduced fingerprints $\{E_\sigma(f, g), \sigma \in \mathbb{R}\}$:

$$E_\sigma(f, g) = \begin{cases} E_\sigma(f \circledast g_\sigma)_{\max} & \text{if } \sigma > 0, \\ E(f)_{\min} \cup E(f)_{\max} & \text{if } \sigma = 0, \\ E_\sigma(f \circledast g_\sigma)_{\min} & \text{if } \sigma < 0, \end{cases} \quad (11)$$

where

$$E_\sigma(f)_{\min} = \Big\{x \,\Big|\, f(x) = \min_{\tau \in B_\sigma} \{f(x + \tau)\}\Big\}, \quad (12)$$

$$E_\sigma(f)_{\max} = \Big\{x \,\Big|\, f(x) = \max_{\tau \in B_\sigma} \{f(x + \tau)\}\Big\}, \quad (13)$$
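For a discrete signal these fingerprint sets are direct to compute. The following sketch is our own illustration (not the authors' code), with $B_\sigma$ approximated by the integer interval $[-\sigma, \sigma]$ on a 1-D signal; it implements Eqs. (12)–(13) and checks that the extrema sets shrink as the scale grows:

```python
def E_max(f, sigma):
    """E_sigma(f)_max of Eq. (13): points attaining the max over B_sigma."""
    n, s = len(f), int(abs(sigma))
    return {x for x in range(n)
            if f[x] == max(f[max(0, x - s):min(n, x + s + 1)])}

def E_min(f, sigma):
    """E_sigma(f)_min of Eq. (12): points attaining the min over B_sigma."""
    n, s = len(f), int(abs(sigma))
    return {x for x in range(n)
            if f[x] == min(f[max(0, x - s):min(n, x + s + 1)])}

f = [3, 7, 2, 9, 4, 1, 6]
# Enlarging B_sigma can only delete fingerprints, never create them:
assert E_max(f, 2) <= E_max(f, 1) and E_min(f, 2) <= E_min(f, 1)
assert E_max(f, 1) == {1, 3, 6} and E_max(f, 2) == {3, 6}
```

The maximum at index 1 (value 7) disappears between scales 1 and 2 because the larger maximum at index 3 enters its window: its intrinsic scale, in the sense defined below, is 1.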

and $B_\sigma$ is the circle of radius $\sigma$. The continuity theorem applies directly to the sets of fingerprints, as can easily be deduced because $E_\sigma(f \circledast g_\sigma)_{\max} \subseteq E(f \oplus g_\sigma)_{\max}$ for all $\sigma > 0$ and $E_\sigma(f \circledast g_\sigma)_{\min} \subseteq E(f \circledast g_\sigma)_{\min}$ for all $\sigma < 0$. Note that in the discrete case the circle $B_\sigma$ must be approximated by an object whose shape will depend on the distance considered on the discrete lattice (a diamond, a square, or other). We call the set of local minima of intrinsic scale $\sigma < 0$, denoted $L_\sigma(f)_{\min}$, the set of fingerprints that exist down to this scale and disappear for lower scales $\sigma' < \sigma$. The set of local maxima of intrinsic scale $\sigma > 0$, $L_\sigma(f)_{\max}$, is the set of fingerprints that exist up to this scale and disappear for higher scales $\sigma' > \sigma$. Formally:

$$L_\sigma(f)_{\min} = \{x \mid x \in E_\sigma(f)_{\min},\ x \notin E_{\sigma'}(f)_{\min},\ \forall \sigma' < \sigma\}, \quad (14)$$

$$L_\sigma(f)_{\max} = \{x \mid x \in E_\sigma(f)_{\max},\ x \notin E_{\sigma'}(f)_{\max},\ \forall \sigma' > \sigma\}. \quad (15)$$

For the following study of the properties of the HMM it is of critical importance to establish two properties: the sets of local extrema of a given scale are preserved by the scale space construction up to the magnitude of their intrinsic scale, and the value of the images is preserved by the scale space construction for the sets of local extrema up to the magnitude of their intrinsic scale. These results are modified versions of the continuity Theorem 1.

Corollary 2. The set of fingerprints $E_\sigma(f, g)$ at scale $\sigma > 0$ of a Morphological Erosion-Dilation Scale Space constructed with zero-shift structural elements, i.e., those that comply with conditions (3), contains all the sets of local maxima of intrinsic scales $\sigma' \ge \sigma$. The set of fingerprints $E_\sigma(f, g)$ at scale $\sigma < 0$ of a Morphological Erosion-Dilation Scale Space constructed with zero-shift structural elements contains all the sets of local minima of intrinsic scales $\sigma' \le \sigma$:

$$E_\sigma(f, g) \supseteq L_{\sigma'}(f)_{\max} \quad \forall \sigma' \ge \sigma > 0,$$

$$E_\sigma(f, g) \supseteq L_{\sigma'}(f)_{\min} \quad \forall \sigma' \le \sigma < 0.$$

Proof: We give the proof for the local maxima; the proof for the local minima follows by duality. From the definition of the multiscale erosion/dilation operator, when $\sigma > 0$ we have

$$(f \circledast g_\sigma)(x) = (f \oplus g_\sigma)(x) = \max_{t \in B_\sigma} \{f(x - t) + g(t)\}.$$

Because $g(t) \le 0$, $\forall t$, by the conditions in (3), we have

$$(f \oplus g_\sigma)(x) \le \max_{t \in B_\sigma} \{f(x - t)\}.$$

Further, because $g(0) = 0$, we have

$$x \in E_\sigma(f \circledast g_\sigma)_{\max} \Leftrightarrow f(x) = \max_{t \in B_\sigma} \{f(x - t)\}.$$

Therefore,

$$x \in E_\sigma(f \circledast g_\sigma)_{\max} \Leftrightarrow x \in E_\sigma(f)_{\max} \Leftrightarrow \exists \sigma' \ge \sigma,\ x \in L_{\sigma'}(f)_{\max}.$$

We have the desired result, for $\sigma > 0$:

$$E_\sigma(f, g) \supseteq L_{\sigma'}(f)_{\max}, \quad \forall \sigma' \ge \sigma.$$


Corollary 3. Given a Morphological Erosion-Dilation Scale Space constructed with zero-shift structural elements, the image values corresponding to local maxima of intrinsic scale $\sigma > 0$ are preserved by the Erosion-Dilation Scale Space up to this scale:

$$x \in L_\sigma(f)_{\max} \Rightarrow F(x, \sigma') = f(x), \quad \forall \sigma' < \sigma.$$

The image values corresponding to local minima of intrinsic scale $\sigma < 0$ are preserved by the Erosion-Dilation Scale Space down to this scale:

$$x \in L_\sigma(f)_{\min} \Rightarrow F(x, \sigma') = f(x), \quad \forall \sigma' > \sigma.$$

Proof: We give the proof for the local maxima; the proof for the local minima follows by duality. By definition of the scale space images, if $\sigma > 0$ then

$$F(x, \sigma') = (f \oplus g_{\sigma'})(x) = \max_{t \in B_{\sigma'}} \{f(x - t) + g(t)\} \le \max_{t \in B_{\sigma'}} \{f(x - t)\}.$$

For the points in the set of local extrema of intrinsic scale $\sigma$, it holds:

$$x \in L_\sigma(f)_{\max} \Rightarrow f(x) = \max_{t \in B_\sigma} \{f(x - t)\} \Rightarrow \forall \sigma' < \sigma,\ f(x) = \max_{t \in B_{\sigma'}} \{f(x - t)\}.$$

Therefore, the image values of the points in $L_\sigma(f)_{\max}$ are not changed by the dilation at lower scales:

$$x \in L_\sigma(f)_{\max} \Rightarrow \forall \sigma' < \sigma,\ f(x) = F(x, \sigma').$$

3. Heteroassociative Morphological Memories

The work on Associative Morphological Memories stems from the consideration of the lattice algebra $(\mathbb{R}, \vee, \wedge, +)$ as the alternative to the algebraic framework $(\mathbb{R}, +, \cdot)$ for the definition of Neural Network computation [18, 19]. In the context of the present work, we consider the lattice algebra on the integer numbers: $(\mathbb{Z}, \vee, \wedge, +)$. The operators $\vee$ and $\wedge$ denote, respectively, the discrete max and min operators. The approach is termed morphological neural networks because $\vee$ and $\wedge$ correspond to the morphological dilation and erosion operators, respectively. Given a set of input/output pattern pairs $(X, Y) = \{(x^\xi, y^\xi);\ \xi = 1, \ldots, k\}$, a heteroassociative neural network based on the patterns' cross-correlation [9, 11] is built up as $W = \sum_\xi y^\xi \cdot (x^\xi)'$. Mimicking this construction, [18, 19] propose the following constructions of heteroassociative morphological memories:

$$W_{XY} = \bigwedge_{\xi=1}^{k} \big[y^\xi \times (-x^\xi)'\big], \qquad M_{XY} = \bigvee_{\xi=1}^{k} \big[y^\xi \times (-x^\xi)'\big], \quad (16)$$

where $\times$ is any of the $\vee$ or $\wedge$ matrix products. The operations $\vee$ and $\wedge$ denote the max and min matrix product, respectively, which are defined as follows:

$$C = A \vee B = [c_{ij}] \Leftrightarrow c_{ij} = \max_{k=1..n} \{a_{ik} + b_{kj}\}, \quad (17)$$

$$C = A \wedge B = [c_{ij}] \Leftrightarrow c_{ij} = \min_{k=1..n} \{a_{ik} + b_{kj}\}. \quad (18)$$

It follows that the weight matrices $W_{XY}$ and $M_{XY}$ are lower and upper bounds of the max/min outer products: $\forall \xi;\ W_{XY} \le y^\xi \times (-x^\xi)' \le M_{XY}$. Therefore the following bounds on the output patterns hold: $\forall \xi;\ W_{XY} \vee x^\xi \le y^\xi \le M_{XY} \wedge x^\xi$, which can be rewritten as $W_{XY} \vee X \le Y \le M_{XY} \wedge X$.

A matrix $A$ is a $\vee$-perfect ($\wedge$-perfect) memory for $(X, Y)$ if $A \vee X = Y$ ($A \wedge X = Y$). It can be proven that if $A$ and $B$ are $\vee$-perfect and $\wedge$-perfect memories, respectively, for $(X, Y)$, then $W_{XY}$ and $M_{XY}$ are also $\vee$-perfect and $\wedge$-perfect, respectively: $A \le W_{XY} \le M_{XY} \le B$. Therefore $W_{XY} \vee X = Y = M_{XY} \wedge X$. Conditions for perfect recall of the stored patterns are given by the following theorem [18, 19]:
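A small numerical sketch (ours, with NumPy, not the authors' code) of the matrix products of Eqs. (17)–(18) and of the constructions in Eq. (16); the toy integer patterns are deliberately chosen so that perfect recall actually holds:

```python
import numpy as np

def max_prod(A, B):   # Eq. (17): c_ij = max_k (a_ik + b_kj)
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def min_prod(A, B):   # Eq. (18): c_ij = min_k (a_ik + b_kj)
    return np.min(A[:, :, None] + B[None, :, :], axis=1)

# Patterns x^xi as columns of X; orthogonal binary outputs as columns of Y.
X = np.array([[9, 1, 1], [1, 9, 1], [1, 1, 9],
              [0, 5, 5], [5, 0, 5], [5, 5, 0]])
Y = np.eye(3, dtype=int)

# Eq. (16): W_XY = min over xi of [y^xi x (-x^xi)'], M_XY = max over xi.
outer = [Y[:, [k]] - X[:, [k]].T for k in range(3)]   # entries y_i - x_j
W = np.min(outer, axis=0)
M = np.max(outer, axis=0)

# Perfect recall for this pattern set: W (max-prod) X = Y = M (min-prod) X.
assert np.array_equal(max_prod(W, X), Y)
assert np.array_equal(min_prod(M, X), Y)
```

Each pattern of this toy set has a strictly largest and a strictly smallest component across the collection, which is what makes both recalls perfect; this is exactly the pseudo-kernel condition studied in Section 4.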

Theorem 4 (Perfect recall of HMM). The matrix $W_{XY}$ is $\vee$-perfect if and only if $\forall \gamma$ the matrix $[y^\gamma \times (-x^\gamma)'] - W_{XY}$ contains a zero at each row. Similarly, the matrix $M_{XY}$ is $\wedge$-perfect if and only if $\forall \gamma$ the matrix $[y^\gamma \times (-x^\gamma)'] - M_{XY}$ contains a zero at each row. These conditions are formulated for $W_{XY}$ as follows:

$$\forall \gamma\ \forall i\ \exists j;\quad x^\gamma_j - y^\gamma_i = \bigvee_{\xi=1}^{k} \big(x^\xi_j - y^\xi_i\big), \quad (19)$$


and for $M_{XY}$ as follows:

$$\forall \gamma\ \forall i\ \exists j;\quad x^\gamma_j - y^\gamma_i = \bigwedge_{\xi=1}^{k} \big(x^\xi_j - y^\xi_i\big). \quad (20)$$

This result holds when we try to recover the output patterns from the noise-free input patterns. To characterize the response under noisy conditions, a special definition of the kinds of noise affecting the input patterns is needed. Let $\tilde{x}^\gamma$ be a noisy version of $x^\gamma$. If $\tilde{x}^\gamma \le x^\gamma$ then $\tilde{x}^\gamma$ is an eroded version of $x^\gamma$; alternatively, we say that $\tilde{x}^\gamma$ is subjected to erosive noise. If $\tilde{x}^\gamma \ge x^\gamma$ then $\tilde{x}^\gamma$ is a dilated version of $x^\gamma$; alternatively, we say that $\tilde{x}^\gamma$ is subjected to dilative noise. The conditions of robust perfect recall, i.e., the retrieval of $y^\gamma$ given a noisy copy $\tilde{x}^\gamma$, are given by the following theorem [18, 19].

Theorem 5 (Robust perfect recall of HMM). Given input/output pairs $(X, Y)$, the equality $W_{XY} \vee \tilde{x}^\gamma = y^\gamma$ holds when the following relations are fulfilled:

$$\forall j;\quad \tilde{x}^\gamma_j \le x^\gamma_j \vee \bigwedge_i \Big[\bigvee_{\xi \ne \gamma} \big(y^\gamma_i - y^\xi_i + x^\xi_j\big)\Big],$$

$$\forall i\ \exists j_i;\quad \tilde{x}^\gamma_{j_i} = x^\gamma_{j_i} \vee \Big[\bigvee_{\xi \ne \gamma} \big(y^\gamma_i - y^\xi_i + x^\xi_{j_i}\big)\Big]. \quad (21)$$

Similarly, the conditions of robust perfect recall for $M_{XY}$ given a noisy copy $\tilde{x}^\gamma$ of $x^\gamma$, that is, for the equality $M_{XY} \wedge \tilde{x}^\gamma = y^\gamma$, are:

$$\forall j;\quad \tilde{x}^\gamma_j \ge x^\gamma_j \wedge \bigvee_i \Big[\bigwedge_{\xi \ne \gamma} \big(y^\gamma_i - y^\xi_i + x^\xi_j\big)\Big],$$

$$\forall i\ \exists j_i;\quad \tilde{x}^\gamma_{j_i} = x^\gamma_{j_i} \wedge \Big[\bigwedge_{\xi \ne \gamma} \big(y^\gamma_i - y^\xi_i + x^\xi_{j_i}\big)\Big]. \quad (22)$$

Conditions (21) and (22) state that the memory $W_{XY}$ is robust against controlled erosions of the input patterns, while the associative memory matrix $M_{XY}$ is robust against controlled dilations of the input patterns. These conditions are specialized for Autoassociative Memories by Sussner [27] to obtain conditions for the definition of kernels. Since we are not dealing with AMMs, we will not review them here.

4. Pseudo-Kernels and Robust Recall of the HMM

In essence, disregarding the effects of the output components $y^\gamma_i$, the condition for perfect recall of $(x^\gamma, y^\gamma)$ is that at least one of the $x^\gamma_i$ is the maximum/minimum in position $i$ across all patterns $\xi$. We call the pseudo-kernel of a pattern the set of indices that possess this quality. We will relate these pseudo-kernels with the Erosion/Dilation Scale-Space properties to construct HMMs with enhanced recall robustness to general noise. The ideas discussed here are closely related to the morphological independence and morphological strong independence introduced in Ritter et al. [20] for the design of kernels for morphological autoassociative memories.

We will focus on the special case of the output vectors being orthogonal binary codes, i.e., $y^\xi \cdot y^\gamma = \delta_{\xi\gamma}$ and $y^\xi, y^\gamma \in \{0, 1\}^k$, $\gamma, \xi = 1, \ldots, k$. We will assume a coding of the form

$$\big\{y^\xi_\xi = 1,\ y^\xi_\gamma = 0,\ \forall \gamma \ne \xi\big\}. \quad (23)$$

This specific definition of the output patterns is appropriate for classification tasks in which the output identifies the class of the input. The input pattern components are restricted to the set of integer numbers.

Definition 1. The dilative pseudo-kernel of a pattern pair $(x^\gamma, y^\gamma)$ included in a collection of patterns $(X, Y)$, where the output patterns are orthogonal binary vectors, is the set of indices:

$$K_W(x^\gamma) = \big\{j \,\big|\, x^\gamma_j > x^\xi_j,\ \forall \xi \ne \gamma\big\}.$$

The erosive pseudo-kernel of a pattern pair $(x^\gamma, y^\gamma)$ is the set of indices:

$$K_M(x^\gamma) = \big\{j \,\big|\, x^\gamma_j < x^\xi_j,\ \forall \xi \ne \gamma\big\}. \quad (24)$$
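Definition 1 transcribes directly into code. The following sketch (ours, with toy integer patterns) computes both pseudo-kernels:

```python
def K_W(X, gamma):
    """Dilative pseudo-kernel: indices where pattern gamma strictly dominates."""
    return {j for j in range(len(X[gamma]))
            if all(X[gamma][j] > X[xi][j]
                   for xi in range(len(X)) if xi != gamma)}

def K_M(X, gamma):
    """Erosive pseudo-kernel: indices where pattern gamma is strictly lowest."""
    return {j for j in range(len(X[gamma]))
            if all(X[gamma][j] < X[xi][j]
                   for xi in range(len(X)) if xi != gamma)}

X = [(9, 1, 1, 0, 5, 5),   # x^1
     (1, 9, 1, 5, 0, 5),   # x^2
     (1, 1, 9, 5, 5, 0)]   # x^3
assert K_W(X, 0) == {0} and K_M(X, 0) == {3}
assert all(K_W(X, g) and K_M(X, g) for g in range(3))  # Corollary 6 applies
```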

The pseudo-kernel definition is intended to capture the set of indices which ensure that the input pattern $x^\xi$ and its corresponding output $y^\xi$ can be stored in and retrieved from an HMM built on $(X, Y)$. Dilative pseudo-kernels are related to the $W_{XY}$ memory, while erosive pseudo-kernels are related to the $M_{XY}$ memory. A desirable result would be that the existence of these kernels were a sufficient and necessary condition for the perfect recall of HMMs constructed on $(X, Y)$. However, we are only able to prove, in the following corollary of Theorem 4, that non-empty pseudo-kernels are a sufficient condition for the construction of perfect recall HMMs.

Corollary 6. Given input-output pairs $(X, Y)$ with the output patterns being a set of orthogonal binary vectors, if the dilative and erosive pseudo-kernels are non-empty:

$$\forall \gamma,\ K_W(x^\gamma) \ne \emptyset, \quad (25)$$

$$\forall \gamma,\ K_M(x^\gamma) \ne \emptyset, \quad (26)$$

then the memories $W_{XY}$ and $M_{XY}$, respectively, exhibit perfect recall.

Proof: If the output patterns $Y$ are a collection of orthogonal binary patterns, the condition (19) for perfect recall of the $W_{XY}$ matrix splits into two cases, depending on the component of the output vector being considered.

Case 1: $i \ne \gamma$. By definition of the pseudo-kernel,

$$K_W(x^\gamma) \ne \emptyset \Leftrightarrow \exists j \text{ with } x^\gamma_j > x^\xi_j\ \forall \xi \ne \gamma \Rightarrow \exists j \text{ with } x^\gamma_j \ge \bigvee_{\substack{\xi=1 \\ \xi \ne i}}^{k} x^\xi_j.$$

In fact, we have $x^\gamma_j = \bigvee_{\xi=1, \xi \ne i}^{k} x^\xi_j$, and, furthermore, since $K_W(x^\gamma) \ne \emptyset$,

$$x^\gamma_j > x^i_j \text{ for some } j. \quad (27)$$

Since (27) implies $x^\gamma_j \ge x^i_j - 1$, we obtain

$$\forall i \ne \gamma\ \exists j;\quad x^\gamma_j = \bigvee_{\substack{\xi=1 \\ \xi \ne i}}^{k} \big(x^\xi_j\big) \vee \big(x^i_j - 1\big), \quad (28)$$

which is equivalent to the condition of perfect recall

$$\forall i \ne \gamma\ \exists j;\quad x^\gamma_j = \bigvee_{\xi=1}^{k} \big(x^\xi_j - y^\xi_i\big), \quad (29)$$

obtained taking into account that $y^\gamma_i = 0$ because of the output orthogonal coding (23).

Case 2: $i = \gamma$. The second case applies to the single component of the output vector that must be one. By the orthogonality of the output vectors, the output component of the patterns $y^\xi$ such that $\xi \ne \gamma$ must be zero. To prove the corollary in this case, it is enough to show that

$$x^\gamma_j - 1 \ge \bigvee_{\substack{\xi=1 \\ \xi \ne \gamma}}^{k} x^\xi_j. \quad (30)$$

Since $K_W(x^\gamma) \ne \emptyset$, then $\exists j$ such that

$$x^\gamma_j > \bigvee_{\substack{\xi=1 \\ \xi \ne \gamma}}^{k} x^\xi_j. \quad (31)$$

Since $x^\xi_j \in \mathbb{Z}$, this implies

$$x^\gamma_j - 1 \ge \bigvee_{\substack{\xi=1 \\ \xi \ne \gamma}}^{k} x^\xi_j. \quad (32)$$

For the $M_{XY}$ memories, the proof follows by duality with the $W_{XY}$ memories.

Therefore, in order that an image and output code pair $(x^\xi, y^\xi)$ can be stored in a $W_{XY}$ memory, so that the output $y^\xi$ can be perfectly recalled by $W_{XY} \vee x^\xi$, it suffices to have some pixel positions whose values are greater in this image than the maximum value found at these pixel positions in all the remaining images. We may read the above result as follows: if there is any $\xi \ne \gamma$ such that $x^\gamma < x^\xi - 1$, then $(x^\gamma, y^\gamma)$ cannot be recalled from a $W_{XY}$ memory that also stores $(x^\xi, y^\xi)$. The dual assertion is true for the $M_{XY}$ memory. In order for the pair $(x^\gamma, y^\gamma)$ to be stored in $M_{XY}$ and the output $y^\gamma$ recalled by $M_{XY} \wedge x^\gamma$, there must exist some pixels whose values are lower in this image than the minimum value found in the remaining images. Robust recall in the presence of noise can also be related to the pseudo-kernel existence:

Corollary 7. Given $(X, Y)$ input-output pairs, with the output being a set of orthogonal binary vectors, which satisfy the conditions of Corollary 6, let $\tilde{x}^\gamma$ be a noisy copy of $x^\gamma$. If $\tilde{x}^\gamma \le x^\gamma$ and $\exists j \in K_W(x^\gamma)$ such that $\tilde{x}^\gamma_j = x^\gamma_j$, then the conditions (21) for $W_{XY} \vee \tilde{x}^\gamma = y^\gamma$ are fulfilled. If $\tilde{x}^\gamma \ge x^\gamma$ and $\exists j \in K_M(x^\gamma)$ such that $\tilde{x}^\gamma_j = x^\gamma_j$, then the conditions (22) for $M_{XY} \wedge \tilde{x}^\gamma = y^\gamma$ are fulfilled.

Proof: We will give the proof for the $W_{XY}$ memories. The first part of the conditions (21) follows directly from the condition on the noisy pattern: $\tilde{x}^\gamma \le x^\gamma$ implies $\forall j;\ \tilde{x}^\gamma_j \le x^\gamma_j \vee \bigwedge_i \big[\bigvee_{\xi \ne \gamma} (y^\gamma_i - y^\xi_i + x^\xi_j)\big]$, regardless of the value of the output vectors. The second part of the conditions (21) is less evident. Let us recall it:

$$\forall i\ \exists j_i;\quad \tilde{x}^\gamma_{j_i} = x^\gamma_{j_i} \vee \Big[\bigvee_{\xi \ne \gamma} \big(y^\gamma_i - y^\xi_i + x^\xi_{j_i}\big)\Big]. \quad (33)$$

From the definition of the output patterns as orthogonal binary vectors we have:

$$\bigvee_{\xi \ne \gamma} \big(y^\gamma_i - y^\xi_i + x^\xi_j\big) = \begin{cases} \displaystyle\bigvee_{\xi \ne \gamma} \big(1 + x^\xi_j\big) & i = \gamma, \\[6pt] \displaystyle\bigvee_{\substack{\xi \ne \gamma \\ \xi \ne i}} \big(x^\xi_j\big) \vee \big(-1 + x^i_j\big) & i \ne \gamma. \end{cases} \quad (34)$$

From the definition of the dilative pseudo-kernel,

$$K_W(x^\gamma) \ne \emptyset \Rightarrow \exists j;\ \forall \xi \ne \gamma,\ x^\gamma_j > x^\xi_j. \quad (35)$$

For all $j \in K_W(x^\gamma)$, we have

$$x^\gamma_j \ge \bigvee_{\substack{\xi=1 \\ \xi \ne \gamma}}^{k} \big(1 + x^\xi_j\big) \quad (36)$$

and

$$\forall i \ne \gamma;\quad x^\gamma_j \ge \bigvee_{\substack{\xi=1 \\ \xi \ne \gamma}}^{k} \big(x^\xi_j\big) \vee \big(-1 + x^i_j\big). \quad (37)$$

Since $K_W(x^\gamma) \ne \emptyset$ we obtain, in view of (34):

$$\exists j \in K_W(x^\gamma) \Rightarrow \forall i\ \exists j;\quad x^\gamma_j \ge \bigvee_{\xi \ne \gamma} \big(y^\gamma_i - y^\xi_i + x^\xi_j\big).$$

And finally we come to the desired conclusion:

$$\exists j \in K_W(x^\gamma);\ \tilde{x}^\gamma_j = x^\gamma_j \Rightarrow \forall i\ \exists j_i;\quad \tilde{x}^\gamma_{j_i} = x^\gamma_{j_i} \vee \Big[\bigvee_{\xi \ne \gamma} \big(y^\gamma_i - y^\xi_i + x^\xi_{j_i}\big)\Big].$$

The dual reasoning applies to the $M_{XY}$ memory.
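Corollary 7 can be checked numerically. In the following sketch (ours, with toy integer patterns), the erosive noise corrupts every component except the pseudo-kernel position $j = 0$ of $x^1$, and $W_{XY}$ still recalls $y^1$:

```python
import numpy as np

def max_prod(A, B):   # max matrix product of Eq. (17)
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

X = np.array([[9, 1, 1], [1, 9, 1], [1, 1, 9],
              [0, 5, 5], [5, 0, 5], [5, 5, 0]])   # patterns as columns
Y = np.eye(3, dtype=int)
W = np.min([Y[:, [k]] - X[:, [k]].T for k in range(3)], axis=0)

x = X[:, [0]]                                  # x^1, with K_W(x^1) = {0}
noise = np.array([[0], [1], [1], [0], [3], [2]])
noisy = x - noise                              # erosive noise, j = 0 untouched
assert np.array_equal(max_prod(W, x), Y[:, [0]])      # noise-free recall
assert np.array_equal(max_prod(W, noisy), Y[:, [0]])  # robust recall
```

Corrupting the pseudo-kernel position itself (e.g., lowering component $j = 0$ below 5) would break this guarantee, which is why the scale-space construction below aims at protecting exactly those positions.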

Our approach to obtain HMMs with robust recall in the presence of noise with erosive and dilative components, e.g., Gaussian additive noise, is based on the following reasoning. By the continuity of the morphological Erosion/Dilation Scale-Space with zero-shift structural elements (Theorem 1), the local minima/maxima of the eroded/dilated function are neither displaced nor changed in value by the application of the multiscale erosion/dilation operator. Therefore, if we erode/dilate all the patterns with the same structuring function, their relative values will be preserved at the points where the local extrema of the images are placed. If we construct the $W$ memory with patterns dilated with a structuring function $g$ of scale $\sigma$ that complies with conditions (3), then we will preserve the recognition of the original patterns if the dilative pseudo-kernels of the patterns contain local maxima of intrinsic scale $\sigma$ or higher. The dual reasoning applies to the $M$ memory. These ideas are formalized in the following definitions and proposition:

Definition 2 ($\sigma$-dilated HMM). Let us denote the patterns dilated with a zero-shift structural element at scale $\sigma$ as $x^{\gamma,\sigma} = x^\gamma \oplus g_\sigma$, and let us consider the set of input-output pairs $(X_\sigma, Y) = \{(x^{\gamma,\sigma}, y^\gamma)\}$. The $\sigma$-dilated HMM is the $W_{X_\sigma,Y}$ memory.

Definition 3 ($\sigma$-eroded HMM). Let us denote the patterns eroded with a zero-shift structural element at scale $\sigma$ as $x^{\gamma,-\sigma} = x^\gamma \ominus g_\sigma$; let us consider the set of input-output pairs $(X_{-\sigma}, Y) = \{(x^{\gamma,-\sigma}, y^\gamma)\}$. The $\sigma$-eroded HMM is the $M_{X_{-\sigma} Y}$ memory.
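A minimal NumPy sketch of these constructions may help fix the notation. The memories follow the min/max-product construction used throughout, $w_{ij} = \bigwedge_\xi (y_i^\xi - x_j^\xi)$ and $m_{ij} = \bigvee_\xi (y_i^\xi - x_j^\xi)$; the function names are ours, and the single-pattern example is illustrative only.

```python
import numpy as np
from scipy.ndimage import grey_dilation

def hmm_W(X, Y):
    """W_XY memory: w_ij = min over patterns xi of (y_i^xi - x_j^xi).
    X: (k, n) input patterns, Y: (k, p) output patterns -> (p, n)."""
    return np.min(Y[:, :, None] - X[:, None, :], axis=0)

def hmm_M(X, Y):
    """M_XY memory: m_ij = max over patterns xi of (y_i^xi - x_j^xi)."""
    return np.max(Y[:, :, None] - X[:, None, :], axis=0)

def recall_W(W, x):
    """Max product: y_i = max_j (w_ij + x_j)."""
    return np.max(W + x[None, :], axis=1)

def recall_M(M, x):
    """Min product: y_i = min_j (m_ij + x_j)."""
    return np.min(M + x[None, :], axis=1)

def sigma_dilated_hmm(X, Y, g):
    """sigma-dilated HMM: store the patterns dilated by the zero-shift
    structuring function g, as in Definition 2."""
    X_sigma = np.stack([grey_dilation(x, structure=g) for x in X])
    return hmm_W(X_sigma, Y)

# Single-pattern example (k = 1), for which recall is always perfect:
X = np.array([[0., 2., 1.]])
Y = np.array([[1., 0.]])
W = hmm_W(X, Y)
```

For the $\sigma$-eroded HMM of Definition 3, the dual sketch would erode the patterns with `scipy.ndimage.grey_erosion` and build `hmm_M` instead.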

Proposition 1 (Pattern recall on $\sigma$-eroded and $\sigma$-dilated HMMs). Given a set of input-output pattern pairs (X, Y) such that we can construct a $\vee$-perfect $W_{XY}$ memory, and a scale $\sigma$. Then, the condition for perfect recall of an output pattern from the $\sigma$-dilated HMM is

$$
W_{X_\sigma Y} \vee x^\gamma = y^\gamma \quad \text{if } \exists \sigma_0 \geq \sigma \,;\ L_{\sigma_0}(x^\gamma)_{\max} \cap K_W(x^\gamma) \neq \emptyset. \tag{38}
$$

If a $\wedge$-perfect $M_{XY}$ memory can be built, then the condition for perfect recall of an output pattern from the $\sigma$-eroded HMM is

$$
M_{X_{-\sigma} Y} \wedge x^\gamma = y^\gamma \quad \text{if } \exists \sigma_0 \geq \sigma \,;\ L_{\sigma_0}(x^\gamma)_{\min} \cap K_M(x^\gamma) \neq \emptyset. \tag{39}
$$

Morphological Scale Spaces and Associative Morphological Memories 121

Proof: We present the proof for the $W_{X_\sigma Y}$ memory; the proof for the $M_{X_{-\sigma} Y}$ memory follows by duality. By construction we have that

$$
X_\sigma \geq X,
$$

therefore we may assume that the original patterns are eroded versions of the ones used to construct $W_{X_\sigma Y}$. This fulfills the first condition of Corollary 7. From Corollaries 2 and 3, both the value and the position of the local extrema of intrinsic scale $\sigma$ are preserved by the scale space construction up to the magnitude of their scale. For all $\gamma \in \{1, \ldots, k\}$,

$$
\exists \sigma_0 \geq \sigma \,;\ L_{\sigma_0}(x^\gamma)_{\max} \cap K_W(x^\gamma) \neq \emptyset \ \Rightarrow\ \exists j \in K_W(x^\gamma) \text{ s.t. } x_j^{\gamma,\sigma} = x_j^\gamma,
$$

which fulfills the second condition of Corollary 7.

The construction of the dilated and eroded HMMs, as introduced in Definitions 2 and 3, also gives a characterization of the noise that can be introduced in the patterns without affecting the recall features of the memory. Let us consider the $W_{X_\sigma Y}$ memory. By Proposition 1 it has the same recall behavior as $W_{XY}$ if all the pseudo-kernels contain local maxima of the image intensity of intrinsic scale $\sigma$ or greater. Therefore, $W_{X_\sigma Y}$ has the same robustness against erosive noise as $W_{XY}$. Besides, the new construction has an enhanced robustness against dilative noise, because dilations with zero-shift structural elements of the original patterns up to scale $\sigma$ are in fact erosions of the dilated patterns used to construct $W_{X_\sigma Y}$. The dual reasoning applies to the $M_{X_{-\sigma} Y}$ memory. This is formalized in the following propositions:

Proposition 2 (Robustness of the $\sigma$-dilated HMM to dilative noise). Let $\tilde{x}^\gamma \geq x^\gamma$ be such that $\tilde{x}^\gamma \leq x^{\gamma,\sigma}$, and let there be $j \in L_\sigma(x^\gamma)_{\max} \cap K_W(x^\gamma)$ such that $\tilde{x}_j^\gamma = x_j^\gamma$. Then, $W_{X_\sigma Y} \vee \tilde{x}^\gamma = y^\gamma$.

Proof: As in Proposition 1, the first condition of Corollary 7 is ensured by the condition $\tilde{x}^\gamma \leq x^{\gamma,\sigma}$ in the proposition. The second condition of Corollary 7 follows from the existence of $j \in L_\sigma(x^\gamma)_{\max} \cap K_W(x^\gamma)$ such that $\tilde{x}_j^\gamma = x_j^\gamma$.

Proposition 3 (Robustness of the $\sigma$-eroded HMM to erosive noise). Let $\tilde{x}^\gamma \leq x^\gamma$ be such that $\tilde{x}^\gamma \geq x^{\gamma,-\sigma}$, and $\tilde{x}_j^\gamma = x_j^\gamma$ for some $j \in L_{-\sigma}(x^\gamma)_{\min} \cap K_M(x^\gamma)$. Then, $M_{X_{-\sigma} Y} \wedge \tilde{x}^\gamma = y^\gamma$.

Proof: Follows by the duality of the M and W memories, as in the previous proposition.

5. Experiments on Face Localization

The first experimental setting is that of face localization. Face localization will be performed as follows:

1. A window of the same size as the training patterns is displaced over the image. At each window position, the corresponding subimage is extracted for processing.

2. The detection test is applied to the subimage.

3. If the detection is positive, the subimage pixels are labeled as face pixels.
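The sliding-window scheme of steps 1 to 3 can be sketched as follows. This is our illustration, not the paper's code: `detect` stands for whichever detection test is plugged in (HMM or PCA based), and the function name is ours.

```python
import numpy as np

def localize_faces(image, detect, win_h, win_w, step=1):
    """Slide a win_h x win_w window over the image and label the pixels
    of every window that passes the detection test as face pixels.
    `detect` is any boolean test on a subimage."""
    H, W = image.shape
    face_mask = np.zeros((H, W), dtype=bool)
    for r in range(0, H - win_h + 1, step):
        for c in range(0, W - win_w + 1, step):
            if detect(image[r:r + win_h, c:c + win_w]):
                face_mask[r:r + win_h, c:c + win_w] = True
    return face_mask
```

The returned mask is the set of detected face pixels against which the manual labeling is later compared.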

In this section we present the experimental data, the HMM and PCA implementations of the face detection test, and the results obtained on the data used.

5.1. The Experimental Data

For these experiments we have used a collection of 20 images taken in our laboratories with a conventional digital camera (Apple QuickTake). These images contain some 40 frontal faces with slightly varying poses and illumination conditions. The distance of the subjects to the camera was roughly the same for all shots. From these images, we have extracted a set of face patterns, shown in Fig. 1, that will serve as the training set for the two approaches tested, to enforce some fairness in

Figure 1. Face patterns used as training patterns for the construction of the HMM and the PCA algorithms for face localization.

122 Raducanu, Grana and Albizuri

the comparison. This set of training face patterns has the following features:

1. Faces are of roughly the same scale, but there are slight variations in size. We have not enforced any size of the face as long as it fits in the image window.

2. We have removed the background in these patterns to eliminate the effect of the surroundings. Background removal in the face patterns is customarily performed in face localization systems, e.g. [22, 26].

3. There is no geometrical registration of face features or any other geometrical transformation intended to obtain some normalized aspect of the faces (i.e., some of the faces are rotated), and

4. There is no intensity normalization, equalization or any other illumination compensation.

Therefore, building this set of patterns corresponds to an almost casual browsing and picking of face images in the database. The validation of the face localization procedures (either HMM or PCA) is done against a manual labelling of the face pixels in the training and validation images, which is independent of the selection of the face patterns. Manual labeling was done by drawing a rectangle that encloses the face. For each image $f_i$; $i = 1, \ldots, m$ there is a corresponding set of pixels $l_i = \{(x, y) \text{ s.t. pixel } f_i(x, y) \text{ belongs to a labeled face}\}$; $i = 1, \ldots, m$.

Because our ground truth for validation is face pixels, the face localization process composes for each image $\{f_i; i = 1, \ldots, m\}$ a corresponding set of pixels $b_i = \{(x, y) \text{ s.t. pixel } f_i(x, y) \text{ is detected as a face pixel}\}$; $i = 1, \ldots, m$. True Positive face localization is computed as the percentage of pixels declared by the face localization procedure as face pixels that correspond to manually labeled face pixels: $TP_i = \#(l_i \cap b_i)/\#l_i$. False Positive face localization is computed as the percentage of pixels declared as face pixels that do not correspond to manually labeled face pixels: $FP_i = \#(b_i - l_i)/\#l_i$.
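These two ratios are straightforward to compute when the pixel sets $l_i$ and $b_i$ are stored as boolean masks of the image size; the function name below is ours.

```python
import numpy as np

def tp_fp_ratios(labeled, detected):
    """TP_i = #(l_i intersect b_i) / #l_i and FP_i = #(b_i \\ l_i) / #l_i,
    with the manually labeled and detected pixel sets as boolean masks."""
    n_labeled = labeled.sum()
    tp = (labeled & detected).sum() / n_labeled
    fp = (detected & ~labeled).sum() / n_labeled
    return tp, fp
```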

5.2. HMM for Face Localization

We do not know beforehand the maximum scale at which the eroded and dilated patterns preserve their pseudo-kernels and perform the correct recall from the noisy input. Besides, this scale may vary with the subimage position and be different for the erosive and dilative HMMs. To cope with these variabilities we introduce the idea of the Multiscale HMM:

Definition 4. Given a set of input-output pattern pairs (X, Y), a Multiscale HMM is a set of HMMs $\{M_{X_{-\sigma} Y}, W_{X_\sigma Y}; \sigma \geq 0\}$, where $M_{X_{-\sigma} Y}$ and $W_{X_\sigma Y}$ are, respectively, $\sigma$-eroded and $\sigma$-dilated HMMs as in Definitions 2 and 3. In practice, the scale will be picked from a discrete set $\sigma \in \{1, 2, \ldots, s\}$.¹ Given a test input pattern x, the memories at the different scales are applied, giving a collection of responses

$$
Y_M = \bigl\{\, y_M^\sigma = M_{X_{-\sigma} Y} \wedge x;\ \sigma \in \{1, 2, \ldots, s\} \,\bigr\},
$$

and

$$
Y_W = \bigl\{\, y_W^\sigma = W_{X_\sigma Y} \vee x;\ \sigma \in \{1, 2, \ldots, s\} \,\bigr\}.
$$

The total output can be considered as the intersection of these multiscale responses:

$$
Y = Y_M \cap Y_W. \tag{40}
$$

If a single output vector y is sought, it can be chosen as the most frequent vector in Y.

To produce a set of detected face pixels, we have realized the multiscale HMM as follows. In face localization, the basic set of input patterns X of the HMMs is the set of training face patterns, such as the ones in Fig. 1. The output patterns Y of the HMMs are the orthogonal binary encodings of the faces, constructed as described by Eq. (23). Each face pattern is a class by itself. From the point of view of pattern recognition, the approach is similar to a Nearest Neighbor classification with only one sample per class. We did not apply any bootstrap procedure as in Rowley et al. [22], although it could be done if the addition of new patterns to the HMMs does not interfere with the recall of the previously stored patterns. The process at the pixel level is as follows:

1. We consider two pixel sets $b_W$ and $b_M$ that will represent the responses of the W and M memories.

2. For each image window $w_f$ of size $N \times M$ around pixel $f_i(x, y)$, taken as the test input x:

3. For each $\sigma \in \{1, 2, \ldots, s\}$:

   (a) Compute the $y_M^\sigma$ and $y_W^\sigma$ responses.

   (b) If $y_M^\sigma \in Y$ then add the pixels $\{(x - i, y - j);\ i = -\frac{N}{2}, \ldots, \frac{N}{2},\ j = -\frac{M}{2}, \ldots, \frac{M}{2}\}$ to $b_M$.

   (c) If $y_W^\sigma \in Y$ then add the pixels $\{(x - i, y - j);\ i = -\frac{N}{2}, \ldots, \frac{N}{2},\ j = -\frac{M}{2}, \ldots, \frac{M}{2}\}$ to $b_W$.

4. The final response of the face localization with HMM in image $f_i$ is the intersection of the responses to the erosive and dilative memories: $b_i = b_M \cap b_W$.
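The pixel-level process above can be sketched as follows. The recall products follow the earlier min/max-product definitions; the function names and the way the per-scale memories are passed in are our choices for illustration.

```python
import numpy as np

def recall_W(W, x):
    """Max product: y_i = max_j (w_ij + x_j)."""
    return np.max(W + x[None, :], axis=1)

def recall_M(M, x):
    """Min product: y_i = min_j (m_ij + x_j)."""
    return np.min(M + x[None, :], axis=1)

def multiscale_face_mask(image, Ms, Ws, codes, win_h, win_w, step=1):
    """Steps 1-4 as code: Ms/Ws hold the per-scale M and W memory
    matrices, `codes` the stored orthogonal output codes Y.  A window
    recalled by some memory at some scale adds its pixels to b_M or
    b_W; the final localization is b_M intersect b_W."""
    H, W_img = image.shape
    b_M = np.zeros((H, W_img), dtype=bool)
    b_W = np.zeros((H, W_img), dtype=bool)
    for r in range(0, H - win_h + 1, step):
        for c in range(0, W_img - win_w + 1, step):
            x = image[r:r + win_h, c:c + win_w].ravel()
            if any(np.array_equal(recall_M(M, x), y)
                   for M in Ms for y in codes):
                b_M[r:r + win_h, c:c + win_w] = True
            if any(np.array_equal(recall_W(Wm, x), y)
                   for Wm in Ws for y in codes):
                b_W[r:r + win_h, c:c + win_w] = True
    return b_M & b_W
```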


5.3. PCA for Face Localization

For comparison we have also implemented the Principal Component Analysis (PCA) approach to face detection, which consists in performing a threshold decision [28] based on the computation of the distance to face space. Briefly described, given a sample $\{x_1, \ldots, x_n\}$ of face patterns (i.e., the ones in Fig. 1), we compute the sample average face $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (Fig. 3) and the correlation matrix $C = \sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})^t$. The diagonalization of the correlation matrix is given by $C = \Phi \Lambda \Phi^t$, where $\Phi$ is the orthonormal eigenvector matrix and $\Lambda$ is the diagonal eigenvalue matrix. Eigenvectors are ordered in decreasing magnitude of their associated eigenvalues. In the case of face patterns, the eigenvectors are called eigenfaces because of their ghostly aspect, like the ones in Fig. 2 computed from the patterns in Fig. 1. For a small sample of size n, only n eigenvectors will be meaningful. The PCA transformation of a data vector x into the subspace defined by the m most significant eigenvectors is given by $y = \Phi_m^t (x - \bar{x})$, where $\Phi_m$ is the matrix constructed with the m principal eigenvectors. The distance to face space is the reconstruction error $d_{\bar{x},\Phi_m}(x) = \|x - \hat{x}\|^2$, where the image reconstruction is computed as $\hat{x} = \Phi_m y + \bar{x}$. The detection is declared positive when $d_{\bar{x},\Phi_m}(x) < \theta$, for a threshold $\theta$ that may be determined empirically. An approach to determine $\theta$ uses the Receiver Operating Characteristic (ROC) curve, which plots, for varying values of $\theta$, the average true positive $TP(\theta)$ versus false positive $FP(\theta)$ detection ratios computed on a set of images. The value of $\theta$ is set as the maximum value that gives a desired true positive detection ratio or a desired false positive detection ratio.
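The distance-to-face-space computation can be sketched directly from these formulas. `fit_eigenfaces` and `distance_to_face_space` are our names for the two stages; the eigendecomposition uses `numpy.linalg.eigh` on the scatter matrix, whose eigenvalues come out in ascending order and are therefore reversed.

```python
import numpy as np

def fit_eigenfaces(patterns, m):
    """patterns: (n, d) face vectors.  Returns the mean face and the m
    principal eigenvectors (eigenfaces) of the scatter matrix C."""
    X = np.asarray(patterns, dtype=float)
    mean = X.mean(axis=0)
    C = (X - mean).T @ (X - mean)       # C = sum_i (x_i - mean)(x_i - mean)^t
    vals, vecs = np.linalg.eigh(C)      # eigenvalues in ascending order
    Phi_m = vecs[:, ::-1][:, :m]        # m leading eigenvectors
    return mean, Phi_m

def distance_to_face_space(x, mean, Phi_m):
    """Reconstruction error ||x - x_hat||^2 with x_hat = Phi_m y + mean."""
    y = Phi_m.T @ (x - mean)
    x_hat = Phi_m @ y + mean
    return np.sum((x - x_hat) ** 2)
```

A window is then declared a face when this distance falls below the empirically chosen threshold $\theta$.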

Figure 2. Eigenfaces obtained from the patterns in Fig. 1.

The set of localized face pixels, for a fixed value of $\theta$, is computed as follows. For each window w of size $N \times M$ around pixel $f_i(x, y)$, we compute its distance to face space. If $d_{\bar{x},\Phi_m}(w) < \theta$, then we add the pixels $\{(x - i, y - j);\ i = -\frac{N}{2}, \ldots, \frac{N}{2},\ j = -\frac{M}{2}, \ldots, \frac{M}{2}\}$ to $b_i$.

5.4. Results and Discussion

The computational experiment consists of face localization at the pixel level over the in-house collection of images (20 images, 40 faces present), with the PCA and HMM algorithms, using the faces in Fig. 1 as the training patterns for both algorithms. The ROC curves of both algorithms, averaged over all the images, are presented in Fig. 4. The ROC curve of the HMM needs an explanation. The face detection decision does not depend on a distance threshold. The maximum scale s of the Multiscale HMM is the only parameter that may be adjusted to affect the positive and false positive detection ratios. Increasing s, the positive detection ratio grows up to 100% at the cost of increasing the false positive detection ratio. We only have a discrete set of values for this parameter and, therefore, the resulting ROC curve has a reduced set of points (represented by small circles in Fig. 4). The curve starts with s = 2. The next values s = 3, s = 4 and s = 5 produce a step increase of the positive detection ratio with marginal increases of the false positive detection ratio. With a value s = 8 we obtain 100% positive detections, with 10% false positives. A value s = 5 gives a positive detection ratio above 80% and a false positive detection ratio below 5%.

Figure 5 shows the face localization results with the Multiscale HMM of maximum scale s = 5 on some of the images. White pixels correspond to pixels declared as face pixels. The localization corresponds to the intersection of the independent localizations with the erosive and dilative HMMs. It is possible that these independent localizations do not occur at exactly the same position, in which case the overlap may be irregular and some features of the face may be left out of the detected regions. In the figures some of the eyes are not declared as face pixels. Note that all these cases show a very bright spot corresponding to a reflection on the glasses. Also, the bearded mouth of one subject is not recognized as face pixels. However, it must be noted that, despite these anomalies, some parts of the faces were detected. False positives correspond to regions of the image that comply with the conditions for


Figure 3. Mean face image computed from the patterns in Fig. 1.

recognition. It would be desirable to develop heuristic rules to filter out the false positive pixels; however, we will not pursue them here.

The aim of this paper is to show that the HMM has a potential for pattern localization comparable to other well-known approaches. We have applied the distance to feature space approach to face localization in the images in our collection, based on the PCA obtained from the patterns in Fig. 1. The eigenfaces and mean face are those in Figs. 2 and 3. The averaged ROC curve of the PCA is presented in Fig. 4. It lies well below the ROC curve corresponding to the HMM: the false positive detection ratio is much bigger for the PCA approach than for the HMM approach when the same positive detection is required. Setting the decision threshold $\theta$ to a value that ensures an 80% average positive detection ratio, we obtain the face localization results for the images in Fig. 5 and present them in Fig. 6. Here the detected face pixels are highlighted instead of being set to white. The first difference between the detections in Fig. 6 versus Fig. 5 is that some of the faces are completely detected while some others are not detected at all. Only one of the faces shows a partial detection. The false positive detection

Figure 4. Mean ROC curves for the face pixel localization with the PCA and the HMM constructed on the basis of the patterns in Fig. 1.

pixels tend to cover broad smooth regions larger than the ones in the HMM results. However, the PCA results do not show the irregularities of the HMM results. It is possible that heuristic rules added to the PCA analysis may reduce the false positive detections but, as with the HMM, we will not pursue them here. We think that these results show that (Multiscale) HMMs are a potential tool for face localization and similar tasks.

It may be argued that the PCA needs well registered patterns to compute a reasonable set of eigenfaces. The patterns in Fig. 1 have all non-face features removed, but do not have a normalized position of the eyes, nose and mouth. This lack of preprocessing of the face patterns is a handicap for both approaches, because the HMM may also benefit from the registration to normalized positions of the face features: either by improving the detection ratio or by reducing the number of patterns that must be stored (in case the registration makes the patterns indistinguishable by the disappearance of the pseudo-kernels).

6. Experiments on Self-Localization

In this section we discuss a procedure for mobile robot visual self-localization using HMMs for the appearance based recognition of visual landmarks. Visual self-localization has two conflicting robustness requirements. On one hand, the landmark recognition must be robust enough to be insensitive to small variations in position and orientation, and even in illumination conditions. Our previous works [8] and [16] were addressed to test the robustness of the HMM to small translations and rotations of the stored views. On the other hand, it must be stringent, to avoid confusion of landmarks and to allow for a precise map of the physical positions and orientations into visual landmarks. The robustness of the recognition determines the uncertainty of the self-localization. Robust landmark recognition implies coarse localization and high uncertainty; stringent landmark recognition implies fine localization with low uncertainty, but also greater computational requirements. A possible outcome of the landmark recognition is the "unknown" position, when none of the landmarks matches the actual view. Robust landmark recognition reduces the unknown situations while stringent landmark recognition increases them. Navigation based on landmark recognition is a process in which the mobile robot continuously tries to match the view with the stored landmarks; if there is no match, the robot traverses an "uncharted" region of


Figure 5. Face localization results of the HMMs on some images. White pixels are positive face pixel localizations produced by the HMM.

Figure 6. Face localization results with the PCA approach on some images. Highlighted pixels correspond to the positive face pixel localizations produced by the PCA.

the configuration space until it arrives at a position where some landmark is recognized. In this region the self-localization is based on proprioception (odometry). This navigation process is similar to navigation with an annotated map [14]. The HMMs are not suitable for outdoor landmark recognition, because they are very sensitive to the changes in intensity due to changing illumination conditions. This section is devoted to showing that the HMMs may achieve the robust recognition balance required for appearance based landmark recognition in indoor environments. We also give an automatic procedure for landmark selection from an image sequence, which is the equivalent of a learning procedure in this context.

6.1. Experimental Data

Experiments are performed on a mobile robot B-21 (iRobot Corp.). We captured image sequences from the on-board camera corresponding to specific walks around the laboratory. We have worked with 4 image sequences that correspond to 2 different walks around the laboratory. For each walk we have captured


Figure 7. Support map for landmark candidates obtained using the HMM.

a training and a test sequence. Image sizes are 200×100 pixels. Images are identified by their sequence number. We did not attach physical positions and orientations to the images in the sequences. Therefore, the evaluation of the approach cannot be given in terms of the error in physical position. Instead, we evaluate the approach by:

1. The continuity of the landmark recognition: physically close positions correspond to images with close sequence numbers; therefore images with close sequence numbers must be recognized as the same landmark, and

2. The lack of confusion: separate images in the sequence must not be assigned to the same landmark.

The requirement of lack of confusion has one exception in our experiments. The walks that produced the experimental image sequences had the same starting and ending position and orientation. Therefore, a natural result in these image sequences is the confusion of the start and the end of the sequence: images at both ends must match the same landmark.

6.2. Landmark Selection and Recognitionwith HMMs

Given a training sequence $\{f_i; i = 1, \ldots, n\}$ and a set of landmark candidates $\{l_i; i = 1, \ldots, m\}$, we perform the following process for landmark selection:

1. Each landmark candidate image $l_i$; $i = 1, \ldots, m$ is used to build a $\sigma$-eroded HMM $M_i$; $i = 1, \ldots, m$, whose desired output is the scalar value $y = 1$.

2. Each $\sigma$-eroded HMM $M_i$ is applied to each of the remaining images $f_j$; $j = 1, \ldots, n$ in the training sequence. For increased robustness, each test image $f_j$ is dilated with a spherical structural object of scale $\sigma_0$ before application of the HMM. Landmark recognition corresponds to an output $M_i \wedge (f_j \oplus g_{\sigma_0}) = y_{ij} = 1$.

3. We call the support of a landmark candidate image $l_i$; $i = 1, \ldots, m$ the set $S_i = \{ j \text{ s.t. } y_{ij} = 1 \}$. The landmarks are selected as those landmark candidates with the greatest non-intersecting supports.

Figure 8. Images identified as landmarks in the first image sequence.
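Step 3 amounts to a selection over the support sets. The paper does not spell out the selection order, so the sketch below (names ours) assumes a largest-support-first greedy pass, which is one reasonable reading of "greatest non-intersecting supports":

```python
import numpy as np

def landmark_supports(Y):
    """Y: (m, n) boolean recognition map, Y[i, j] = (y_ij == 1).
    Greedily keep the candidates with the largest supports whose
    supports do not intersect those of the candidates already kept."""
    supports = [set(np.flatnonzero(row)) for row in Y]
    order = np.argsort([-len(s) for s in supports])  # largest first
    chosen, covered = [], set()
    for i in order:
        if supports[i] and not (supports[i] & covered):
            chosen.append(int(i))
            covered |= supports[i]
    return chosen
```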

In the computational experiments, landmark candidates come from a time subsampling of the training sequences: we take one out of every ten images as a landmark candidate. Figure 7 shows, as binary images, the support sets $\{S_i; i = 1, \ldots, m\}$; in other words, the images correspond to $\{y_{ij}; i = 1, \ldots, m; j = 1, \ldots, n\}$. The scales of the erosion and dilations were $\sigma = \sigma_0 = 6$. Observe the confusion that appears at the beginning and end of both sequences. The images tend to show a white diagonal, but nothing guarantees that $M_i \wedge (f_i \oplus g_{\sigma_0}) = y_{ii} = 1$, because the dilation of the original image may affect the conditions of recall of the image with its dilated version. However, most of the time $y_{ii} = 1$ in the images. We postulate that this is due to the flat regions that are not affected by the dilation.

Figure 8 shows the views selected as landmarks by the procedure described above in the first training sequence. Note that these views correspond to a


Figure 9. Correlation map between the landmark candidates and the remaining images in the training sequences.

non-regular sampling of the image sequence. As the detection is performed with an M HMM, the pseudo-kernels correspond to the dark areas in the image, the minima of the intensity function. The landmark selection procedure performs as an unsupervised feature selection procedure, where the features are the pseudo-kernels of each landmark image. It can be appreciated that the selected images correspond to translations of these features, and that the separation between sequence numbers of the landmarks (and hence their physical distance) is proportional to the size of these features. Note also that the images are selected so that none has its features included in the features of another image. Finally, this sampling of the images corresponds to a fine irregular sampling of the physical positions visited by the robot.

6.3. Results and Discussion

It may seem that the HMM is equivalent to some kind of correlation-like distance; however, things are not that simple. The first question we want to address is whether the correlation between images produces a landmark detection similar to (or better than) that of the HMMs. To this end we performed the same process of landmark selection employing the correlation between images as the distance from the landmark candidate to the test images. In Fig. 9, we present, for both training image sequences, the correlation of each landmark candidate to each of the remaining images in the sequence,

$$
d_{ij} = \frac{l_i \cdot f_j}{\|l_i\| \, \|f_j\|}, \quad i = 1, \ldots, m;\ j = 1, \ldots, n, \tag{41}
$$

Figure 10. Support map obtained thresholding the correlation map between landmark candidates and the images in the training sequences.

as a grayscale image. We would want to select as landmarks those images with the largest support, like we did with the HMMs. The correlation map must be binarized,

$$
b_{ij} = d_{ij} > \theta, \tag{42}
$$

to obtain the support of each image. Thresholding is equivalent to deciding that the test image does not match the candidate landmark if the correlation is lower than $\theta$. Setting the value of the threshold $\theta$ is not trivial, because the distribution of the correlation values depends on the image sequence. Applying the same threshold to the correlation maps of both sequences gives catastrophic results. Performing a separate thresholding of each distance map in Fig. 9, we obtain the image support maps in Fig. 10. Thresholding was aimed to obtain a diagonal structure of the support maps similar to the ones in Fig. 7. However, the support maps obtained from the correlation image present a high degree of confusion that makes them useless for landmark selection. The comparison between the correlation and the HMMs ends here, because we cannot extract the most salient views as landmarks based on the correlation. We conclude that correlation is a much more ambiguous matching measure than the HMM.
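Equations (41) and (42) can be computed in a few lines over the flattened images; the function names are ours.

```python
import numpy as np

def correlation_map(candidates, frames):
    """d_ij = (l_i . f_j) / (||l_i|| ||f_j||) for flattened images."""
    L = np.asarray([c.ravel() for c in candidates], dtype=float)
    F = np.asarray([f.ravel() for f in frames], dtype=float)
    num = L @ F.T                                         # (m, n)
    denom = np.outer(np.linalg.norm(L, axis=1),
                     np.linalg.norm(F, axis=1))
    return num / denom

def support_map(d, theta):
    """Binarization b_ij = (d_ij > theta)."""
    return d > theta
```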

Once the landmarks are selected, we can test their recognition on the experimental image sequences. To visualize the recognition results we present them as a plot of the recognized landmark number (the ordinate axis) versus the number in the sequence of the test image (the abscissa axis). From the first sequences of the first and second walks we obtain m = 36 and m = 35 landmark images, respectively. We construct the $\sigma$-eroded HMM $M_{X_{-\sigma} Y}$ with the selected landmarks


Figure 11. Landmark recognition in the training sequence of the first walk.

as inputs X and their orthogonal codes $y^\gamma \in \{0, 1\}^m$, $\gamma = 1, \ldots, m$ ($y_\gamma^\gamma = 1$, $y_\beta^\gamma = 0$, $\forall \gamma \neq \beta$) as the desired outputs. We compute the response to each image in the sequence:

$$
M_{X_{-\sigma} Y} \wedge (f_j \oplus g_{\sigma_0}) = y_j; \quad j = 1, \ldots, n. \tag{43}
$$

The recognition plotted in Figs. 11, 12, 13 and 14 can be formalized as follows:

$$
R_j = \begin{cases} \gamma & \text{if } \exists y^\gamma = y_j \\ 0 & \text{if } \neg\exists y^\gamma = y_j \end{cases} \quad j = 1, \ldots, n. \tag{44}
$$
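Equation (44) is a simple lookup of each memory response among the stored codes; this sketch (names ours) returns the 1-based landmark index $\gamma$, or 0 for null recognition.

```python
import numpy as np

def recognition_sequence(responses, codes):
    """Eq. (44): map each memory response y_j to the index gamma of the
    matching stored code y^gamma, or 0 when no stored code matches."""
    R = []
    for y in responses:
        match = next((g + 1 for g, code in enumerate(codes)
                      if np.array_equal(y, code)), 0)
        R.append(match)
    return R
```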

Figures 11 and 12 show the plot of the landmark recognition $R_j$ corresponding to the training and test image sequences of the first walk. Figures 13 and 14 show the same results for the second walk. Note that the landmarks have been extracted in order from the training sequence; therefore it is natural that the plot of $R_j$ has a staircase increasing shape. The plot appears to be a collection of step functions of increasing amplitude. Each one of these step functions corresponds to the continuous recognition of a landmark in the sequence. The gaps between them correspond to images with null recognition.

The first observation on the results refers to the images in the sequence with null landmark recognition in the training and test sequences of both walks. Comparison of Figs. 11 and 12 reveals that the number of null landmark recognitions has not increased above 10% in the test sequence. HMM recognition is robust enough to cope with the noise and variations that are

Figure 12. Landmark recognition in the test sequence of the first walk.

Figure 13. Landmark recognition in the training sequence of the second walk.

Figure 14. Landmark recognition in the test sequence of the second walk.


introduced in the images in a second walk. The comparison of Figs. 13 and 14 leads to similar conclusions for the second walk.

The second observation is relative to the constancy of the profiles of the recognition plots. Comparison of Figs. 11 and 12 shows that the landmarks are recognized in the test sequence in places close to the ones in the training sequence. The same qualitative conclusion is valid for the second walk, as can be appreciated in Figs. 13 and 14, despite some wild shots.

The next observation refers to the continuity of the landmark recognition. There is a strong correlation between the spatial (image) position and the landmark recognized; therefore the approach can be of use for self-localization. The exceptions are the middle images in the second walk (Figs. 13 and 14), for which recognition is not as good as for the remaining images and sequences. On a close examination of these images, we found that they are jagged and very noisy, due to wheel slippage while making turns.

The HMM landmark recognition is robust in the sense that the position uncertainty and capture noise do not catastrophically degrade the recognition.

7. Conclusions and Further Work

Associative Morphological Memories, either autoassociative (AMM) or heteroassociative (HMM), have a specific sensitivity to erosive or dilative noise, depending on their construction. In this paper we study the construction of HMMs robust against general noise from a specific viewpoint. We consider the erosive and dilative pseudo-kernels of the stored input-output patterns and show that they are preserved by erosion and dilation, respectively, in the setting of an Erosion-Dilation Morphological Scale Space. Storing the eroded or dilated patterns instead of the original ones preserves the natural robustness to erosive or dilative noise and adds a degree of robustness to the dual noise that depends on the scale. We focus on the HMM because the AMM storage and computation requirements for moderate size images (320×240) are too large for practical applications.

We propose the application of HMMs to two tasks: (1) a realization of face localization that can be competitive with other appearance based algorithms, and (2) the self-localization of mobile robots based on visual information.

For the face localization task, a Multiscale HMM was constructed from a set of face patterns, and face localization was tested at the pixel level on a collection of images with varying face poses. For comparison, we built the eigenface transform (PCA) from the same face patterns and computed the face localization at the pixel level with the distance to feature space approach. The HMM showed a better response than the PCA approach. The reduced set of images does not allow us to make general claims about the superiority of the HMM approach, but we may safely affirm that the HMM approach may be a complementary face detection test in a multicue face localization system. The issue of efficient and accurate multiresolution detection of faces of different sizes is an open problem.

For the self-localization task, robust recognition was achieved by applying morphological erosion to the images before constructing the M HMM, and by dilating the test images before applying them for recognition. We give a method for the selection of landmark images from a training image sequence. The tests of the recognition of the selected landmark images on the test sequences show high correlation with the spatial position and small sensitivity to the noise and position uncertainties that naturally appear in the second walk that produces the test images. We think that the method may be applied in real time on mobile robots because of its speed of response, robustness and ease of implementation and fitting (the only tuning parameter is the erosion/dilation structural object scale). This real-life application of the approach needs an on-line procedure for landmark detection.

Color processing is a powerful tool in practical visual recognition procedures. Still, the morphological treatment of color spaces is not well posed, because of the lack of a meaningful partial order in most color spaces. Dealing with the color information in the AMM or HMM would increase their scope for practical applications significantly. For instance, a color based HMM may combine spatial and color information in the face localization task.

Acknowledgments

The authors received partial support from projects of the Gobierno Vasco (GV/EJ) UE-1999-1 and PI-98-21, and of the Ministerio de Ciencia y Tecnologia MAT1999-1049-C03-03, TIC2000-0739-C04-02 and TIC2000-0376-P4-04. B. Raducanu benefited from a predoctoral grant from the University of The Basque Country (UPV/EHU).


Note

1. The storage space and computing time requirements of the Multiscale HMM are proportional to the size of the input patterns, the number of classes and the number of scales considered, well below the requirements imposed by the AMM implementation in most practical cases.

References

1. A. Adorni, S. Cagnoni, S. Enderle, G.K. Kraetzschmar, M. Mordonini, M. Plagge, M. Ritter, S. Sablatnog, and A. Zell, "Vision-based localization for mobile robots," Robotics and Autonomous Systems, Vol. 36, pp. 103–119, 2001.

2. C. Balkenius and L. Kopp, "Robust self-localization using elastic templates," in Proc. Swedish Symp. on Image Analysis, T. Lindberg (Ed.), 1997.

3. G.N. DeSouza and A.C. Kak, "Vision for mobile robot navigation: A survey," IEEE Trans. on Patt. Anal. Mach. Int., Vol. 24, No. 2, pp. 237–267, 2002.

4. R. Feraud, O.J. Bernier, J.-E. Viallet, and M. Collobert, "A fast and accurate face detector based on neural networks," IEEE Trans. on Patt. Anal. Mach. Int., Vol. 23, No. 1, pp. 42–53, 2001.

5. D. Fox, "Markov localization: A probabilistic framework for mobile robot localization and navigation," Ph.D. Thesis, University of Bonn, Germany, December 1998.

6. C.R. Giardina and E.R. Dougherty, Morphological Methods in Image and Signal Processing, Prentice Hall: Englewood Cliffs, NJ, 1988.

7. R.C. Gonzalez and R.E. Woods, Digital Image Processing, Addison-Wesley: Reading, MA, 1992.

8. M. Grana and B. Raducanu, "On the application of morphological heteroassociative neural networks," in Proc. Int. Conf. on Image Processing (ICIP), I. Pitas (Ed.), Thessaloniki, Greece, October 2001, pp. 501–504.

9. J.J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," in Proc. Nat. Acad. Sciences, Vol. 79, pp. 2554–2558, 1982.

10. P.T. Jackway and M. Deriche, "Scale-space properties of the multiscale morphological dilation-erosion," IEEE Trans. on Patt. Anal. and Mach. Int., Vol. 18, No. 1, pp. 38–51, 1996.

11. T. Kohonen, "Correlation matrix memory," IEEE Trans. Computers, Vol. 21, pp. 353–359, 1972.

12. S.H. Lin, S.Y. Kung, and L.J. Lin, "Face recognition/detection by probabilistic decision-based neural network," IEEE Trans. on Neural Networks, Vol. 8, No. 1, pp. 114–132, 1997.

13. S. Livatino and C. Madsen, "Optimization of robot self-localization accuracy by automatic visual-landmark selection," in Proc. of the 11th Scandinavian Conf. on Image Analysis (SCIA), 1999, pp. 501–506.

14. A.M. Martinez and J. Vitria, "Clustering in image space for place recognition and visual annotations for human-robot interaction," IEEE Trans. Sys. Man Cyb. B, Vol. 31, No. 5, pp. 669–682.

15. C.F. Olson, "Mobile robot self-localization by iconic matching of range maps," in Proc. of the 8th Int. Conf. on Advanced Robotics, 1997, pp. 447–452.

16. B. Raducanu, M. Grana, and P. Sussner, "Morphological neural networks for vision based self-localization," in Proc. of ICRA2001, Int. Conf. on Robotics and Automation, Seoul, Korea, May 2001, pp. 2059–2064.

17. J. Reuter, "Mobile robot self-localization using PDAB," in Proc. of Int. Conf. on Robotics and Automation (ICRA'2000), 2000.

18. G.X. Ritter, J.L. Diaz-de-Leon, and P. Sussner, "Morphological bidirectional associative memories," Neural Networks, Vol. 12, pp. 851–867, 1999.

19. G.X. Ritter, P. Sussner, and J.L. Diaz-de-Leon, "Morphological associative memories," IEEE Trans. on Neural Networks, Vol. 9, No. 2, pp. 281–292, 1998.

20. G.X. Ritter, G. Urcid, and L. Iancu, "Reconstruction of patterns from noisy inputs using morphological associative memories," J. Math. Imag. Vision, 2002, submitted.

21. G.X. Ritter and J.N. Wilson, Handbook of Computer Vision Algorithms in Image Algebra, CRC Press: Boca Raton, FL.

22. H.A. Rowley, S. Baluja, and T. Kanade, "Neural network-based face detection," IEEE Trans. on Patt. Anal. and Mach. Int., Vol. 20, No. 1, pp. 23–38, 1998.

23. A. Saffiotti and L.P. Wesley, "Perception-based self-localization using fuzzy location," in Lecture Notes in Artificial Intelligence 1093, L. Dorst, M. van Lambalgen, and F. Voorbraak (Eds.), Springer-Verlag, 1996, pp. 368–385.

24. J. Serra, Image Analysis and Mathematical Morphology, Academic Press: London, 1982.

25. P. Soille, Morphological Image Analysis. Principles and Applications, Springer Verlag: Berlin, 1999.

26. K.K. Sung and T. Poggio, "Example-based learning for view-based human face detection," IEEE Trans. on Patt. Anal. and Mach. Int., Vol. 20, No. 1, pp. 39–50, 1998.

27. P. Sussner, "Observations on morphological associative memories and the kernel method," Neurocomputing, Vol. 31, pp. 167–183, 2000.

28. M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, Vol. 3, No. 1, pp. 71–86, 1991.

29. J.-G. Wang and E. Sung, "Frontal-view face detection and facial feature extraction using color and morphological operators," Patt. Recog. Letters, Vol. 20, No. 10, pp. 1053–1068, 1999.

30. J. Wang and T. Tan, "A new face detection method based on shape information," Patt. Recog. Letters, Vol. 21, Nos. 6/7, pp. 463–471, 2000.

31. Y. Won, P.D. Gader, and P.C. Coffield, "Morphological shared-weight neural network with applications to automatic target detection," IEEE Trans. Neural Networks, Vol. 8, No. 5, pp. 1195–1203, 1997.

32. T.-W. Yoo and I.-S. Oh, "A fast algorithm for tracking human faces based on chromatic histograms," Patt. Recog. Letters, Vol. 20, No. 10, pp. 967–978, 1999.

Bogdan Raducanu received his B.Sc. in computer science from the Politechnical University of Bucharest, Romania (1995) and a Ph.D. "cum laude" from the University of The Basque Country, Spain (2001). Currently, he is a post-doc at the department of User-Centred Engineering of the Technical University of Eindhoven, The Netherlands. His research interests include computer vision, pattern recognition, robotics and human-computer interaction.

Manuel Grana received the B.S. and M.S. degrees in Computer Science from the Universidad del Pais Vasco, Spain, in 1982, and a Ph.D. in Physics from the same University in 1989. He is currently a Professor in the Department of Ciencias de la Computacion e Inteligencia.

His research interests include computer vision and image processing, artificial neural networks (Self-Organizing Maps and Morphological Neural Networks), evolutionary algorithms and autonomous robotics. He is a member of the IEEE. He is coeditor of the book "Biologically inspired robot behavior engineering". He has coauthored more than 30 papers published in international journals and over 60 contributions in international conferences and symposia.

F. Xabier Albizuri received the M.Sc. degree in Physics in 1987 and the Ph.D. degree in Computer Science in 1995 from the University of the Basque Country, Spain. He is an Associate Professor at the Computer Science Faculty of the University of the Basque Country. His research work covers the areas of neural networks, pattern recognition and image analysis, stochastic models, and the analysis and performance evaluation of communication systems. He is engaged in automation and simulation software projects.