An evidence-theoretic k-NN rule with parameter optimization

L. M. Zouhal*† and T. Denœux*

* Université de Technologie de Compiègne, U.M.R. CNRS Heudiasyc, Compiègne cedex, France
email: Thierry.Denoeux@hds.utc.fr
† Centre International des Techniques Informatiques, Lyonnaise des Eaux

Technical Report Heudiasyc. To appear in IEEE Transactions on Systems, Man and Cybernetics.
Abstract
This paper presents a learning procedure for optimizing the parameters in the evidence-theoretic k-nearest neighbor rule, a pattern classification method based on the Dempster-Shafer theory of belief functions. In this approach, each neighbor of a pattern to be classified is considered as an item of evidence supporting certain hypotheses concerning the class membership of that pattern. Based on this evidence, basic belief masses are assigned to each subset of the set of classes. Such masses are obtained for each of the k nearest neighbors of the pattern under consideration and aggregated using Dempster's rule of combination. In many situations, this method was found experimentally to yield lower error rates than other methods using the same information. However, the problem of tuning the parameters of the classification rule was so far unresolved. In this paper, we propose to determine optimal or near-optimal parameter values from the data by minimizing an error function. This refinement of the original method is shown experimentally to result in a substantial improvement of classification accuracy.
1 Introduction
In the classical approach to statistical pattern recognition, the entities to be classified are assumed to be selected by some form of random experiment. The feature vector describing each entity is then a random vector with a well-defined, though unknown, probability density function depending on the pattern category. Based on these densities and on the prior probability of each class, posterior probabilities can be defined, and the optimal Bayes decision rule can then theoretically be used for classifying an arbitrary pattern with minimal expected risk. Since the class-conditional densities and prior probabilities are usually unknown, they need to be estimated from the data. Many methods have been proposed for building consistent estimators of the posterior probabilities under various assumptions. However, for finite sample size, the resulting estimates generally do not provide a faithful representation of the fundamental uncertainty pertaining to the class of a pattern to be classified. For example, if only a relatively small number of training vectors is available, and a new pattern is encountered that is very dissimilar from all previous patterns, the uncertainty is quite high, and this situation of near-ignorance is not reflected by the outputs of a conventional parametric or nonparametric statistical classifier, whose principle fundamentally relies on asymptotic assumptions. This problem is particularly acute in situations in which decisions need to be made based on weak information, as commonly encountered in system diagnosis applications, for example.
As an attempt to provide an answer to the above problem, it was recently suggested to re-formulate the pattern classification problem by considering the following question: given a training set of finite size containing feature vectors with known (or partly known) classification, and a suitable distance measure, how can the uncertainty pertaining to the class of a new pattern be characterized? In a recent paper [2], an answer to this question was proposed based on the Dempster-Shafer theory of evidence [12]. The approach consists in considering each neighbor of a pattern to be classified as an item of evidence supporting certain hypotheses concerning the class membership of that pattern. Based on this evidence, basic belief masses are assigned to each subset of the set of classes. Such masses are obtained for each of the k nearest neighbors of the pattern under consideration and aggregated using Dempster's rule of combination. Given a finite set of actions and losses associated to each action and each class, decisions can then be made by using some generalization of the Bayes decision theory.
In many situations, this method was found experimentally to yield lower error rates than other methods based on the same information. However, the problem of optimizing the parameters involved in the classification rule was so far unresolved. In this paper, we propose to determine optimal or near-optimal parameter values from the data by minimizing a certain error function. This refinement of the original method is shown experimentally to result in a substantial improvement of classification accuracy.
The paper is organized as follows. The evidence-theoretic k-NN rule is first recalled in Section 2. The basic concepts of the Dempster-Shafer theory are assumed to be known to the reader, who is invited to refer to [12] and [14] for detailed presentations, and to [2] for a short introduction. Section 3 describes the learning procedure, as well as an approximation to it allowing the error function to be near-optimized very efficiently. Simulation results are then presented in Section 4, and Section 5 concludes the paper.
2 The evidence-theoretic k-NN rule
We consider the problem of classifying entities into M categories or classes. The set of classes is denoted by Ω = {ω_1, …, ω_M}. The available information is assumed to consist in a training set T = {(x^(1), ω^(1)), …, (x^(N), ω^(N))} of N n-dimensional patterns x^(i), i = 1, …, N, and their corresponding class labels¹ ω^(i), i = 1, …, N, taking values in Ω. The similarity between patterns is assumed to be correctly measured by a certain distance function d(·, ·).
Let x be a new vector to be classified on the basis of the information contained in T. Each pair (x^(i), ω^(i)) constitutes a distinct item of evidence regarding the class membership of x. If x is "close" to x^(i) according to the relevant metric d, then one will be inclined to believe that both vectors belong to the same class. On the contrary, if d(x, x^(i)) is very large, then the consideration of x^(i) will leave us in a situation of almost complete ignorance concerning the class of x. Consequently, this item of evidence may be postulated to induce a basic belief assignment (BBA) m(·|x^(i)) over Ω defined by:
    m({ω_q} | x^(i)) = α φ_q(d^(i))                                   (1)
    m(Ω | x^(i)) = 1 − α φ_q(d^(i))                                   (2)
    m(A | x^(i)) = 0   ∀A ∈ 2^Ω \ {Ω, {ω_q}}                          (3)
where d^(i) = d(x, x^(i)), ω_q is the class of x^(i) (ω^(i) = ω_q), α is a parameter such that 0 < α < 1, and φ_q is a decreasing function verifying φ_q(0) = 1 and lim_{d→∞} φ_q(d) = 0. Note that m(·|x^(i)) reduces to the vacuous belief function (m(Ω|x^(i)) = 1) when the distance between x and x^(i) tends to infinity, reflecting a state of total ignorance. When d denotes the Euclidean distance, a rational choice for φ_q was shown in [2] to be:
    φ_q(d) = exp(−γ_q d²)                                             (4)
γ_q being a positive parameter associated to class ω_q. As a result of considering each training pattern in turn, we obtain N BBAs that can be combined using Dempster's rule of combination to form a resulting BBA m synthesizing one's final belief regarding the class of x:

    m = m(·|x^(1)) ⊕ ⋯ ⊕ m(·|x^(N))                                   (5)
¹In this paper, we assume for simplicity the class of each training vector to be known with certainty. The more general situation in which the training set is only imperfectly labeled has been introduced in [2]. However, the problem of optimizing the parameters in the general case is not completely solved yet (see Section 5).
Since those training patterns situated far from x actually provide very little information, it is sufficient to consider the k nearest neighbors of x in this sum. An alternative definition of m is therefore:

    m = m(·|x^(i_1)) ⊕ ⋯ ⊕ m(·|x^(i_k))                               (6)

where I_k = {i_1, …, i_k} contains the indices of the k nearest neighbors of x in T. Adopting this latter definition, m can be shown [2] to have the following expression:
    m({ω_q}) = (1/K) (1 − ∏_{i∈I_{k,q}} (1 − α φ_q(d^(i)))) ∏_{r≠q} ∏_{i∈I_{k,r}} (1 − α φ_r(d^(i)))   ∀q ∈ {1, …, M}   (7)

    m(Ω) = (1/K) ∏_{r=1}^{M} ∏_{i∈I_{k,r}} (1 − α φ_r(d^(i)))         (8)
where I_{k,q} is the subset of I_k corresponding to those neighbors of x belonging to class ω_q, and K is a normalizing factor. Hence, the focal elements of m are singletons and the whole frame Ω. Consequently, the credibility and the plausibility of each class ω_q are respectively equal to:
    bel({ω_q}) = m({ω_q})                                             (9)
    pl({ω_q}) = m({ω_q}) + m(Ω)                                       (10)
The pignistic probability distribution as defined by Smets [14] is given by:

    BetP({ω_q}) = Σ_{A⊆Ω, ω_q∈A} m(A)/|A| = m({ω_q}) + m(Ω)/M         (11)
for q = 1, …, M. Let us now assume that, based on this evidential corpus, a decision has to be made regarding the assignment of x to a class, and let us denote by α_q the action of assigning x to class ω_q. Let us further assume that the loss incurred in case of a wrong classification is equal to 1, while the loss corresponding to a correct classification is equal to 0. Then, the lower and the upper expected losses [3] associated to action α_q are respectively equal to:
    R_*(α_q | x) = 1 − pl({ω_q})                                      (12)
    R^*(α_q | x) = 1 − bel({ω_q})                                     (13)
The expected loss relative to the pignistic distribution is:

    R_bet(α_q | x) = 1 − BetP({ω_q})                                  (14)
Given the particular form of m, the three strategies consisting in minimizing R_*, R^* and R_bet lead to the same decision in that case: the pattern is assigned to the class with maximum belief assignment. Other decision strategies, including the possibility of pattern rejection as well as the existence of unknown classes, are studied in [3, 4].
3 Parameter optimization
3.1 The approach
In the above description of the evidence-theoretic k-NN rule, we left open the question of the choice of the parameters α and γ = (γ_1, …, γ_M)^t appearing in Equations (1) and (4). Whereas the value of α proves in practice not to be too critical, the tuning of the other parameters was found experimentally to have a significant influence on classification accuracy. In [2], it was proposed to set α = 0.95 and γ_q to the inverse of the mean distance between training patterns belonging to class ω_q. Although this heuristic yields good results on average, the efficiency of the classification procedure can be improved if these parameters are determined as the values optimizing a performance criterion. Such a criterion can be defined as follows.
Let us consider a training pattern x^(ℓ) belonging to class ω_q. The class membership of x^(ℓ) can be encoded as a vector t^(ℓ) = (t_1^(ℓ), …, t_M^(ℓ))^t of M binary indicator variables t_j^(ℓ) defined by t_j^(ℓ) = 1 if j = q and t_j^(ℓ) = 0 otherwise. By considering the k nearest neighbors of x^(ℓ) in the training set, one obtains a "leave-one-out" BBA m^(ℓ) characterizing one's belief concerning the class of x^(ℓ) if this pattern was to be classified using the other training patterns. Based on m^(ℓ), an output vector P^(ℓ) = (BetP^(ℓ)({ω_1}), …, BetP^(ℓ)({ω_M}))^t of pignistic probabilities can be computed, BetP^(ℓ) being the pignistic probability distribution associated to m^(ℓ). Ideally, vector P^(ℓ) should be as "close" as possible to vector t^(ℓ), closeness being defined, for example, according to the squared error E(x^(ℓ)):

    E(x^(ℓ)) = (P^(ℓ) − t^(ℓ))^t (P^(ℓ) − t^(ℓ)) = Σ_{q=1}^{M} (P_q^(ℓ) − t_q^(ℓ))²   (15)
The mean squared error over the whole training set T of size N is finally equal to:

    E = (1/N) Σ_{ℓ=1}^{N} E(x^(ℓ))                                    (16)
Function E can be used as a cost function for tuning the parameter vector γ. The analytical expression of the gradient of E(x^(ℓ)) with respect to γ can be calculated, allowing the parameters γ_q to be determined iteratively by a gradient search procedure (see Appendix A). Alternatively, the minimum of function E can be approximated in one step for large N using the approach described in the sequel.
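For illustration, the iterative procedure can be sketched in pure Python with derivatives estimated by central finite differences instead of the analytic gradient of Appendix A, together with the reparametrization γ_q = η_q² used there to keep γ_q nonnegative. The data layout, step sizes and iteration counts below are our own arbitrary choices.

```python
import math

def loo_error(train, alpha, gamma, k):
    """Leave-one-out mean squared error of the pignistic outputs (sketch)."""
    classes = sorted(gamma)
    M = len(classes)
    total = 0.0
    for l, (xl, cl) in enumerate(train):
        others = [p for j, p in enumerate(train) if j != l]
        neigh = sorted((sum((a - b) ** 2 for a, b in zip(xl, xi)), c)
                       for xi, c in others)[:k]
        prod = {c: 1.0 for c in classes}
        for d2, c in neigh:
            prod[c] *= 1.0 - alpha * math.exp(-gamma[c] * d2)
        m_omega_u = 1.0
        for c in classes:
            m_omega_u *= prod[c]
        K = sum((1.0 - prod[c]) * m_omega_u / prod[c] for c in classes) + m_omega_u
        for c in classes:
            p = (1.0 - prod[c]) * m_omega_u / (prod[c] * K) + m_omega_u / (K * M)
            total += (p - (1.0 if c == cl else 0.0)) ** 2
    return total / len(train)

def tune_gamma(train, alpha, k, steps=40, lr=0.05, h=1e-5):
    """Gradient descent on the leave-one-out error with gamma_q = eta_q ** 2."""
    classes = sorted({c for _, c in train})
    eta = {c: 1.0 for c in classes}
    best_e, best_g = float('inf'), None
    for _ in range(steps):
        g = {c: eta[c] ** 2 for c in classes}
        e = loo_error(train, alpha, g, k)
        if e < best_e:
            best_e, best_g = e, g
        for c in classes:               # central finite-difference gradient
            gp, gm = dict(g), dict(g)
            gp[c] = (eta[c] + h) ** 2
            gm[c] = (eta[c] - h) ** 2
            de = (loo_error(train, alpha, gp, k)
                  - loo_error(train, alpha, gm, k)) / (2 * h)
            eta[c] -= lr * de
    return best_g
```

The finite-difference gradient is only a stand-in for the exact expressions; with the analytic gradient, each iteration costs one pass over the training set instead of 2M + 1.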
3.2 One-step procedure
For an arbitrary training pattern x^(ℓ) and fixed parameters, vector P^(ℓ) can be regarded as a function of two vectors:

1. a vector d² = (d(i_1)², …, d(i_k)²)^t of squared distances between x^(ℓ) and its k nearest neighbors, and

2. a vector containing the class labels of these neighbors.
For small k and large N, d² can be assumed to be close to zero², allowing each component P_q^(ℓ) to be approximated by a Taylor series expansion around 0 up to the first order:

    P_q^(ℓ)(d²) ≈ P_q^(ℓ)(0) + ∇_{d²} P_q^(ℓ)(0)^t d²                 (17)
The first term in this expression can be readily obtained from Equations (7) and (8) by setting d^(i) to 0 for all i, which leads to:

    P_q^(ℓ)(0) = ((1 − α)^k / K(0)) ((1 − α)^{−k_q} − 1 + 1/M)        (18)

where k_q is the number of neighbors of x^(ℓ) in class ω_q and

    K(0) = (1 − α)^k (Σ_{q=1}^{M} (1 − α)^{−k_q} − M + 1)             (19)
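Since the zeroth-order term depends only on the class counts k_q among the k neighbors, it is easy to check numerically that it defines a proper probability distribution over the classes. A small sketch of our own:

```python
def betp_zero(counts, alpha):
    """Pignistic probabilities when all neighbor distances are set to zero.

    counts[q] is the number k_q of neighbors in class q (sum(counts) = k).
    """
    M = len(counts)
    k = sum(counts)
    u = 1.0 - alpha
    k0 = u ** k * (sum(u ** (-kq) for kq in counts) - M + 1)   # K(0)
    return [u ** k / k0 * (u ** (-kq) - 1.0 + 1.0 / M) for kq in counts]
```

By construction the values sum to one, and classes holding more of the k neighbors receive a higher probability.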
The computation of the first-order term,

    ∇_{d²} P_q^(ℓ)(0)^t d² = Σ_{i=1}^{k} (∂P_q^(ℓ)/∂d(i)²)(0) d(i)²   (20)

is more involved (see Appendix B). This term can be shown to be of the form (A^(ℓ) γ)_q, where A^(ℓ) is a square matrix of size M. As a consequence, both E(x^(ℓ)) and E can be approximated by quadratic forms of γ, which allows the minimum to be approached directly by solving a system of linear equations.
Figures 1 and 2 show the quality of this approximation in the case of two Gaussian classes with distinct mean vectors μ_1 and μ_2 and covariance matrices Σ_1 = Σ_2 = I. Displayed are the mean squared error as a function of (γ_1, γ_2) (Figure 1) and its quadratic approximation (Figure 2). The minima of the two functions are very close, which proves the relevance of the approximation in that case. Note that the quality of the approximation depends on both k and N.
4 Numerical experiments
The performances of the above methods were compared to those of the voting k-NN rule with randomly resolved ties, the distance-weighted k-NN rule [6], the fuzzy k-NN rule proposed by Keller [9], and the evidence-theoretic rule without parameter optimization [2]. Experiments were carried out on a set of standard artificial and
²This assumption is justified by the following result [1]: regarding the training set as a sample drawn from some probability distribution, the k-th nearest neighbor of x^(ℓ) converges to x^(ℓ) with probability one as the sample size increases with k fixed.
real-world benchmark classification tasks. The main characteristics of the data sets used are summarized in Table 1.
Data sets B1 and B2 were generated using a method proposed in [7]. The data consists in three Gaussian classes in 10 dimensions, with diagonal covariance matrices D_1, D_2 and D_3. The i-th diagonal element D_{qi} of D_q is defined as a function of two parameters a and b:

    D_{1i}(a, b) = a + (b − a) (i − 1)/(n − 1)
    D_{2i}(a, b) = a + (b − a) (n − i)/(n − 1)
    D_{3i}(a, b) = a + (b − a) min(i, n − i)/(n/2)

where n = 10 is the input dimension. The three classes had distinct mean vectors μ_1, μ_2 and μ_3 and covariance matrices Σ_1 = D_1(a, b), Σ_2 = D_2(a, b) and Σ_3 = D_3(a, b), for fixed values of a and b.
The ionosphere data set (Ion) was collected by a radar system consisting of a phased array of 16 high-frequency antennas with a total transmitted power of the order of 6.4 kilowatts [11]. The targets were free electrons in the ionosphere. "Good" radar returns are those showing evidence of some type of structure in the ionosphere; "bad" returns are those that do not.
The vehicle data set (Veh) was collected from silhouettes by the HIPS (Hierarchical Image Processing System) extension BINATTS. Four model vehicles were used for the experiment: a bus, a Chevrolet van, a Saab 9000 and an Opel Manta 400. The data was used to distinguish 3D objects within a 2D silhouette of the objects [11].
The sonar data (Son) were used by Gorman and Sejnowski in a study of the classification of sonar signals using a neural network [8]. The task is to discriminate between sonar signals bounced off a metal cylinder and those bounced off a roughly cylindrical rock.
Test error rates are represented as a function of k in Figures 3 to 12. The results for synthetic data are averages over several independent training sets. Test error rates obtained for the values of k yielding the best classification of training vectors are presented in Table 2.
As can be seen from these results, the evidence-theoretic rule with optimized γ presented in this paper always performed as well as or better than the four other rules tested, and improved significantly, at the 95% confidence level, over the evidence-theoretic rule with fixed γ on data sets B1 and B2 when considering the best results obtained over the range of k values tested. However, the most distinctive feature of this rule seems to be its robustness with respect to the number k of neighbors taken into consideration (Figures 3 to 12). By optimizing γ, the method learns to discard those neighbors whose distance to the pattern under consideration is too high. Practically, this property is of great importance since it relieves the designer of the system from the burden of searching for the optimal value of k. When the number of training patterns is large, the amount of computation may be further reduced by adopting the approximate one-step procedure for optimizing γ, which gives reasonably good results for small k (Figures 3, 5, 7, 9 and 11). However, the use of the exact procedure should be preferred for small and medium-sized training sets.
5 Concluding remarks
A technique for optimizing the parameters in the evidence-theoretic k-NN rule has been presented. The classification rule obtained by this method has proved superior to the voting, distance-weighted and fuzzy rules on a number of benchmark problems. A remarkable property achieved with this approach is the relative insensitivity of the results to the choice of k.
The method can be generalized in several ways. First of all, one can assume a more general metric than the Euclidean one considered so far, and apply the principles described in this paper to search for the optimal metric [15]. For instance, let Γ_q be a positive definite diagonal matrix with diagonal elements γ_{q,1}, …, γ_{q,n}. The squared distance between an input vector x and a learning vector x^(i) belonging to class ω_q can be defined as:

    d²(x, x^(i)) = (x − x^(i))^t Γ_q (x − x^(i)) = Σ_{j=1}^{n} γ_{q,j} (x_j − x_j^(i))²   (21)

The parameters γ_{q,j} for 1 ≤ q ≤ M and 1 ≤ j ≤ n can then be optimized using exactly the same approach as described in this paper, which may in some cases result in a further improvement of classification results. A more general form could even be assumed for Γ_q, with however the risk of a dramatic increase in the number of parameters for large input dimensions.
More fundamentally, the method can also be extended to handle the more general situation in which the class membership of training patterns is itself affected by uncertainty. For example, let us assume that the class of each training pattern x^(i) is only known to lie in a subset A^(i) of Ω (such a situation may typically arise, e.g., in medical diagnosis problems in which some records in a database are related to patients for which only a partial or uncertain diagnosis is available). A natural extension of Equations (1)-(3) is then:

    m(A^(i) | x^(i)) = α φ(d^(i))
    m(Ω | x^(i)) = 1 − α φ(d^(i))
    m(B | x^(i)) = 0   ∀B ∈ 2^Ω \ {Ω, A^(i)}

with φ(d) = exp(−γ d²), γ being a positive parameter (note that we cannot define a separate parameter for each class in this case, since the class of x^(i) is only partially known). The BBAs defined in that way correspond to simple belief functions and can be combined in linear time with respect to the number of classes. For optimizing γ, the error criterion defined in Equation (15) has to be generalized in some way. With the same notations as in Section 3.1, a possible expression for the error concerning pattern x^(ℓ) is:

    E(x^(ℓ)) = (1 − BetP^(ℓ)(A^(ℓ)))²                                 (22)

which reflects the fact that the pignistic probability of x^(ℓ) belonging to A^(ℓ), given the other training patterns, should be as high as possible. The value of γ minimizing the mean error may then be determined using an iterative search procedure. Experiments with this approach are currently under way and will be reported in future publications.
A Computation of the derivatives of E w.r.t. γ_q
Let x^(ℓ) be a training pattern and m^(ℓ) the BBA obtained by classifying x^(ℓ) using its k nearest neighbors in the training set. Function m^(ℓ) is computed according to Equations (7) and (8):

    m^(ℓ)({ω_q}) = (1/K) (1 − ∏_{i∈I^(ℓ)_{k,q}} (1 − α φ_q(d^(ℓ,i)))) ∏_{r≠q} ∏_{i∈I^(ℓ)_{k,r}} (1 − α φ_r(d^(ℓ,i)))   ∀q ∈ {1, …, M}   (23)

    m^(ℓ)(Ω) = (1/K) ∏_{r=1}^{M} ∏_{i∈I^(ℓ)_{k,r}} (1 − α φ_r(d^(ℓ,i)))   (24)

where I^(ℓ)_{k,r} denotes the set of indices of the k nearest neighbors of pattern x^(ℓ) belonging to class ω_r, d^(ℓ,i) is the distance between x^(ℓ) and x^(i), and K is a normalizing factor. In the following, we shall assume that:

    φ_q(d^(ℓ,i)) = exp(−γ_q (d^(ℓ,i))²)                               (25)
which will simply be denoted by φ_q^(ℓ,i). To simplify the calculations, we further introduce the unnormalized orthogonal sum [14] m̄^(ℓ), defined as m̄^(ℓ)(A) = K m^(ℓ)(A) for all A ⊆ Ω. We also denote by m̄^(ℓ,i) the unnormalized orthogonal sum of the m(·|x^(j)) for all j ∈ I^(ℓ)_k, j ≠ i; that is, m̄^(ℓ) = m̄^(ℓ,i) ⊙ m(·|x^(i)), where ⊙ denotes the unnormalized orthogonal sum operator. More precisely, we have:

    m̄^(ℓ)({ω_q}) = m({ω_q}|x^(i)) (m̄^(ℓ,i)({ω_q}) + m̄^(ℓ,i)(Ω)) + m(Ω|x^(i)) m̄^(ℓ,i)({ω_q})   ∀q ∈ {1, …, M}   (26)

    m̄^(ℓ)(Ω) = m̄^(ℓ,i)(Ω) m(Ω|x^(i))                                 (27)
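This pairwise recursion can be checked numerically against the closed-form product expressions of Section 2. The following self-contained sketch (ours) represents a BBA whose focal sets are the singletons and Ω as a pair (list of singleton masses, mass on Ω) and combines neighbors one at a time:

```python
def neighbor_bba(q, s, n_classes):
    """One neighbor of class q: mass s on its singleton, 1 - s on Omega."""
    singles = [0.0] * n_classes
    singles[q] = s
    return (singles, 1.0 - s)

def ucombine(m1, m2):
    """Unnormalized orthogonal sum of two singleton-plus-Omega BBAs;
    mass sent to the empty set is simply dropped."""
    (s1, o1), (s2, o2) = m1, m2
    singles = [s1[q] * (s2[q] + o2) + o1 * s2[q] for q in range(len(s1))]
    return (singles, o1 * o2)
```

Combining three neighbors with (class, s) = (0, 0.7), (0, 0.4), (1, 0.5) reproduces the product form: mass (1 - 0.3 * 0.6) * 0.5 = 0.41 on class 0, (1 - 0.5) * 0.18 = 0.09 on class 1, and 0.18 * 0.5 = 0.09 on Ω.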
The error for pattern x^(ℓ) is:

    E(x^(ℓ)) = Σ_{q=1}^{M} (P_q^(ℓ) − t_q^(ℓ))²                       (28)

where t^(ℓ) is the class membership vector for pattern x^(ℓ) and P_q^(ℓ) is the pignistic probability of class ω_q computed from m^(ℓ) as P_q^(ℓ) = m^(ℓ)({ω_q}) + m^(ℓ)(Ω)/M.
The derivative of E(x^(ℓ)) with respect to each parameter γ_q can be computed as:

    ∂E(x^(ℓ))/∂γ_q = Σ_{i∈I^(ℓ)_{k,q}} (∂E(x^(ℓ))/∂φ_q^(ℓ,i)) (∂φ_q^(ℓ,i)/∂γ_q)   (29)

with

    ∂E(x^(ℓ))/∂φ_q^(ℓ,i) = Σ_{r=1}^{M} (∂E(x^(ℓ))/∂P_r^(ℓ)) (∂P_r^(ℓ)/∂φ_q^(ℓ,i))   (30)
                         = Σ_{r=1}^{M} 2 (P_r^(ℓ) − t_r^(ℓ)) (∂m^(ℓ)({ω_r})/∂φ_q^(ℓ,i) + (1/M) ∂m^(ℓ)(Ω)/∂φ_q^(ℓ,i))   (31)

and

    ∂φ_q^(ℓ,i)/∂γ_q = −(d^(ℓ,i))² φ_q^(ℓ,i)                           (32)
The derivatives in Equation (31) can be computed as:

    ∂m^(ℓ)({ω_r})/∂φ_q^(ℓ,i) = ∂/∂φ_q^(ℓ,i) (m̄^(ℓ)({ω_r})/K)
                             = (1/K²) (K ∂m̄^(ℓ)({ω_r})/∂φ_q^(ℓ,i) − m̄^(ℓ)({ω_r}) ∂K/∂φ_q^(ℓ,i))   (33)

    ∂m^(ℓ)(Ω)/∂φ_q^(ℓ,i) = (1/K²) (K ∂m̄^(ℓ)(Ω)/∂φ_q^(ℓ,i) − m̄^(ℓ)(Ω) ∂K/∂φ_q^(ℓ,i))   (34)

    ∂K/∂φ_q^(ℓ,i) = Σ_{r=1}^{M} ∂m̄^(ℓ)({ω_r})/∂φ_q^(ℓ,i) + ∂m̄^(ℓ)(Ω)/∂φ_q^(ℓ,i)   (35)

Finally,

    ∂m̄^(ℓ)({ω_r})/∂φ_q^(ℓ,i) = α (m̄^(ℓ,i)({ω_r}) + m̄^(ℓ,i)(Ω)) δ_rq − α m̄^(ℓ,i)({ω_r})   (36)

where δ is the Kronecker symbol, and

    ∂m̄^(ℓ)(Ω)/∂φ_q^(ℓ,i) = −α m̄^(ℓ,i)(Ω)                             (37)
which completes the calculation of the gradient of E(x^(ℓ)) w.r.t. γ_q. To account for the constraint γ_q ≥ 0, we introduce new parameters η_q (q = 1, …, M) such that:

    γ_q = η_q²                                                        (38)

and we compute ∂E(x^(ℓ))/∂η_q as:

    ∂E(x^(ℓ))/∂η_q = (∂E(x^(ℓ))/∂γ_q) (∂γ_q/∂η_q) = 2 η_q ∂E(x^(ℓ))/∂γ_q   (39)
B Linearization
We consider the expansion around d² = 0 of P_q^(ℓ) by a Taylor series up to the first order:

    P_q^(ℓ)(d²) ≈ P_q^(ℓ)(0) + ∇_{d²} P_q^(ℓ)(0)^t d²                 (40)

where P_q^(ℓ)(0) is given by Equations (18) and (19). In the following, we shall compute the first-order term in the above equation, and deduce from that result a method for determining an approximation to the optimal parameter vector. To simplify the notation, the superscript (ℓ) will be omitted from the following calculations.
As a result of the definition of the pignistic probability, we have:

    ∂P_q/∂d(i)² = ∂m({ω_q})/∂d(i)² + (1/M) ∂m(Ω)/∂d(i)²               (41)

The derivatives of m({ω_q}) and m(Ω) can be more conveniently expressed as functions of the unnormalized BBA m̄:

    ∂m({ω_q})/∂d(i)² = (1/K²) (K ∂m̄({ω_q})/∂d(i)² − m̄({ω_q}) ∂K/∂d(i)²)   (42)

    ∂m(Ω)/∂d(i)² = (1/K²) (K ∂m̄(Ω)/∂d(i)² − m̄(Ω) ∂K/∂d(i)²)          (43)
To compute ∂m̄({ω_q})/∂d(i)² and ∂m̄(Ω)/∂d(i)², we need to distinguish two cases.

Case 1: i ∈ I_{k,q}. We then have:

    ∂m̄({ω_q})/∂d(i)² = −α γ_q exp(−γ_q d(i)²) ∏_{j∈I_{k,q}, j≠i} (1 − α exp(−γ_q d(j)²)) ∏_{r≠q} ∏_{j∈I_{k,r}} (1 − α exp(−γ_r d(j)²))   (44)

    ∂m̄(Ω)/∂d(i)² = α γ_q exp(−γ_q d(i)²) ∏_{r=1}^{M} ∏_{j∈I_{k,r}, j≠i} (1 − α exp(−γ_r d(j)²))   (45)

Setting all distances to 0 in the above equations, we have:

    ∂m̄({ω_q})/∂d(i)² |_{d²=0} = −α γ_q (1 − α)^{k−1}                  (46)

    ∂m̄(Ω)/∂d(i)² |_{d²=0} = α γ_q (1 − α)^{k−1}                       (47)
Case 2: i ∈ I_{k,l}, l ≠ q. We have:

    ∂m̄({ω_q})/∂d(i)² = α γ_l exp(−γ_l d(i)²) (1 − ∏_{j∈I_{k,q}} (1 − α exp(−γ_q d(j)²))) ∏_{r≠q} ∏_{j∈I_{k,r}, j≠i} (1 − α exp(−γ_r d(j)²))   (48)

    ∂m̄(Ω)/∂d(i)² = α γ_l exp(−γ_l d(i)²) ∏_{r=1}^{M} ∏_{j∈I_{k,r}, j≠i} (1 − α exp(−γ_r d(j)²))   (49)

Setting the distances to zero in the above equations:

    ∂m̄({ω_q})/∂d(i)² |_{d²=0} = α γ_l (1 − (1 − α)^{k_q}) (1 − α)^{k−k_q−1}   (50)

    ∂m̄(Ω)/∂d(i)² |_{d²=0} = α γ_l (1 − α)^{k−1}                       (51)
where k_q = |I_{k,q}|. The derivatives of K are simply obtained as follows:

    ∂K/∂d(i)² = Σ_{q=1}^{M} ∂m̄({ω_q})/∂d(i)² + ∂m̄(Ω)/∂d(i)²          (52)

Hence, for i ∈ I_{k,q}:

    ∂K/∂d(i)² |_{d²=0} = α γ_q Σ_{r=1, r≠q}^{M} (1 − (1 − α)^{k_r}) (1 − α)^{k−k_r−1}   (53)
It follows from the preceding calculations that, for i ∈ I_{k,r}, the derivatives of m̄({ω_q}), m̄(Ω) and K for d² = 0 are proportional to γ_r. Since m̄({ω_q}), m̄(Ω) and K do not themselves depend on γ for d² = 0, the derivative of P_q is also proportional to γ_r. Hence, we have:

    Σ_{i∈I_{k,r}} (∂P_q/∂d(i)²)|_{d²=0} d(i)² = A_{q,r} γ_r           (54)

for all r ∈ {1, …, M}, A_{q,r} being some constant not depending on γ. Consequently, we can write:

    P_q ≈ P_q(0) + Σ_{r=1}^{M} A_{q,r} γ_r                            (55)

and, expressing this result in matrix form:

    P ≈ P(0) + A γ                                                    (56)
with A = (A_{i,j}) a square matrix of size M. The above calculations have been performed for an arbitrary training pattern x. Reintroducing the pattern index (ℓ), we have:

    P^(ℓ) ≈ P^(ℓ)(0) + A^(ℓ) γ                                        (57)
Introducing these terms into the mean squared error, we have:

    E = (1/N) Σ_{ℓ=1}^{N} (P^(ℓ) − t^(ℓ))^t (P^(ℓ) − t^(ℓ))
      ≈ (1/N) Σ_{ℓ=1}^{N} (P^(ℓ)(0) − t^(ℓ) + A^(ℓ) γ)^t (P^(ℓ)(0) − t^(ℓ) + A^(ℓ) γ)   (58)
      = (1/N) Σ_{ℓ=1}^{N} [(P^(ℓ)(0) − t^(ℓ))^t (P^(ℓ)(0) − t^(ℓ)) + 2 γ^t (A^(ℓ))^t (P^(ℓ)(0) − t^(ℓ)) + γ^t (A^(ℓ))^t A^(ℓ) γ]

The gradient of E with respect to γ is therefore given by:

    ∇_γ E = (2/N) (Σ_{ℓ=1}^{N} (A^(ℓ))^t (P^(ℓ)(0) − t^(ℓ)) + Σ_{ℓ=1}^{N} (A^(ℓ))^t A^(ℓ) γ)   (59)
Minimizing E under the constraint γ ≥ 0 is a nonnegative least squares problem that may be solved efficiently using, for instance, the algorithm described in [10].
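As a sketch of this last step, a simple projected-gradient iteration can stand in for the finite-step active-set algorithm of Lawson and Hanson; here B and c correspond to stacking the matrices A^(ℓ) and the vectors t^(ℓ) − P^(ℓ)(0), and the implementation choices are our own.

```python
def nnls_pg(b_mat, c, steps=5000):
    """Minimize ||B g - c||^2 subject to g >= 0 by projected gradient descent."""
    m, n = len(b_mat), len(b_mat[0])
    g = [0.0] * n
    # conservative step: trace(B^T B) bounds the largest eigenvalue from above
    step = 0.5 / sum(b_mat[i][j] ** 2 for i in range(m) for j in range(n))
    for _ in range(steps):
        r = [sum(b_mat[i][j] * g[j] for j in range(n)) - c[i] for i in range(m)]
        grad = [2.0 * sum(b_mat[i][j] * r[i] for i in range(m))
                for j in range(n)]
        g = [max(0.0, gj - step * dj) for gj, dj in zip(g, grad)]
    return g
```

Unlike the active-set method, this iteration only converges in the limit, but it keeps the sketch self-contained and the constraint set is handled by a simple clipping projection.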
References
[1] T. M. Cover and P. E. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, IT-13(1):21-27, 1967.

[2] T. Denœux. A k-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE Transactions on Systems, Man and Cybernetics, 25(5):804-813, 1995.

[3] T. Denœux. Analysis of evidence-theoretic decision rules for pattern classification. Pattern Recognition, 30(7):1095-1107, 1997.

[4] T. Denœux. Application du modèle des croyances transférables en reconnaissance de formes. Traitement du Signal (in press).

[5] T. Denœux and G. Govaert. Combined supervised and unsupervised learning for system diagnosis using Dempster-Shafer theory. In P. Borne et al., editor, CESA IMACS Multiconference, Symposium on Control, Optimization and Supervision, Lille, July 1996.

[6] S. A. Dudani. The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man and Cybernetics, SMC-6(4):325-327, 1976.

[7] J. H. Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84(405):165-175, 1989.

[8] R. P. Gorman and T. J. Sejnowski. Analysis of hidden units in a layered network trained to classify sonar targets. Neural Networks, 1:75-89, 1988.

[9] J. M. Keller, M. R. Gray, and J. A. Givens. A fuzzy k-nearest neighbor algorithm. IEEE Transactions on Systems, Man and Cybernetics, SMC-15(4):580-585, 1985.

[10] C. L. Lawson and R. J. Hanson. Solving Least Squares Problems. Prentice-Hall, 1974.

[11] P. M. Murphy and D. W. Aha. UCI Repository of machine learning databases. Machine-readable data repository, University of California, Department of Information and Computer Science, Irvine, CA.

[12] G. Shafer. A mathematical theory of evidence. Princeton University Press, Princeton, N.J., 1976.

[13] P. Smets. The combination of evidence in the Transferable Belief Model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(5):447-458, 1990.

[14] P. Smets and R. Kennes. The Transferable Belief Model. Artificial Intelligence, 66:191-234, 1994.
[15] L. M. Zouhal. Contribution à l'application de la théorie des fonctions de croyance en reconnaissance des formes. PhD thesis, Université de Technologie de Compiègne.
Biography
Lalla Meriem Zouhal
Lalla Meriem Zouhal received an M.S. degree in electronics from the Faculté des Sciences Hassan II, Casablanca, Morocco, a DEA (Diplôme d'Etudes Approfondies) in system control from the Université de Technologie de Compiègne, France, and a PhD from the same institution. Her research interests concern pattern classification, Dempster-Shafer theory and fuzzy logic.
Thierry Denœux
Thierry Denœux graduated as an engineer from the Ecole Nationale des Ponts et Chaussées in Paris, and earned a PhD from the same institution. He obtained the "Habilitation à diriger des Recherches" from the Institut National Polytechnique de Lorraine. He was then employed by the Lyonnaise des Eaux water company, where he was in charge of research projects concerning the application of neural networks to forecasting and diagnosis. Dr. Denœux joined the Université de Technologie de Compiègne as an assistant professor. His research interests include artificial neural networks, statistical pattern recognition, uncertainty modeling and data fusion.
List of Tables
1. Main characteristics of data sets: number of classes (M), training set size (N), test set size (Nt) and input dimension (n)
2. Test error rates obtained with the voting, distance-weighted, fuzzy and evidence-theoretic classification rules for the best value of k (in brackets), with 95% confidence intervals. ETF: evidence-theoretic classifier with fixed γ; ETO: evidence-theoretic classifier with optimized γ
List of Figures
1. Contour lines of the error function for different values of γ, using the gradient-descent method
2. Contour lines of the error function for different values of γ, using the linearization method for pignistic probability vectors
3. Test error rates on data set B1 as a function of k, for the ETF and ETO k-NN rules (gradient and linearization methods)
4. Test error rates on data set B1 as a function of k, for the voting, ETO, fuzzy and distance-weighted k-NN rules
5. Test error rates on data set B2 as a function of k, for the ETF and ETO k-NN rules (gradient and linearization methods)
6. Test error rates on data set B2 as a function of k, for the voting, ETO, fuzzy and distance-weighted k-NN rules
7. Test error rates on ionosphere data as a function of k, for the ETF and ETO k-NN rules (gradient and linearization methods)
8. Test error rates on ionosphere data as a function of k, for the voting, ETO, fuzzy and distance-weighted k-NN rules
9. Test error rates on vehicle data as a function of k, for the ETF and ETO k-NN rules (gradient and linearization methods)
10. Test error rates on vehicle data as a function of k, for the voting, ETO, fuzzy and distance-weighted k-NN rules
11. Test error rates on sonar data as a function of k, for the ETF and ETO k-NN rules (gradient and linearization methods)
12. Test error rates on sonar data as a function of k, for the voting, ETO, fuzzy and distance-weighted k-NN rules
Table 1: Main characteristics of data sets: number of classes (M), training set size (N), test set size (Nt) and input dimension (n).

data set   M    N    Nt   n
B1         3    -    -    10
B2         3    -    -    10
Ion        2    -    -    34
Veh        4    -    -    18
Son        2    104  104  60
Table 2: Test error rates obtained with the voting, distance-weighted, fuzzy and evidence-theoretic classification rules for the best value of k (in brackets), with 95% confidence intervals. ETF: evidence-theoretic classifier with fixed γ; ETO: evidence-theoretic classifier with optimized γ.

data set   voting   ETF   ETO   weighted   fuzzy
B1         -        -     -     -          -
B2         -        -     -     -          -
Ion        -        -     -     -          -
Veh        -        -     -     -          -
Son        -        -     -     -          -
Figure 1: Contour lines of the error function for different values of γ (axes γ_1 and γ_2 from 0.2 to 2; contour levels from 0.01859 to 0.018809), using the gradient-descent method.
Figure 2: Contour lines of the error function for different values of γ (axes γ_1 and γ_2 from 0.2 to 2; contour levels from 0.018585 to 0.018676), using the linearization method for pignistic probability vectors.
Figure 3: Test error rates on data set B1 as a function of k (k from 0 to 15; error rate from 0.32 to 0.46), for the ETF and ETO k-NN rules (gradient and linearization methods).
Figure 4: Test error rates on data set B1 as a function of k (k from 0 to 40; error rate from 0.3 to 0.6), for the voting, ETO, fuzzy and distance-weighted k-NN rules.
Figure 5: Test error rates on data set B2 as a function of k (k from 0 to 15; error rate from 0.24 to 0.36), for the ETF and ETO k-NN rules (gradient and linearization methods).
Figure 6: Test error rates on data set B2 as a function of k (k from 0 to 40; error rate from 0.2 to 0.4), for the voting, ETO, fuzzy and distance-weighted k-NN rules.
Figure 7: Test error rates on ionosphere data as a function of k (k from 0 to 15; error rate from 0.06 to 0.26), for the ETF and ETO k-NN rules (gradient and linearization methods).
Figure 8: Test error rates on ionosphere data as a function of k (k from 0 to 40; error rate from 0.05 to 0.4), for the voting, ETO, fuzzy and distance-weighted k-NN rules.
Figure 9: Test error rates on vehicle data as a function of k (k from 0 to 15; error rate from 0.32 to 0.4), for the ETF and ETO k-NN rules (gradient and linearization methods).
Figure 10: Test error rates on vehicle data as a function of k (k from 0 to 40; error rate from 0.3 to 0.44), for the voting, ETO, fuzzy and distance-weighted k-NN rules.
Figure 11: Test error rates on sonar data as a function of k (k from 0 to 15; error rate from 0.1 to 0.4), for the ETF and ETO k-NN rules (gradient and linearization methods).
Figure 12: Test error rates on sonar data as a function of k (k from 0 to 40; error rate from 0.1 to 0.45), for the voting, ETO, fuzzy and distance-weighted k-NN rules.