Research Article An Efficient Diagnosis System for Parkinson ...accuracy of. %. Astr om and Koker...

Research ArticleAn Efficient Diagnosis System for Parkinsonrsquos DiseaseUsing Kernel-Based Extreme Learning Machine with SubtractiveClustering Features Weighting Approach

Chao Ma12 Jihong Ouyang12 Hui-Ling Chen3 and Xue-Hua Zhao12

1 College of Computer Science and Technology Jilin University No 2699 QianJin Road Changchun 130012 China2 Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education Jilin UniversityChangchun 130012 China

3 College of Physics and Electronic Information Wenzhou University Wenzhou 325035 China

Correspondence should be addressed to Jihong Ouyang lsw 1017aliyuncom

Received 21 June 2014 Accepted 26 October 2014 Published 18 November 2014

Academic Editor Dong Song

Copyright copy 2014 Chao Ma et alThis is an open access article distributed under the Creative CommonsAttribution License whichpermits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

A novel hybridmethod named SCFW-KELMwhich integrates effective subtractive clustering features weighting and a fast classifierkernel-based extreme learning machine (KELM) has been introduced for the diagnosis of PD In the proposed method SCFWis used as a data preprocessing tool which aims at decreasing the variance in features of the PD dataset in order to furtherimprove the diagnostic accuracy of the KELM classifier The impact of the type of kernel functions on the performance of KELMhas been investigated in detail The efficiency and effectiveness of the proposed method have been rigorously evaluated againstthe PD dataset in terms of classification accuracy sensitivity specificity area under the receiver operating characteristic (ROC)curve (AUC) f -measure and kappa statistics value Experimental results have demonstrated that the proposed SCFW-KELMsignificantly outperforms SVM-based KNN-based and ELM-based approaches and other methods in the literature and achievedhighest classification results reported so far via 10-fold cross validation scheme with the classification accuracy of 9949 thesensitivity of 100 the specificity of 9939 AUC of 9969 the f -measure value of 09964 and kappa value of 09867 Promisinglythe proposed method might serve as a new candidate of powerful methods for the diagnosis of PD with excellent performance

1 Introduction

Parkinsonrsquos disease (PD) is one degenerative disease of thenervous system which is characterized by a large groupof neurological conditions called motor system disordersbecause of the loss of dopamine-producing brain cells Themain symptoms of PD are given as follows (1) tremor ortrembling in hands arms legs jaw or head (2) rigidity orstiffness of the limbs and trunk (3) bradykinesia or slownessof movement (4) postural instability or impaired balance(httpwwwnindsnihgovresearchparkinsonswebindexhtm last accessed April 2012) At present PD has an impacton about 1 of the worldwide population over the age of50 however this proportion is on the increase as people livelonger [1] Till now PD has no medical treatment and somededication is only available for relieving the symptoms of

disease [2] It is so important that we gainmore of insight intothe problem and improve our methods to deal with PD Herewe focus on the study based on dysphonia which is knownas a group of vocal impairment symptoms it is reportedto be one of the most significant symptoms of PD [3] Theresearches have shown that about 90of peoplewith PDhavesuch vocal evidence The dysphonic indicators of PD makespeech measurements as an important part of diagnosis [4]Dysphonic measures have been proposed as a reliable tool todetect and monitor PD [5 6]

Previous studies on the PD problem based on machinelearning methods have been undertaken by variousresearchers Little et al [6] used support vector machine(SVM) classifier with Gaussian radical basis kernel functionto predict PD by means of feature selection method toreduce the feature space and best accuracy rate of 914 was

Hindawi Publishing CorporationComputational and Mathematical Methods in MedicineVolume 2014 Article ID 985789 14 pageshttpdxdoiorg1011552014985789

2 Computational and Mathematical Methods in Medicine

obtained by the proposed model Shahbaba and Neal [7]presented a nonlinear model based on Dirichlet mixturesfor the PD classification compared with multinomial logitmodels decision trees and SVM the classification accuracyof 877 was achieved by the proposed model Das [8] useda comparative study of neural networks (NN) DMneuralregression and decision trees for the diagnosis of PDthe experiment results had shown that the NN methodachieved the overall classification performance of 929Sakar and Kursun [9] used mutual information measure tocombine with SVM for the diagnosis of PD and achieved theclassification result of 9275 Psorakis et al [10] introducedsample selection strategies and model improvements formulticlass multikernel relevance vector machines andachieved the classification accuracy of 8947 in the PDdataset Guo et al [11] combined genetic programming andthe expectation maximization (EM) to diagnose PD in theordinary feature data and achieved the classification accuracyof 931 Luukka [12] proposed a new method which usedfuzzy entropy measures to combine with the similarityclassifier to predict PD and the mean classification of8503 was achieved Li et al [13] introduced a fuzzy-basednonlinear transformation approach together with SVM inthe PD dataset best classification accuracy of 9347 wasobtained Ozcift and Gulten [14] combined the correlationbased feature selection method with the rotation forestensemble classifier of 30 machine learning algorithms todistinguish PD the proposed model got best classificationaccuracy of 8713 Astrom and Koker [15] achieved highestclassification accuracy of 912 by using a parallel neuralnetwork model for PD diagnosis Spadoto et al [16] adoptedevolutionary based method together with the optimum-pathforest (OPF) classifier for PDdiagnosis and best classificationaccuracy of 8401 was obtained Polat [17] applied the fuzzy119862-means (FCM) clustering feature weighting (FCMFW)together with the 119896-nearest neighbor classifier for detectingPD the classification accuracy of 9793 was obtainedChen et al [18] proposed a model which used the principlecomponent analysis based feature extraction together withthe fuzzy 119896-nearest neighbor method to predict PD andachieved best classification accuracy of 9607 by theproposed model Daliri [19] presented a chi-square distancekernel-based SVM to discriminate the subjects with PDfrom the healthy control subjects using gait signals and theclassification result of 912was obtained Zuo et al [20] useda new diagnosis model based on particle swarm optimization(PSO) to strengthen the fuzzy 119896-nearest neighbor classifierfor the diagnosis of PD and the mean classification accuracyof 9747 was achieved

From theseworks it can be seen thatmost of the commonclassifiers frommachine learning community have been usedfor PD diagnosis For the nonlinear classification problemsthe data preprocessing methods such as feature weightingnormalization and feature transformation could increase theperformance of alone classifier algorithm So it is obviousthat the choice of an efficient feature preprocessing methodand an excellent classifier is of significant importance for thePD diagnosis problem Aiming at improving the efficiencyand effectiveness of the classification performance for the

diagnosis of PD in this paper an efficient features weight-ing method called subtractive clustering features weighting(SCFW) and a fast classification algorithm named kernel-based extreme learning machine (KELM) are examined TheSCFW method is used to map the features according to datadistributions in dataset and transform linearly nonseparabledataset to linearly separable dataset In this way the similardata within each feature are prone to getting together sothat the distinction between classes is increased to classifythe PD datasets correctly It is reported that SCFW methodcan help improve the discrimination abilities of classifiers inmany applications such as traffic accident analysis [21] andmedical datasets transformation [22] KELM is the improvedversion of ELM algorithm based on kernel function [23]The advantage of KELM is that only two parameters (thepenalty parameter 119862 and the kernel parameter 120574) need tobe adjusted unlike ELM which needs to specify the suitablevalues of weights and biases for improving the generalizationperformance [24] Furthermore KELMnot only trains as fastas that of ELM but also can achieve good generalization per-formanceThe objective of the proposedmethod is to explorethe performance of PD diagnosis using a two-stage hybridmodeling procedure via integrating SCFW with KELMFirstly the proposed method adopts SCFW to construct thediscriminative feature space through weighting features andthen the achieved weighted features serve as the input ofthe trained KELM classifier To evaluate the performanceof proposed hybrid method classification accuracy (ACC)sensitivity specificity AUC 119891-measure and kappa statisticvalue have been used Experimental results have shown thatthe proposed method achieves very promising results basedon proper kernel function by 10-fold cross validation (CV)

The main contributions of this paper are summarized asfollows

(1) It is the first time that we have proposed to integrateSCFW approach with KELM classifier to detect PD inan efficient and effective way

(2) In the proposed system SCFW method is employedas data preprocessing tool to strengthen the discrim-ination between classes for further improving thedistinguishing performance of KELM classifier

(3) Compared with the existing methods in previousstudies the proposed diagnostic system has achievedexcellent classification results

The rest of the paper is organized as follows Section 2offers brief background knowledge on SCFW and KELMThe detailed implementations of the diagnosis system arepresented in Section 3 In the next section the detailedexperiment design is described and Section 5 gives theexperiment results and discussions of the proposed methodFinally conclusions and recommendations for future workare summarized in Section 6

Computational and Mathematical Methods in Medicine 3

BeginLoad the PD dataset represent the data as a matrix119872(119898 119899) withm samples and 119899 featuresInitialize the corresponding valuesCalculate the cluster centers using subtractive clustering methodCalculate the mean values of each feature in119872(119898 119899)

For each dataFor each feature in dataset

ratios (119894 119895) =mean(119904119895)cluster 119888119890119899119905119890119903119895End Forweighted features (119894 119895) =119872(119898 119899) lowast ratios (119894 119895)

End ForEnd

Algorithm 1 Pseudocode for weighting features based on subtractive clustering method

2 The Theoretical Background ofthe Related Methods

21 Subtractive Clustering Features Weighting (SCFW) Sub-tractive clustering is the improved version of mountainclustering algorithm The problem of mountain clustering isthat its calculation grows exponentially with the dimensionof the problem Subtractive clustering has solved this problemusing data points as the candidates for cluster centers insteadof grid points as in mountain clustering so the calculationcost is proportional to the problem size instead of the problemdimension [25] The subtractive clustering algorithm can bebriefly summarized as follows

Step 1 Consider a collection of 119899 data points 1199091 1199092 119909119899in119872-dimensional space Since each data point is a candidatefor cluster center the density measure at data point 119909119894 isdefined as

119863119894 =

119899

sum119895=1

exp(minus10038171003817100381710038171003817119909119894 minus 119909119895

10038171003817100381710038171003817

2

(1199031198862)2) (1)

where 119903119886 is a positive constant defining a neighborhoodradius it is used to determine the number of cluster centersSo a data point will have a high density value if it hasmany neighboring data points The data points outsidethe neighborhood radius contribute slightly to the densitymeasure Here 119903119886 is set to 05

Step 2 After the density measure of each data point has beencalculated the data point with the highest density measure isselected as the first cluster center Let1198831198881 be the point selectedand 1198631198881 the density measure Next the density measure foreach data point 119909119894 is revised as follows

119863119894 = 119863119894 minus 1198631198881 exp(minus10038171003817100381710038171003817119909119894 minus 119909119895

10038171003817100381710038171003817

2

(1199031198872)2) (2)

where 119903119887 is a positive constant and 119903119887 = 120578 sdot 119903119886 120578 is a constantgreater than 1 to avoid cluster centers being in too closeproximity In this paper 119903119887 is set to 08

Load the PD dataset

Use subtractive clusteringalgorithm to calculate the

cluster centers of each feature

Calculate the mean valuesof each feature

Calculate the ratios of meansof features to their centers

Multiply the ratios withdata of weighting features

Obtain the weighted features

Figure 1 The flowchart of SCFW algorithm

Step 3 After the density calculation for each data point isrevised the next cluster center 1198831198882 is selected and all thedensity calculations for data point are revised again Theprocess is repeated until a sufficient number of cluster centersare generated

For SCFW method firstly the cluster centers of eachfeature are calculated by using subtractive clustering Aftercalculating the centers of features the ratios of means offeatures to their cluster centers are calculated and theseratios are multiplied with the data of each feature [21] Thepseudocode of SCFW method is given in Algorithm 1 andthe flowchart of weighting process is shown in Figure 1

22 Kernel-Based Extreme Learning Machine (KELM) ELMis an algorithm originally developed for training singlehidden layer feed-forward neural networks (SLFNs) [26]Theessence of ELM is that parameters of hidden neurons inneural network are randomly created instead of being tuned


x1

x2

xn

N1

N2

NL

1205731

1205732

120573L

Onf(x)

Figure 2 The structure of ELM

and then fixed the nonlinearities of the network withoutiteration Figure 2 shows the structure of ELM

For given 119873 samples (x y) having 119871 hidden neuronsand activation function ℎ(119909) the output function of ELM isdefined as follows

119891 (119909) =

119871

sum119894=1

120573119894ℎ119894 (119909) = ℎ (119909) 120573 (3)

where 120573 = [1205731 1205732 120573119871] is the output weight con-necting hidden nodes to output nodes H = ℎ119894119895 (119894 =

1 119873 and 119895 = 1 119871) is the hidden layer output matrixof neural network ℎ(119909) actually maps the data from the d-dimensional input space to the L-dimensional hidden layerfeature spaceH and thus ℎ(119909) is indeed a feature mapping

The determination of the output weights is calculated bythe least square method

1205731015840= H+T (4)

where H+ is the Moore-Penrose generalized inverse [26] ofthe hidden layer output matrixH

To improve the generalization capabilities of ELM incomparisonwith the least square solution-basedELMHuanget al [23] proposed kernel-based method for the design ofELMThey suggested adding a positive value 1119862 (where119862 isa user-defined parameter) for calculating the output weightssuch that

120573 = 119867119879(119868

119862+ 119867119867

119879)minus1

119879 (5)

Therefore the output function is expressed as follows

119891 (119909) = ℎ120573 = ℎ (119909)119867119879(119868

119862+ 119867119867

119879)minus1

119879 (6)

When the hidden feature mapping function ℎ(119909) isunknown a kernel matrix for ELM is used according to thefollowing equation

ΩELM = 119867119867119879 ΩELM119894119895 = ℎ (119909119894) sdot ℎ (119909119895) = 119870 (119909119894 119909119895) (7)

where 119870(119909119894 119909119895) is a kernel function Many kernel functionssuch as linear polynomial and radial basis function canbe used in kernel-based ELM Now the output function ofKELM classifier can be expressed as

119891 (119909) =[[

[

119870 (119909 1199091)

119870 (119909 119909119873)

]]

]

119879

(119868

119862+ ΩELM)

minus1

119879 (8)

3 The Proposed SCFW-KELMDiagnosis System

This work proposes a novel hybrid method for PD diagnosisThe proposed model is comprised of two stages as shown inFigure 3 In the first stage SCFW algorithm is firstly appliedto preprocess data in the PD dataset The purpose of thismethod is tomap the features according to their distributionsin dataset and to transform from linearly nonseparable spaceto linearly separable one With this method similar datain the same feature are gathered which will substantiallyhelp improve the discrimination ability of classifiers In thenext stage KELM is evaluated on the weighted feature spacewith different types of activation functions to perform theclassification Finally the best parameters and the suitableactivation function are obtained based on the performanceanalysis The detailed pseudocode of the hybrid method isgiven in Algorithm 2

4 Experimental Design

41 Data Description In this section we have performedthe experiments in the PD dataset taken from University ofCalifornia Irvine (UCI) machine learning repository (httparchiveicsuciedumldatasetsParkinson last accessed Jan-uary 2013) It was created by Max Little of the University ofOxford in collaboration with the National Centre for Voiceand Speech Denver Colorado who recorded the speechsignals The purpose of PD dataset is to discriminate healthypeople from those with PD given the results of variousmedical tests carried out on a patientThe PD dataset consistsof voice measurements from 31 people of which 23 werediagnosed with PD There are 195 instances comprising 48healthy and 147 PD cases in the dataset The time sincediagnoses ranged from 0 to 28 years and the ages of thesubjects ranged from 46 to 85 years (mean 658) Each subjectprovides an average of six phonations of the vowel (yielding195 samples in total) each 36 seconds in length [6] Notethat there are no missing values in the dataset and the wholefeatures are real value The whole 22 features along withdescription are listed in Table 1

42 Experimental Setup The proposed SCFW-KELM classi-fication model was carried out on the platform of MATLAB70 The SCFW algorithm was implemented from scratchFor KELM and ELM the implementation available fromhttpwww3ntuedusghomeegbhuang was used

For SVM LIBSVM implementation was used which wasoriginally developed by Chang and Lin [27] The empiricalexperiment was conducted on Intel Dual-Core TM (20GHzCPU) with 2GB of RAM

In order to guarantee the valid results 119896-foldCVwas usedto evaluate the classification results [28] Each time nine often subsets were put together to form a training set and theother subset was used as the test set Then the average resultacross all 10 trials was calculated Thanks to this method allthe test sets were independent and the reliability of the results


Parkinsonrsquosdisease dataset

Data preprocessing stage

Classification stage

Weighting featuresusing subtractive

clustering method

Obtain the weighteddataset

Divide the weighted datasetspace using 10-fold cross

validation

middot middot middotTraining set Training setTest set Test set

Train KELM classifier withgrid-search technique on the

nine-fold training sets

Initial parameter pair (C 120574)

With the best combination (C 120574)

Predict the labels on the remainingone testing set using the trained

KELM model

No

Yes

K = 10

Average the prediction resultson the ten independent test sets

Obtain the optimal model

Figure 3 The overall procedure of the proposed hybrid diagnosis system

BeginWeight features using subtractive clustering algorithm

For 119894 = 1 k lowastPerformance estimation by using 119896-fold CV where 119896 = 10lowastTraining set = k-1 subsetsTest set = remaining subsetTrain KELM classifier in the weighted training data feature space store the best parameter combinationTest the trained KELMmodel on the test set using the achieved best parameter combination

End ForReturn the average classification results of KELM over 119894th test set

End

Algorithm 2 Pseudocode for the proposed model

could be improved Because of the arbitrariness of partitionof the dataset the predicted results of model at each iterationwere not necessarily the same To evaluate accurately theperformance of the PD dataset the experiment was repeated10 times and then the results were averaged

43Measure for Performance Evaluation In order to evaluatethe prediction performance of SCFW-KELMmodel we usedsix performance metrics ACC sensitivity specificity AUC119891-measure and kappa statistic value to test the performanceof the proposed model About the mentioned performance


Table 1 The details of the whole 22 features of the PD dataset

Label Feature Description

F1 MDVP Fo (Hz) Average vocalfundamental frequency

F2 MDVP Fhi (Hz) Maximum vocalfundamental frequency

F3 MDVP Flo (Hz) Minimum vocalfundamental frequency

F4 MDVP Jitter ()Several measures ofvariation infundamental frequency

F5 MDVP Jitter(Abs)

F6 MDVP RAPF7 MDVP PPQF8 Jitter PPQ

F9 MDVP Shimmer Several measures ofvariation in amplitude

F10 MDVP Shimmer(dB)

F11 Shimmer APQ3F12 Shimmer APQ5F13 MDVP APQF14 Shimmer DDA

F15 NHR

Two measures of ratioof noise to tonalcomponents in thevoice

F16 HNR

F17 RPDETwo nonlineardynamical complexitymeasures

F18 D2

F19 DFA Signal fractal scalingexponent

F20 Spread1

Three nonlinearmeasures offundamental frequencyvariation

F21 Spread2F22 PPE

evaluation formulations are defined as follows according tothe confusion matrix which is shown in Table 2

ACC = TP + TNTP + FP + FN + TN

times 100

Sensitivity = TPTP + FN

times 100

Specificity = TNFP + TN

times 100

Precision = TPTP + FP

Table 2 The confusion matrix

Predicted patientswith PD

Predicted healthypersons

Actual patientswith PD True positive (TP) False negative (FN)

Actual healthypersons False positive (FP) True negative (TN)

Recall = TPTP + FN

119891-measure = 2 times Precision times RecallPrecision + Recall

(9)

In the confusion matrix TP is the number of true pos-itives which represents that some cases with PD class arecorrectly classified as PD FN is the number of false negativeswhich represents that some cases with the PD class areclassified as healthy TN is the number of true negativeswhich represents that some cases with the healthy classare correctly classified as healthy and FP is the number offalse positives which represents that some cases with thehealthy class are classified as PD ACC is a widely usedmetric to determine class discrimination ability of classifiersThe receiver operating characteristic (ROC) curve is usuallyplotted using true positives rate versus false positives rateas the discrimination threshold of classification algorithmis varied The area under ROC curve (AUC) is widely usedin classification studies with relevant acceptance and it isa good summary of the performance of the classifier [29]Also 119891-measure is a measure of a testrsquos accuracy which isusually used as performance evaluation metric to assess theperformance of binary classifier based on the harmonicmeanfor the classifierrsquos precision and recall Kappa error (KE)or Cohenrsquos kappa statistics (KS) is adopted to compare theperformances of different classifiers KS is a good measureto inspect classifications that may be due to chance As KSvalue calculated for classifiers closer to 1 the performance ofclassifier is assumed to be more realistic rather than being bychance Thus KS value is a recommended metric to considerfor evaluation in the performance analysis of classifiers and itis calculated with [30]

KS = 119875 (119860) minus 119875 (119864)1 minus 119875 (119864)

(10)

where 119875(119860) means total agreement probability and 119875(119864)means agreement probability due to chance

5 Experimental Results and Discussions

Experiment 1 (classification in the PD dataset) In this exper-iment we firstly evaluated KELM in the original featurespace without SCFW It is known that different types ofkernel activation functions have great influence on theperformance of KELM Therefore we presented the resultsfrom our investigation on the influence of different types of


Table 3 Results of KELMwith different types of kernel functions inthe original PD dataset without SCFW

Kernel type Performancemetrics Mean SD Max Min

RBF kernel

ACC () 9589 466 100 8974Sensitivity () 9635 519 100 8889Specificity () 9572 593 100 8800

AUC () 9604 406 100 9043119891-measure 09724Kappa 08925

Wav kernel


AUC () 9319 456 100 8810f -measure 09622Kappa 08425

Lin kernel


AUC () 8170 1222 9545 6889119891-measure 09316kappa 06333

Poly kernel



kernel functions and assigned initial values for them Wetried to perform four types of kernel functions includingradial basis function (RBF kernel) wavelet kernel function(Wav kernel) linear kernel function (Lin kernel) and poly-nomial kernel function (Poly kernel) Table 3 summarizedthe detailed results of classification performance in the PDdataset in terms of ACC sensitivity specificity AUC 119891-measure and KS value and these results were achieved via10-fold CV scheme and represented in the form of averageaccuracy (Mean) standard deviation (SD)maximal accuracy(Max) and minimal accuracy (Min) From this table it canbe seen that the classification performance of KELM withvarious kernel functions is apparently differential The bestkernel function of KELM classifier in discriminating the PDdataset was RBF kernel function We can see that KELMwith RBF kernel outperforms that with the other three kernelfunctions with a mean accuracy of 9589 9635 9572and 9604 in terms of ACC sensitivity specificity and AUCand has also got 119891-measure value of 09724 and KS value of08925 KELM with wavelet kernel has obtained the averageresults of 9436 9124 9525 and 9319 in terms ofACC sensitivity specificity and AUC and got 119891-measurevalue of 09622 and KS value of 08425 lower than those ofKELM with RBF kernel The worse results of classification

Table 4 The cluster centers of the features of PD dataset usingSCFWmethod

Number offeature

Centers of thefeatures using SCFW

(normal case)

Centers of thefeatures using

SCFW (PD case)F1 154229 181938F2 197105 223637F3 116325 145207F4 0006 0006F5 0 0F6 0003 0003F7 0003 0003F8 001 001F9 003 003F10 0282 0276F11 0016 0015F12 0018 0018F13 0024 0013F14 0047 0045F15 0025 0028F16 21886 24678F17 0499 0443F18 0718 0696F19 minus5684 minus6759F20 0227 0161F21 2382 2155F22 0207 0123

performance obtained by KELMwith polynomial kernel andKELM with linear kernel were successively given Notingtraining KELM with kernel functions instead of sigmoidadditive function of ELM the number of hidden neurons hasno influence on the performance of KELM model so it doesnot need to be considered

To investigate whether SCFW method can improve theperformance of KELM we further conducted the modelin the PD dataset in the weighted feature space by SCFWThe proposed system consisted of two stages Firstly SCFWapproach was used to weight the features of PD datasetBy using SCFW method the weighted feature space wasconstructed Table 4 listed the cluster centers of the featuresin the PD dataset using SCFW method Figure 4 depictedthe box graph representation of the original and weighted PDdataset with the whole 22 features Figure 5 showed the distri-bution of two classes of the original and weighted 195 samplesformed by the best three principle components obtained withthe principle component analysis (PCA) algorithm [31] FromFigures 4 and 5 it can be seen that the discriminative abilityof the original PD dataset has been improved substantiallyby SCFW approach After data preprocessing stage theclassification algorithms have been used and discriminatedthe weighted PD dataset


0

100

200

300

400

500

600

Number of features

The box graph of the original PD datasetVa

lue o

f use

d fe

atur

e

1 3 5 7 9 11 13 15 17 19 21

(a)

0

2

4

6

8

10

12

The box graph of the weighted PD dataset by SCFW approach

Valu

e of u

sed

feat

ure

Number of features1 3 5 7 9 11 13 15 17 19 21

times104

(b)

Figure 4 The box graph representation of the original and weighted PD dataset

minus2minus1

01

23

minus1minus05

005

1minus1

minus05

0

05

1

1st principal component

2nd principal component

3rd

prin

cipa

l com

pone

nt

HealthyPD

(a)

minus20

24

minus10

12

minus1

minus05

0

05

1



3rd

prin

cipa

l com

pone

nt

HealthyPD

(b)

Figure 5 Three-dimensional distribution of two classes in the original and weighted feature space by the best three principle componentsobtained with PCA method

The detailed results obtained by SCFW-KELM with fourtypes of different kernel functions were presented in Table 5As seen from Table 5 all these best results were much higherthan the ones obtained in the original feature space withoutSCFW The classification performance in the PD dataset hassignificantly improved by using SCFW method Comparedwith KELM with RBF kernel function in the original featurespace KELM with RBF kernel based on SCFW methodincreased the performance by 36 365 367 and 365in terms of ACC sensitivity specificity and AUC and hasobtained highest 119891-measure value of 09966 and highest KSvalue of 09863TheKELMmodelswith the other three kernelfunctions also have got great improvements in terms of sixperformance metrics

Table 6 also presented the comparison results of the con-fusion matrices obtained by SCFW-KELM and KELM Asseen from Table 6 SFCW-KELM correctly classified 194normal cases out of 195 total normal cases and misclassifiedonly one patient with PD as a healthy person while KELMwithout SCFW method only correctly classified 187 normalcases out of 195 total normal cases andmisclassified 6 patientswith PD as healthy persons and 2 healthy persons as patientswith PD

For SVM classifier we have performed SVM classifierwith RBF kernel It is known that the performance of SVM issensitive to the combination of the penalty parameter 119862 andthe kernel parameter 120574 Thus the best combination of (119862 120574)


Table 5 Results of SCFW-KELM with different types of kernelfunctions in the PD dataset


RBF kernel



Wav kernel



Lin kernel



Poly kernel



Table 6 Confusion matrix of KELM with RBF kernel function inthe original and weighted PD dataset

Method Expected output Prediction output

KELM Patients with PD 141 6Healthy persons 2 46

SCFW-KELM Patients with PD 146 1Healthy persons 0 48

needs to select in the classification tasks Instead of man-ually setting the parameters (119862 120574) of SVM the grid-searchtechnique [32] was adopted using 10-fold CV to find out thebest parameter values The range of the related parameters119862 and 120574 was varied between 119862 = [2minus15 2minus14 211] and120574 = [2minus15 2minus14 25] The combinations of (119862 120574) were triedand the one with the best classification accuracy was chosenas the parameter values of RBF kernel for training model

For original ELM we know that the classification perfor-mance of ELM with sigmoid additive function is sensitive tothe number of hidden neurons 119871 so value of 119871 needs to bespecified by users Figure 6 presented the detailed results ofELM in the original and weighted PD dataset with differenthidden neurons ranging from 1 to 50 Specifically the average

results of 10 runs of 10-fold CV for every specified neuronwere recorded As shown in Figure 6 the classification ratesof ELM were improved with hidden neuron increasing atfirst and then gradually fluctuated In the original dataset itachieved highestmean classification accuracy with 40 hiddenneurons while in the weighted dataset with SCFW methodhighest mean classification accuracy was gained with only 26hidden neurons

For KNN classifier the influence of neighborhood size119896 of KNN classifier in the classification performance of thePD dataset has been investigated In this study value of119896 increased from 1 to 10 The results obtained from KNNclassifierwith different values of 119896 in the PDdataset are shownin Figure 7 From the figure we can see that the best resultshave been obtained by 1-NN classifier and the performancewas decreased with the value of 119896 increasing while the betterresults were achieved in the weighted PD dataset with SCFWmethod for 2-NN

For KELM classifier there were two parameters thepenalty parameter119862 and the kernel parameter 120574 that need tobe specified In this study we have conducted the experimentson KELM depending on the best combination of (119862 120574) bygrid-search strategyTheparameters119862 and 120574were both variedin the range of [2minus15 2minus14 215] with the step size of 1Figure 8 showed the classification accuracy surface in onerun of 10-fold CV procedure where 119909-axis and 119910-axis werelog2119862 and log2120574 respectively Each mesh node in the (119909 119910)plane of the classification accuracy represented a parametercombination and 119911-axis denoted the achieved test accuracyvalue with each parameter combination

Table 7 summarized the comprehensive results achievedfrom four classifiers and those based on SCFW method interms of ACC sensitivity specificity AUC 119891-measure andKS value over 10 runs of 10-fold CV Besides the sum ofcomputational time of training and that of testing in secondswas recorded In this table we can see that with the aid ofSCFW method all these best results were much higher thanthe ones obtained in the original feature space The SCFW-KELM model has achieved highest results of 9949 1009939 and 9969 in terms of ACC sensitivity specificityand AUC and got highest 119891-measure of 09966 and KS valueof 09863 which outperforms the other three algorithmsCompared with KELM without SCFW SCFW-KELM hasimproved the average performance by 36 365 367and 365 in terms of ACC sensitivity specificity and AUCNote that the running time of SCFW-KELM was extremelyshort which costs only 00126 seconds

In comparison with SVM SCFW-SVM has achieved theresults of 9795 9667 9871 and 976 in terms ofACC sensitivity specificity and AUC and improved the per-formance by 257 1158 004 and 572 respectivelyKNN also has significantly improved by SCFW method ForELM classifier it has achieved best results by ELM with 36hidden neurons on the original feature space while the bestperformance was achieved by SCFW-ELMwith small hiddenneurons (only 26) It meant that the combination of SCFW


0 10 20 30 40 50074

076

078

08

082

084

086

088

09

Number of hidden neurons

The c

lass

ifica

tion

accu

racy

()

(a) Effects of 119871 in ELM in original dataset

0 10 20 30 40 50075

08

085

09

095

1


The c

lass

ifica

tion

accu

racy

()

(b) Effects of 119871 in ELM in weighted dataset

Figure 6 The effects of hidden neurons in original ELM in the classification of the original and weighted PD dataset

1 2 3 4 5 6 7 8 9 10087

088

089

09

091

092

093

094

095

The c

lass

ifica

tion

accu

racy

()

Value of k

(a) Effects of 119896 in KNN in original dataset

1 2 3 4 5 6 7 8 9 10094

0945

095

0955

096

0965

097

0975

098Th

e cla

ssifi

catio

n ac

cura

cy (

)

Value of k

(b) Effects of 119896 in KNN in weighted dataset

Figure 7 The effects of 119896 in KNN in the classification of the original and weighted PD dataset

minus20minus10

010

20

minus40minus20

0

20075

08

085

09

095

1

Accu

racy

()

log2 120574

log 2C

(a) Effects of (119862 120574) in original dataset

minus20minus10

010

20

minus40minus20

020

07

08

09

1

Accu

racy

()

log2 120574

log 2C

(b) Effects of (119862 120574) in weighted dataset

Figure 8 Test accuracy surface with parameters in KELM in the original and weighted PD dataset


Table 7 The results obtained from four algorithms in the originaland weighted PD dataset

Methods Performancemetrics

Original featurespace withoutSCFWmethod

Weightedfeature spacewith SCFWmethod

KELM-RBF

ACC () 9589 plusmn 466 9949 plusmn 115Sensitivity () 9635 plusmn 519 100 plusmn 0Specificity () 9572 plusmn 593 9939 plusmn 136

AUC () 9604 plusmn 406 9969 plusmn 068119891-measure 09724 09966Kappa 08925 09863Time (s) 000435 00126

SVM



KNN



ELM


AUC () 8364 plusmn 906 9648 plusmn 436119891-measure 8364 plusmn 906 09863Kappa 07078 09447Time (s) 11437 12207

and ELM not only significantly improved the performancebut also compacted the network structure of ELMMoreoverthe sensitive results of SVM and ELM were significantlyimproved by 1158 and 2184 respectively Whatever inthe original or weighted feature space KELM with RBFkernel was much superior to the other three models bya large percentage in terms of ACC sensitivity specificityAUC 119891-measure and KS value Although SVM achieved thespecificity of 9867 the sensitivity AUC 119891-measure andKS value were lower than those of KELM with RBF kernelWe can also see that the performance of KELM with RBFkernel was much higher than those of ELM with sigmoidfunction The reason may lie in the fact that the relationbetween class labels and features in the PD dataset is linearlynonseparable kernel-based strategy works better for thiscase by transforming from linearly nonseparable to linearlyseparable dataset However the performances obtained by

SCFW-SVM approach were close to those of SCFW-KNNIt meant that after data preprocessing SVM can achieve thesame ability to discriminate the PD dataset as that of KNN

Additionally it is interesting to find that the standarddeviation of SCFW-KELM was much lower than that ofKELM and it had the smallest SD in all of the modelswhich meant SCFW-KELM became more robust and reliableby means of SCFW method In addition the reason whySCFW method outperforms FCM is that SCFW may bemore suitable for nonlinear separable datasets It considersthe density measure of data points to reduce the influenceof outliers however FCM tends to select outliers as initialcenters

For comparison purpose the classification accuraciesachieved by previous methods which researched the PDdiagnosis problemwere presented in Table 8 As shown in thetable our developed method can obtain better classificationresults than all available methods proposed in previousstudies

Experiment 2 (classification in two other benchmark data-sets) Besides the PD dataset two benchmark datasets thatis Cleveland Heart and Wisconsin Diagnostic Breast Cancer(WDBC) datasets from the UCI machine learning reposi-tory have been used to further evaluate the efficiency andeffectiveness of the proposed methodWe used the same flowas in the PD dataset for the experiments of two datasets Theweighted features space of datasets was constructed usingSCFWand then theweighted featureswere evaluatedwith thefour mentioned algorithms It will only give the classificationresults of four algorithms for the sake of convenience Table 9showed the obtained results in the classification of theoriginal and weighted Cleveland Heart dataset by SCFW-KELM model Table 10 presented the achieved results in theclassification of the original and weighted WDBC datasetusing SCFW-KELM model As seen from these results theproposed method also has achieved excellent results Itindicated the generality of the proposed method

6 Conclusions and Future Work

In this work we have developed a new hybrid diagnosismethod for addressing the PD problem The main noveltyof this paper lies in the proposed approach the combinationof SCFW method and KELM with different types of kernelfunctions allows the detection of PD in an efficient andfast manner Experiments results have demonstrated that theproposed system performed significantly well in discrimi-nating the patients with PD and healthy ones Meanwhilethe comparative results are conducted among KELM SVMKNN and ELM The experiment results have shown thatthe SCFW-KELMmethod performs advantageously over theother three methods in terms of ACC sensitivity specificityAUC 119891-measure and kappa statistic value In addition theproposed system outperforms the existingmethods proposedin the literature Based on the empirical analysis it indicatesthat the proposed method can be used as a promisingalternative tool in medical decisionmaking for PD diagnosis


Table 8 Classification accuracies achieved with our method and other methods

Study Method Accuracy ()Little et al [6] Preselection filter + exhaustive search + SVM 9140 (bootstrap with 50 replicates)Shahbaba and Neal [7] Dirichlet process mixtures 8770 (5-fold CV)Das [8] ANN 9290 (hold out)Sakar and Kursun [9] Mutual information + SVM 9275 (bootstrap with 50 replicates)Psorakis et al [10] Improved mRVMs 8947 (10-fold CV)Guo et al [11] GP-EM 9310 (10-fold CV)Luukka [12] Fuzzy entropy measures + similarity 8503 (hold out)Ozcift and Gulten [14] CFS-RF 8710 (10-fold CV)Li et al [13] Fuzzy-based nonlinear transformation + SVM 9347 (hold out)Astrom and Koker [15] Parallel NN 9120 (hold out)Spadoto et al [16] PSO + OPFHarmony search + OPFGravitational search + OPF 7353 (hold out) 8401 (hold out) 8401 (hold out)Daliri [19] SVM with chi-square distance kernel 9120 (50-50 training-testing)Polat [17] FCMFW + KNN 9793 (50-50 training-testing)Chen et al [18] PCA-FKNN 9607 (average 10-fold CV)Zuo et al [20] PSO-FKNN 9747 (10-fold CV)This study SCFW-KELM 9949 (10-fold CV)

Table 9 Results of SCFW-KELM with different types of kernelfunctions in Cleveland heart dataset


RBF kernel



Wav kernel



Lin kernel



Poly kernel



Table 10 Results of SCFW-KELM with different types of kernelfunctions in WDBC dataset


RBF kernel



Wav kernel



Lin kernel



Poly kernel




The future investigationwill paymuch attention to evaluatingthe proposed method in other medical diagnosis problems

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This research is supported by the Natural Science Founda-tion of China (NSFC) under Grant nos 61170092 6113301161272208 61103091 61202308 and 61303113 This research isalso supported by the open project program of Key Labora-tory of Symbolic Computation and Knowledge Engineeringof Ministry of Education Jilin University under Grant no93K172013K01

References

[1] G F Wooten L J Currie V E Bovbjerg J K Lee and J PatrieldquoAre men at greater risk for Parkinsonrsquos disease than womenrdquoJournal of Neurology Neurosurgery amp Psychiatry vol 75 no 4pp 637ndash639 2004

[2] K R Chaudhuri D G Healy and A H V Schapira ldquoNon-motor symptoms of Parkinsonrsquos disease diagnosis andmanage-mentrdquoThe Lancet Neurology vol 5 no 3 pp 235ndash245 2006

[3] A K Ho R Iansek C Marigliani J L Bradshaw and S GatesldquoSpeech impairment in a large sample of patients with Parkin-sonrsquos diseaserdquo Behavioural Neurology vol 11 no 3 pp 131ndash1371998

[4] K M Rosen R D Kent A L Delaney and J R Duffy ldquoPara-metric quantitative acoustic analysis of conversation producedby speakers with dysarthria and healthy speakersrdquo Journal ofSpeech Language and Hearing Research vol 49 no 2 pp 395ndash411 2006

[5] D A Rahn III M Chou J J Jiang and Y Zhang ldquoPhonatoryimpairment in Parkinsonrsquos disease evidence from nonlineardynamic analysis and perturbation analysisrdquo Journal of Voicevol 21 no 1 pp 64ndash71 2007

[6] M A Little P E McSharry E J Hunter J Spielman and L ORamig ldquoSuitability of dysphonia measurements for telemoni-toring of Parkinsonrsquos diseaserdquo IEEE Transactions on BiomedicalEngineering vol 56 no 4 pp 1015ndash1022 2009

[7] B Shahbaba and R Neal ldquoNonlinear models using Dirichletprocessmixturesrdquo Journal ofMachine Learning Research vol 10pp 1829ndash1850 2009

[8] R Das ldquoA comparison of multiple classification methods fordiagnosis of Parkinson diseaserdquo Expert Systems with Applica-tions vol 37 no 2 pp 1568ndash1572 2010

[9] CO Sakar andOKursun ldquoTelediagnosis of parkinsonrsquos diseaseusing measurements of dysphoniardquo Journal of Medical Systemsvol 34 no 4 pp 591ndash599 2010

[10] I Psorakis T Damoulas and M A Girolami ldquoMulticlass rele-vance vector machines sparsity and accuracyrdquo IEEE Transac-tions on Neural Networks vol 21 no 10 pp 1588ndash1598 2010

[11] P-F Guo P Bhattacharya and N Kharma ldquoAdvances indetecting Parkinsonrsquos diseaserdquo in Medical Biometrics vol 6165of Lecture Notes in Computer Science pp 306ndash314 SpringerBerlin Germany 2010

[12] P Luukka ldquoFeature selection using fuzzy entropymeasures withsimilarity classifierrdquo Expert Systems with Applications vol 38no 4 pp 4600ndash4607 2011

[13] D-C Li C-W Liu and S C Hu ldquoA fuzzy-based data trans-formation for feature extraction to increase classification per-formance with small medical data setsrdquo Artificial Intelligence inMedicine vol 52 no 1 pp 45ndash52 2011

[14] A Ozcift and A Gulten ldquoClassifier ensemble construction withrotation forest to improve medical diagnosis performance ofmachine learning algorithmsrdquoComputerMethods andProgramsin Biomedicine vol 104 no 3 pp 443ndash451 2011

[15] F Astrom and R Koker ldquoA parallel neural network approachto prediction of Parkinsonrsquos Diseaserdquo Expert Systems withApplications vol 38 no 10 pp 12470ndash12474 2011

[16] A A Spadoto R C Guido F L Carnevali A F Pagnin A XFalcao and J P Papa ldquoImproving Parkinsonrsquos disease iden-tification through evolutionary-based feature selectionrdquo inProceedings of the Annual International Conference of the IEEEEngineering in Medicine and Biology Society (EMBC rsquo11) pp7857ndash7860 Boston Mass USA August 2011

[17] K Polat ldquoClassification of Parkinsonrsquos disease using featureweighting method on the basis of fuzzy C-means clusteringrdquoInternational Journal of Systems Science vol 43 no 4 pp 597ndash609 2012

[18] H-L Chen C-C Huang X-G Yu et al ldquoAn efficient diagnosissystem for detection of Parkinsonrsquos disease using fuzzy k-nearestneighbor approachrdquo Expert Systems with Applications vol 40no 1 pp 263ndash271 2013

[19] M R Daliri ldquoChi-square distance kernel of the gaits for thediagnosis of Parkinsonrsquos diseaserdquo Biomedical Signal Processingand Control vol 8 no 1 pp 66ndash70 2013

[20] W-L Zuo Z-YWang T Liu and H-L Chen ldquoEffective detec-tion of Parkinsonrsquos disease using an adaptive fuzzy 119896-nearestneighbor approachrdquo Biomedical Signal Processing and Controlvol 8 no 4 pp 364ndash373 2013

[21] K Polat and S S Durduran ldquoSubtractive clustering attributeweighting (SCAW) to discriminate the traffic accidents onKonya-Afyonkarahisar highway in Turkey with the help of GISa case studyrdquoAdvances in Engineering Software vol 42 no 7 pp491ndash500 2011

[22] K Polat ldquoApplication of attribute weighting method based onclustering centers to discrimination of linearly non-separablemedical datasetsrdquo Journal of Medical Systems vol 36 no 4 pp2657ndash2673 2012

[23] G-B Huang H Zhou X Ding and R Zhang ldquoExtremelearning machine for regression and multiclass classificationrdquoIEEE Transactions on Systems Man and Cybernetics Part BCybernetics vol 42 no 2 pp 513ndash529 2012

[24] Q Y Zhu A K Qin P N Suganthan and G B Huang ldquoEvolu-tionary extreme learningmachinerdquo Pattern Recognition vol 38no 10 pp 1759ndash1763 2005

[25] S L Chiu ldquoFuzzy model identification based on cluster estima-tionrdquo Journal of Intelligent and Fuzzy Systems vol 2 no 3 pp267ndash278 1994

[26] G-B Huang Q-Y Zhu and C-K Siew ldquoExtreme learningmachine theory and applicationsrdquoNeurocomputing vol 70 no1ndash3 pp 489ndash501 2006

[27] C-C Chang and C-J Lin ldquoLIBSVM a Library for supportvector machinesrdquo ACM Transactions on Intelligent Systems andTechnology vol 2 no 3 article 27 2011


[28] R Kohavi ldquoA study of cross-validation and bootstrap for accu-racy estimation and model selectionrdquo in Proceedings of the 14thInternational Joint Conference on Artificial Intelligence (IJCAIrsquo95) pp 1137ndash1143 Montreal Canada August 1995

[29] JHuang andCX Ling ldquoUsingAUCand accuracy in evaluatinglearning algorithmsrdquo IEEETransactions on Knowledge andDataEngineering vol 17 no 3 pp 299ndash310 2005

[30] A Ben-David ldquoComparison of classification accuracy usingCohenrsquosWeightedKappardquoExpert SystemswithApplications vol34 no 2 pp 825ndash832 2008

[31] L I SmithATutorial on Principal Components Analysis vol 51Cornell University Ithaca NY USA 2002

[32] C-WHsu C-C Chang and C-J LinAPractical Guide to Sup-port Vector Classification 2003

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014


MEDIATORSINFLAMMATION

of


Behavioural Neurology

EndocrinologyInternational Journal of



Disease Markers


BioMed Research International

OncologyJournal of



Oxidative Medicine and Cellular Longevity


PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of



Computational and Mathematical Methods in Medicine

OphthalmologyJournal of


Diabetes ResearchJournal of



Research and TreatmentAIDS


Gastroenterology Research and Practice


Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom


obtained by the proposed model Shahbaba and Neal [7]presented a nonlinear model based on Dirichlet mixturesfor the PD classification compared with multinomial logitmodels decision trees and SVM the classification accuracyof 877 was achieved by the proposed model Das [8] useda comparative study of neural networks (NN) DMneuralregression and decision trees for the diagnosis of PDthe experiment results had shown that the NN methodachieved the overall classification performance of 929Sakar and Kursun [9] used mutual information measure tocombine with SVM for the diagnosis of PD and achieved theclassification result of 9275 Psorakis et al [10] introducedsample selection strategies and model improvements formulticlass multikernel relevance vector machines andachieved the classification accuracy of 8947 in the PDdataset Guo et al [11] combined genetic programming andthe expectation maximization (EM) to diagnose PD in theordinary feature data and achieved the classification accuracyof 931 Luukka [12] proposed a new method which usedfuzzy entropy measures to combine with the similarityclassifier to predict PD and the mean classification of8503 was achieved Li et al [13] introduced a fuzzy-basednonlinear transformation approach together with SVM inthe PD dataset best classification accuracy of 9347 wasobtained Ozcift and Gulten [14] combined the correlationbased feature selection method with the rotation forestensemble classifier of 30 machine learning algorithms todistinguish PD the proposed model got best classificationaccuracy of 8713 Astrom and Koker [15] achieved highestclassification accuracy of 912 by using a parallel neuralnetwork model for PD diagnosis Spadoto et al [16] adoptedevolutionary based method together with the optimum-pathforest (OPF) classifier for PDdiagnosis and best classificationaccuracy of 8401 was obtained Polat [17] applied the fuzzy119862-means (FCM) clustering feature weighting (FCMFW)together with the 119896-nearest neighbor classifier for detectingPD the classification accuracy of 9793 was obtainedChen et al [18] proposed a model which used the principlecomponent analysis based feature extraction together withthe fuzzy 119896-nearest neighbor method to predict PD andachieved best classification accuracy of 9607 by theproposed model Daliri [19] presented a chi-square distancekernel-based SVM to discriminate the subjects with PDfrom the healthy control subjects using gait signals and theclassification result of 912was obtained Zuo et al [20] useda new diagnosis model based on particle swarm optimization(PSO) to strengthen the fuzzy 119896-nearest neighbor classifierfor the diagnosis of PD and the mean classification accuracyof 9747 was achieved

From theseworks it can be seen thatmost of the commonclassifiers frommachine learning community have been usedfor PD diagnosis For the nonlinear classification problemsthe data preprocessing methods such as feature weightingnormalization and feature transformation could increase theperformance of alone classifier algorithm So it is obviousthat the choice of an efficient feature preprocessing methodand an excellent classifier is of significant importance for thePD diagnosis problem Aiming at improving the efficiencyand effectiveness of the classification performance for the

diagnosis of PD in this paper an efficient features weight-ing method called subtractive clustering features weighting(SCFW) and a fast classification algorithm named kernel-based extreme learning machine (KELM) are examined TheSCFW method is used to map the features according to datadistributions in dataset and transform linearly nonseparabledataset to linearly separable dataset In this way the similardata within each feature are prone to getting together sothat the distinction between classes is increased to classifythe PD datasets correctly It is reported that SCFW methodcan help improve the discrimination abilities of classifiers inmany applications such as traffic accident analysis [21] andmedical datasets transformation [22] KELM is the improvedversion of ELM algorithm based on kernel function [23]The advantage of KELM is that only two parameters (thepenalty parameter 119862 and the kernel parameter 120574) need tobe adjusted unlike ELM which needs to specify the suitablevalues of weights and biases for improving the generalizationperformance [24] Furthermore KELMnot only trains as fastas that of ELM but also can achieve good generalization per-formanceThe objective of the proposedmethod is to explorethe performance of PD diagnosis using a two-stage hybridmodeling procedure via integrating SCFW with KELMFirstly the proposed method adopts SCFW to construct thediscriminative feature space through weighting features andthen the achieved weighted features serve as the input ofthe trained KELM classifier To evaluate the performanceof proposed hybrid method classification accuracy (ACC)sensitivity specificity AUC 119891-measure and kappa statisticvalue have been used Experimental results have shown thatthe proposed method achieves very promising results basedon proper kernel function by 10-fold cross validation (CV)

The main contributions of this paper are summarized asfollows

(1) It is the first time that we have proposed to integrateSCFW approach with KELM classifier to detect PD inan efficient and effective way

(2) In the proposed system SCFW method is employedas data preprocessing tool to strengthen the discrim-ination between classes for further improving thedistinguishing performance of KELM classifier

(3) Compared with the existing methods in previousstudies the proposed diagnostic system has achievedexcellent classification results

The rest of the paper is organized as follows Section 2offers brief background knowledge on SCFW and KELMThe detailed implementations of the diagnosis system arepresented in Section 3 In the next section the detailedexperiment design is described and Section 5 gives theexperiment results and discussions of the proposed methodFinally conclusions and recommendations for future workare summarized in Section 6





End ForEnd





119863119894 =

119899

sum119895=1

exp(minus10038171003817100381710038171003817119909119894 minus 119909119895

10038171003817100381710038171003817

2

(1199031198862)2) (1)




10038171003817100381710038171003817

2

(1199031198872)2) (2)


Load the PD dataset












x1

x2

xn

N1

N2

NL

1205731

1205732

120573L

Onf(x)




119891 (119909) =

119871

sum119894=1

120573119894ℎ119894 (119909) = ℎ (119909) 120573 (3)




1205731015840= H+T (4)



120573 = 119867119879(119868

119862+ 119867119867

119879)minus1

119879 (5)


119891 (119909) = ℎ120573 = ℎ (119909)119867119879(119868

119862+ 119867119867

119879)minus1

119879 (6)


ΩELM = 119867119867119879 ΩELM119894119895 = ℎ (119909119894) sdot ℎ (119909119895) = 119870 (119909119894 119909119895) (7)


119891 (119909) =[[

[

119870 (119909 1199091)

119870 (119909 119909119873)

]]

]

119879

(119868

119862+ ΩELM)

minus1

119879 (8)













clustering method



validation







KELM model

No

Yes

K = 10







End











F5 MDVP Jitter(Abs)





F15 NHR


F16 HNR


F18 D2


F20 Spread1


F21 Spread2F22 PPE



times 100


times 100


times 100







Recall = TPTP + FN


(9)


KS = 119875 (119860) minus 119875 (119864)1 minus 119875 (119864)

(10)







RBF kernel



Wav kernel



Lin kernel



Poly kernel





Number offeature


(normal case)






0

100

200

300

400

500

600

Number of features


lue o

f use

d fe

atur

e

1 3 5 7 9 11 13 15 17 19 21

(a)

0

2

4

6

8

10

12


Valu

e of u

sed

feat

ure


times104

(b)


minus2minus1

01

23

minus1minus05

005

1minus1

minus05

0

05

1



3rd

prin

cipa

l com

pone

nt

HealthyPD

(a)

minus20

24

minus10

12

minus1

minus05

0

05

1



3rd

prin

cipa

l com

pone

nt

HealthyPD

(b)








RBF kernel



Wav kernel



Lin kernel



Poly kernel















0 10 20 30 40 50074

076

078

08

082

084

086

088

09


The c

lass

ifica

tion

accu

racy

()


0 10 20 30 40 50075

08

085

09

095

1


The c

lass

ifica

tion

accu

racy

()



1 2 3 4 5 6 7 8 9 10087

088

089

09

091

092

093

094

095

The c

lass

ifica

tion

accu

racy

()

Value of k


1 2 3 4 5 6 7 8 9 10094

0945

095

0955

096

0965

097

0975

098Th

e cla

ssifi

catio

n ac

cura

cy (

)

Value of k



minus20minus10

010

20

minus40minus20

0

20075

08

085

09

095

1

Accu

racy

()

log2 120574

log 2C


minus20minus10

010

20

minus40minus20

020

07

08

09

1

Accu

racy

()

log2 120574

log 2C








KELM-RBF



SVM



KNN



ELM















RBF kernel



Wav kernel



Lin kernel



Poly kernel





RBF kernel



Wav kernel



Lin kernel



Poly kernel







Acknowledgments


References







































of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of




















End ForEnd





119863119894 =

119899

sum119895=1

exp(minus10038171003817100381710038171003817119909119894 minus 119909119895

10038171003817100381710038171003817

2

(1199031198862)2) (1)




10038171003817100381710038171003817

2

(1199031198872)2) (2)


Load the PD dataset












x1

x2

xn

N1

N2

NL

1205731

1205732

120573L

Onf(x)




119891 (119909) =

119871

sum119894=1

120573119894ℎ119894 (119909) = ℎ (119909) 120573 (3)




1205731015840= H+T (4)



120573 = 119867119879(119868

119862+ 119867119867

119879)minus1

119879 (5)


119891 (119909) = ℎ120573 = ℎ (119909)119867119879(119868

119862+ 119867119867

119879)minus1

119879 (6)


ΩELM = 119867119867119879 ΩELM119894119895 = ℎ (119909119894) sdot ℎ (119909119895) = 119870 (119909119894 119909119895) (7)


119891 (119909) =[[

[

119870 (119909 1199091)

119870 (119909 119909119873)

]]

]

119879

(119868

119862+ ΩELM)

minus1

119879 (8)













clustering method



validation







KELM model

No

Yes

K = 10







End











F5 MDVP Jitter(Abs)





F15 NHR


F16 HNR


F18 D2


F20 Spread1


F21 Spread2F22 PPE



times 100


times 100


times 100







Recall = TPTP + FN


(9)


KS = 119875 (119860) minus 119875 (119864)1 minus 119875 (119864)

(10)







RBF kernel



Wav kernel



Lin kernel



Poly kernel





Number offeature


(normal case)






0

100

200

300

400

500

600

Number of features


lue o

f use

d fe

atur

e

1 3 5 7 9 11 13 15 17 19 21

(a)

0

2

4

6

8

10

12


Valu

e of u

sed

feat

ure


times104

(b)


minus2minus1

01

23

minus1minus05

005

1minus1

minus05

0

05

1



3rd

prin

cipa

l com

pone

nt

HealthyPD

(a)

minus20

24

minus10

12

minus1

minus05

0

05

1



3rd

prin

cipa

l com

pone

nt

HealthyPD

(b)








RBF kernel



Wav kernel



Lin kernel



Poly kernel















0 10 20 30 40 50074

076

078

08

082

084

086

088

09


The c

lass

ifica

tion

accu

racy

()


0 10 20 30 40 50075

08

085

09

095

1


The c

lass

ifica

tion

accu

racy

()



1 2 3 4 5 6 7 8 9 10087

088

089

09

091

092

093

094

095

The c

lass

ifica

tion

accu

racy

()

Value of k


1 2 3 4 5 6 7 8 9 10094

0945

095

0955

096

0965

097

0975

098Th

e cla

ssifi

catio

n ac

cura

cy (

)

Value of k



minus20minus10

010

20

minus40minus20

0

20075

08

085

09

095

1

Accu

racy

()

log2 120574

log 2C


minus20minus10

010

20

minus40minus20

020

07

08

09

1

Accu

racy

()

log2 120574

log 2C








KELM-RBF



SVM



KNN



ELM















RBF kernel



Wav kernel



Lin kernel



Poly kernel





RBF kernel



Wav kernel



Lin kernel



Poly kernel







Acknowledgments


References







































of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of

















x1

x2

xn

N1

N2

NL

1205731

1205732

120573L

Onf(x)




119891 (119909) =

119871

sum119894=1

120573119894ℎ119894 (119909) = ℎ (119909) 120573 (3)




1205731015840= H+T (4)



120573 = 119867119879(119868

119862+ 119867119867

119879)minus1

119879 (5)


119891 (119909) = ℎ120573 = ℎ (119909)119867119879(119868

119862+ 119867119867

119879)minus1

119879 (6)


ΩELM = 119867119867119879 ΩELM119894119895 = ℎ (119909119894) sdot ℎ (119909119895) = 119870 (119909119894 119909119895) (7)


119891 (119909) =[[

[

119870 (119909 1199091)

119870 (119909 119909119873)

]]

]

119879

(119868

119862+ ΩELM)

minus1

119879 (8)













clustering method



validation







KELM model

No

Yes

K = 10







End











F5 MDVP Jitter(Abs)





F15 NHR


F16 HNR


F18 D2


F20 Spread1


F21 Spread2F22 PPE



times 100


times 100


times 100







Recall = TPTP + FN


(9)


KS = 119875 (119860) minus 119875 (119864)1 minus 119875 (119864)

(10)







RBF kernel



Wav kernel



Lin kernel



Poly kernel





Number offeature


(normal case)






0

100

200

300

400

500

600

Number of features


lue o

f use

d fe

atur

e

1 3 5 7 9 11 13 15 17 19 21

(a)

0

2

4

6

8

10

12


Valu

e of u

sed

feat

ure


times104

(b)


minus2minus1

01

23

minus1minus05

005

1minus1

minus05

0

05

1



3rd

prin

cipa

l com

pone

nt

HealthyPD

(a)

minus20

24

minus10

12

minus1

minus05

0

05

1



3rd

prin

cipa

l com

pone

nt

HealthyPD

(b)








RBF kernel



Wav kernel



Lin kernel



Poly kernel















0 10 20 30 40 50074

076

078

08

082

084

086

088

09


The c

lass

ifica

tion

accu

racy

()


0 10 20 30 40 50075

08

085

09

095

1


The c

lass

ifica

tion

accu

racy

()



1 2 3 4 5 6 7 8 9 10087

088

089

09

091

092

093

094

095

The c

lass

ifica

tion

accu

racy

()

Value of k


1 2 3 4 5 6 7 8 9 10094

0945

095

0955

096

0965

097

0975

098Th

e cla

ssifi

catio

n ac

cura

cy (

)

Value of k



minus20minus10

010

20

minus40minus20

0

20075

08

085

09

095

1

Accu

racy

()

log2 120574

log 2C


minus20minus10

010

20

minus40minus20

020

07

08

09

1

Accu

racy

()

log2 120574

log 2C








KELM-RBF



SVM



KNN



ELM















RBF kernel



Wav kernel



Lin kernel



Poly kernel





RBF kernel



Wav kernel



Lin kernel



Poly kernel







Acknowledgments


References







































of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of





















clustering method



validation







KELM model

No

Yes

K = 10







End











F5 MDVP Jitter(Abs)





F15 NHR


F16 HNR


F18 D2


F20 Spread1


F21 Spread2F22 PPE



times 100


times 100


times 100







Recall = TPTP + FN


(9)


KS = 119875 (119860) minus 119875 (119864)1 minus 119875 (119864)

(10)







RBF kernel



Wav kernel



Lin kernel



Poly kernel





Number offeature


(normal case)






0

100

200

300

400

500

600

Number of features


lue o

f use

d fe

atur

e

1 3 5 7 9 11 13 15 17 19 21

(a)

0

2

4

6

8

10

12


Valu

e of u

sed

feat

ure


times104

(b)


minus2minus1

01

23

minus1minus05

005

1minus1

minus05

0

05

1



3rd

prin

cipa

l com

pone

nt

HealthyPD

(a)

minus20

24

minus10

12

minus1

minus05

0

05

1



3rd

prin

cipa

l com

pone

nt

HealthyPD

(b)








RBF kernel



Wav kernel



Lin kernel



Poly kernel















0 10 20 30 40 50074

076

078

08

082

084

086

088

09


The c

lass

ifica

tion

accu

racy

()


0 10 20 30 40 50075

08

085

09

095

1


The c

lass

ifica

tion

accu

racy

()



1 2 3 4 5 6 7 8 9 10087

088

089

09

091

092

093

094

095

The c

lass

ifica

tion

accu

racy

()

Value of k


1 2 3 4 5 6 7 8 9 10094

0945

095

0955

096

0965

097

0975

098Th

e cla

ssifi

catio

n ac

cura

cy (

)

Value of k



minus20minus10

010

20

minus40minus20

0

20075

08

085

09

095

1

Accu

racy

()

log2 120574

log 2C


minus20minus10

010

20

minus40minus20

020

07

08

09

1

Accu

racy

()

log2 120574

log 2C








KELM-RBF



SVM



KNN



ELM















RBF kernel



Wav kernel



Lin kernel



Poly kernel





RBF kernel



Wav kernel



Lin kernel



Poly kernel







Acknowledgments


References







































of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of























F5 MDVP Jitter(Abs)





F15 NHR


F16 HNR


F18 D2


F20 Spread1


F21 Spread2F22 PPE



times 100


times 100


times 100







Recall = TPTP + FN


(9)


KS = 119875 (119860) minus 119875 (119864)1 minus 119875 (119864)

(10)







RBF kernel



Wav kernel



Lin kernel



Poly kernel





Number offeature


(normal case)






0

100

200

300

400

500

600

Number of features


lue o

f use

d fe

atur

e

1 3 5 7 9 11 13 15 17 19 21

(a)

0

2

4

6

8

10

12


Valu

e of u

sed

feat

ure


times104

(b)


minus2minus1

01

23

minus1minus05

005

1minus1

minus05

0

05

1



3rd

prin

cipa

l com

pone

nt

HealthyPD

(a)

minus20

24

minus10

12

minus1

minus05

0

05

1



3rd

prin

cipa

l com

pone

nt

HealthyPD

(b)








RBF kernel



Wav kernel



Lin kernel



Poly kernel















0 10 20 30 40 50074

076

078

08

082

084

086

088

09


The c

lass

ifica

tion

accu

racy

()


0 10 20 30 40 50075

08

085

09

095

1


The c

lass

ifica

tion

accu

racy

()



1 2 3 4 5 6 7 8 9 10087

088

089

09

091

092

093

094

095

The c

lass

ifica

tion

accu

racy

()

Value of k


1 2 3 4 5 6 7 8 9 10094

0945

095

0955

096

0965

097

0975

098Th

e cla

ssifi

catio

n ac

cura

cy (

)

Value of k



minus20minus10

010

20

minus40minus20

0

20075

08

085

09

095

1

Accu

racy

()

log2 120574

log 2C


minus20minus10

010

20

minus40minus20

020

07

08

09

1

Accu

racy

()

log2 120574

log 2C








KELM-RBF



SVM



KNN



ELM















RBF kernel



Wav kernel



Lin kernel



Poly kernel





RBF kernel



Wav kernel



Lin kernel



Poly kernel







Acknowledgments


References







































of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of



















RBF kernel



Wav kernel



Lin kernel



Poly kernel





Number offeature


(normal case)






0

100

200

300

400

500

600

Number of features


lue o

f use

d fe

atur

e

1 3 5 7 9 11 13 15 17 19 21

(a)

0

2

4

6

8

10

12


Valu

e of u

sed

feat

ure


times104

(b)


minus2minus1

01

23

minus1minus05

005

1minus1

minus05

0

05

1



3rd

prin

cipa

l com

pone

nt

HealthyPD

(a)

minus20

24

minus10

12

minus1

minus05

0

05

1



3rd

prin

cipa

l com

pone

nt

HealthyPD

(b)








RBF kernel



Wav kernel



Lin kernel



Poly kernel















0 10 20 30 40 50074

076

078

08

082

084

086

088

09


The c

lass

ifica

tion

accu

racy

()


0 10 20 30 40 50075

08

085

09

095

1


The c

lass

ifica

tion

accu

racy

()



1 2 3 4 5 6 7 8 9 10087

088

089

09

091

092

093

094

095

The c

lass

ifica

tion

accu

racy

()

Value of k


1 2 3 4 5 6 7 8 9 10094

0945

095

0955

096

0965

097

0975

098Th

e cla

ssifi

catio

n ac

cura

cy (

)

Value of k



minus20minus10

010

20

minus40minus20

0

20075

08

085

09

095

1

Accu

racy

()

log2 120574

log 2C


minus20minus10

010

20

minus40minus20

020

07

08

09

1

Accu

racy

()

log2 120574

log 2C








KELM-RBF



SVM



KNN



ELM















RBF kernel



Wav kernel



Lin kernel



Poly kernel





RBF kernel



Wav kernel



Lin kernel



Poly kernel







Acknowledgments


References







































of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of

















0

100

200

300

400

500

600

Number of features


lue o

f use

d fe

atur

e

1 3 5 7 9 11 13 15 17 19 21

(a)

0

2

4

6

8

10

12


Valu

e of u

sed

feat

ure


times104

(b)


minus2minus1

01

23

minus1minus05

005

1minus1

minus05

0

05

1



3rd

prin

cipa

l com

pone

nt

HealthyPD

(a)

minus20

24

minus10

12

minus1

minus05

0

05

1



3rd

prin

cipa

l com

pone

nt

HealthyPD

(b)








RBF kernel



Wav kernel



Lin kernel



Poly kernel















0 10 20 30 40 50074

076

078

08

082

084

086

088

09


The c

lass

ifica

tion

accu

racy

()


0 10 20 30 40 50075

08

085

09

095

1


The c

lass

ifica

tion

accu

racy

()



1 2 3 4 5 6 7 8 9 10087

088

089

09

091

092

093

094

095

The c

lass

ifica

tion

accu

racy

()

Value of k


1 2 3 4 5 6 7 8 9 10094

0945

095

0955

096

0965

097

0975

098Th

e cla

ssifi

catio

n ac

cura

cy (

)

Value of k



minus20minus10

010

20

minus40minus20

0

20075

08

085

09

095

1

Accu

racy

()

log2 120574

log 2C


minus20minus10

010

20

minus40minus20

020

07

08

09

1

Accu

racy

()

log2 120574

log 2C








KELM-RBF



SVM



KNN



ELM















RBF kernel



Wav kernel



Lin kernel



Poly kernel





RBF kernel



Wav kernel



Lin kernel



Poly kernel







Acknowledgments


References







































of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of



















RBF kernel



Wav kernel



Lin kernel



Poly kernel















0 10 20 30 40 50074

076

078

08

082

084

086

088

09


The c

lass

ifica

tion

accu

racy

()


0 10 20 30 40 50075

08

085

09

095

1


The c

lass

ifica

tion

accu

racy

()



1 2 3 4 5 6 7 8 9 10087

088

089

09

091

092

093

094

095

The c

lass

ifica

tion

accu

racy

()

Value of k


1 2 3 4 5 6 7 8 9 10094

0945

095

0955

096

0965

097

0975

098Th

e cla

ssifi

catio

n ac

cura

cy (

)

Value of k



minus20minus10

010

20

minus40minus20

0

20075

08

085

09

095

1

Accu

racy

()

log2 120574

log 2C


minus20minus10

010

20

minus40minus20

020

07

08

09

1

Accu

racy

()

log2 120574

log 2C








KELM-RBF



SVM



KNN



ELM















RBF kernel



Wav kernel



Lin kernel



Poly kernel





RBF kernel



Wav kernel



Lin kernel



Poly kernel







Acknowledgments


References







































of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of

















0 10 20 30 40 50074

076

078

08

082

084

086

088

09


The c

lass

ifica

tion

accu

racy

()


0 10 20 30 40 50075

08

085

09

095

1


The c

lass

ifica

tion

accu

racy

()



1 2 3 4 5 6 7 8 9 10087

088

089

09

091

092

093

094

095

The c

lass

ifica

tion

accu

racy

()

Value of k


1 2 3 4 5 6 7 8 9 10094

0945

095

0955

096

0965

097

0975

098Th

e cla

ssifi

catio

n ac

cura

cy (

)

Value of k



minus20minus10

010

20

minus40minus20

0

20075

08

085

09

095

1

Accu

racy

()

log2 120574

log 2C


minus20minus10

010

20

minus40minus20

020

07

08

09

1

Accu

racy

()

log2 120574

log 2C








KELM-RBF



SVM



KNN



ELM















RBF kernel



Wav kernel



Lin kernel



Poly kernel





RBF kernel



Wav kernel



Lin kernel



Poly kernel







Acknowledgments


References







































of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of





















KELM-RBF



SVM



KNN



ELM















RBF kernel



Wav kernel



Lin kernel



Poly kernel





RBF kernel



Wav kernel



Lin kernel



Poly kernel







Acknowledgments


References







































of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of





















RBF kernel



Wav kernel



Lin kernel



Poly kernel





RBF kernel



Wav kernel



Lin kernel



Poly kernel







Acknowledgments


References







































of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of




















Acknowledgments


References







































of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of



























of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of
















Research Article An Efficient Diagnosis System for Parkinson ...accuracy of. %. Astr om and Koker...

Documents

Transcript of Research Article An Efficient Diagnosis System for Parkinson ...accuracy of. %. Astr om and Koker...