ImprovedEstimationofParkinsonianVowelQualitythrough ... · (ModA),served as themodulation-based...

11
Research Article Improved Estimation of Parkinsonian Vowel Quality through Acoustic Feature Assimilation Amr Gaballah , 1 Vijay Parsa , 1,2 Daryn Cushnie-Sparrow , 2 and Scott Adams 2 1 Department of Electrical and Computer Engineering, University of Western Ontario, London, ON, Canada 2 School of Communication Sciences and Disorders, University of Western Ontario, London, ON, Canada Correspondence should be addressed to Amr Gaballah; [email protected] Received 9 June 2020; Revised 17 October 2020; Accepted 30 June 2021; Published 15 July 2021 Academic Editor: Noureddin Nakhostin Ansari Copyright © 2021 Amr Gaballah et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. is paper investigated the performance of a number of acoustic measures, both individually and in combination, in predicting the perceived quality of sustained vowels produced by people impaired with Parkinson’s disease (PD). Sustained vowel recordings were collected from 51 PD patients before and after the administration of the Levodopa medication. Subjective ratings of the overall vowel quality were garnered using a visual analog scale. ese ratings served to benchmark the effectiveness of the acoustic measures. Acoustic predictors of the perceived vowel quality included the harmonics-to-noise ratio (HNR), smoothed cepstral peak prominence (CPP), recurrence period density entropy (RPDE), Gammatone frequency cepstral coefficients (GFCCs), linear prediction (LP) coefficients and their variants, and modulation spectrogram features. Linear regression (LR) and support vector regression (SVR) models were employed to assimilate multiple features. Different feature dimensionality reduction methods were investigated to avoid model overfitting and enhance the prediction capabilities for the test dataset. Results showed that the RPDE measure performed the best among all individual features, while a regression model incorporating a subset of features produced the best overall correlation of 0.80 between the predicted and actual vowel quality ratings. is model may therefore serve as a surrogate for auditory-perceptual assessment of Parkinsonian vowel quality. Furthermore, the model may offer the clinician a tool to predict who may benefit from Levodopa medication in terms of enhanced voice quality. 1. Introduction Parkinson’s disease (PD) is the second most common neurodegenerative disease, after Alzheimer’s disease [1]. Pathological symptoms of PD are severe loss of dopami- nergic neurons in the nigrostriatal region and the appear- ance of cytoplasmic inclusions known as Lewy bodies (LBs) [2, 3]. A reduction in dopamine production leads to the appearance of rest tremors, akinesia, cogwheel rigidity, and postural instability. In addition, statistics show that nearly 90% of people impaired with PD develop voice and speech disorders during the course of their disease [4, 5]. e classic characteristics of Parkinsonian speech and voice include reduced vocal loudness (hypophonia), with a tendency of the voice to fade out; reduced prosodic pitch inflection (hypoprosodia); breathy or hoarse voice; imprecise articu- lation of consonants and vowels; and mumbled speech [4, 5]. While speech articulation and fluency problems appear at later stages of PD, voice abnormalities may appear at earlier stages of the course of the disease [5]. As such, assessment of voice characteristics of PD patients forms a critical part of their treatment and rehabilitation processes. For medical treatment, Levodopa is the most commonly used medication that has been shown to improve the motor symptoms of the disease [6]. Levodopa crosses the blood- brain barrier and increases the production of dopamine. is process reduces the effect of the dopamine production drop caused by PD and enhances the efficiency of the motor features of the PD subject [6]. Despite its therapeutic effects in the treatment of motor deficits of PD, Levodopa does not have the same healing effect on PD voice. In general, the magnitude, consistency, and long-term effects of Levodopa are far from satisfactory for voice rehabilitation in PD pa- tients [5, 7]. For example, Cushnie-Sparrow et al. [8] recently Hindawi e Scientific World Journal Volume 2021, Article ID 6076828, 11 pages https://doi.org/10.1155/2021/6076828

Transcript of ImprovedEstimationofParkinsonianVowelQualitythrough ... · (ModA),served as themodulation-based...

Page 1: ImprovedEstimationofParkinsonianVowelQualitythrough ... · (ModA),served as themodulation-based features [9]. e envelope of the waveform was extracted and filtered to a numberoffilters[16,17].eratiosoftheenergyinthelow

Research ArticleImproved Estimation of Parkinsonian Vowel Quality throughAcoustic Feature Assimilation

Amr Gaballah 1 Vijay Parsa 12 Daryn Cushnie-Sparrow 2 and Scott Adams 2

1Department of Electrical and Computer Engineering University of Western Ontario London ON Canada2School of Communication Sciences and Disorders University of Western Ontario London ON Canada

Correspondence should be addressed to Amr Gaballah agaballauwoca

Received 9 June 2020 Revised 17 October 2020 Accepted 30 June 2021 Published 15 July 2021

Academic Editor Noureddin Nakhostin Ansari

Copyright copy 2021 Amr Gaballah et al)is is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

)is paper investigated the performance of a number of acoustic measures both individually and in combination in predictingthe perceived quality of sustained vowels produced by people impaired with Parkinsonrsquos disease (PD) Sustained vowel recordingswere collected from 51 PD patients before and after the administration of the Levodopa medication Subjective ratings of theoverall vowel quality were garnered using a visual analog scale )ese ratings served to benchmark the effectiveness of the acousticmeasures Acoustic predictors of the perceived vowel quality included the harmonics-to-noise ratio (HNR) smoothed cepstralpeak prominence (CPP) recurrence period density entropy (RPDE) Gammatone frequency cepstral coefficients (GFCCs) linearprediction (LP) coefficients and their variants and modulation spectrogram features Linear regression (LR) and support vectorregression (SVR) models were employed to assimilate multiple features Different feature dimensionality reduction methods wereinvestigated to avoid model overfitting and enhance the prediction capabilities for the test dataset Results showed that the RPDEmeasure performed the best among all individual features while a regression model incorporating a subset of features producedthe best overall correlation of 080 between the predicted and actual vowel quality ratings )is model may therefore serve as asurrogate for auditory-perceptual assessment of Parkinsonian vowel quality Furthermore the model may offer the clinician a toolto predict who may benefit from Levodopa medication in terms of enhanced voice quality

1 Introduction

Parkinsonrsquos disease (PD) is the second most commonneurodegenerative disease after Alzheimerrsquos disease [1]Pathological symptoms of PD are severe loss of dopami-nergic neurons in the nigrostriatal region and the appear-ance of cytoplasmic inclusions known as Lewy bodies (LBs)[2 3] A reduction in dopamine production leads to theappearance of rest tremors akinesia cogwheel rigidity andpostural instability In addition statistics show that nearly90 of people impaired with PD develop voice and speechdisorders during the course of their disease [4 5] )e classiccharacteristics of Parkinsonian speech and voice includereduced vocal loudness (hypophonia) with a tendency of thevoice to fade out reduced prosodic pitch inflection(hypoprosodia) breathy or hoarse voice imprecise articu-lation of consonants and vowels andmumbled speech [4 5]

While speech articulation and fluency problems appear atlater stages of PD voice abnormalities may appear at earlierstages of the course of the disease [5] As such assessment ofvoice characteristics of PD patients forms a critical part oftheir treatment and rehabilitation processes

For medical treatment Levodopa is the most commonlyused medication that has been shown to improve the motorsymptoms of the disease [6] Levodopa crosses the blood-brain barrier and increases the production of dopamine)isprocess reduces the effect of the dopamine production dropcaused by PD and enhances the efficiency of the motorfeatures of the PD subject [6] Despite its therapeutic effectsin the treatment of motor deficits of PD Levodopa does nothave the same healing effect on PD voice In general themagnitude consistency and long-term effects of Levodopaare far from satisfactory for voice rehabilitation in PD pa-tients [5 7] For example Cushnie-Sparrow et al [8] recently

Hindawie Scientific World JournalVolume 2021 Article ID 6076828 11 pageshttpsdoiorg10115520216076828

investigated the effect of the Levodopa on the perceivedquality of vowels produced by PD patients Vowel sampleswere collected from 51 PD subjects before and after Levo-dopa administration and these samples were rated by threelisteners Results showed no statistically significant differ-ence in the perceived quality of the vowels produced by PDpatients before and after taking the Levodopa A more in-depth analysis revealed an interesting fact there was astatistically significant improvement in the vowel qualitywith the administration of Levodopa for those PD patientswhose off-medication vowel quality was rated as poor )isfinding motivates the need for the assessment of vowelsamples from PD patients so that the potential benefit fromLevodopa medication on their voice quality can beestimated

While subjective assessment is considered the goldstandard of voice quality evaluation it is not efficient interms of time and cost especially when multiple voicesamples need to be rated by a group of listeners [9] )isweighs in favour of objective instrumental assessment ofvoice quality [9] Traditional acoustic characterization ofvowel samples includes jitter shimmer harmonics-to-noiseratio (HNR) and cepstral peak prominence (CPP) Jitter isdefined as the cycle-to-cycle variation of the fundamentalfrequency while the relative jitter is the ratio between theabsolute jitter and the average fundamental frequency [10]Shimmer is defined as the variability of the peak-to-peakamplitude in decibels while relative shimmer is the ratiobetween the absolute shimmer and the average amplitude[10] HNR quantifies the relationship between the periodiccomponents (harmonics) and the aperiodic components(noise) of the signal [8] Finally CPP is defined as thedifference between the peak of the cepstrum and its linearregression function [11]

Few studies investigated the performance of traditionalacoustic measures in predicting the perceived quality of PDvowels Jannetts et al [12] collected recordings of the sus-tained phonation of a a sentence and normal conversationfrom 43 speakers with PD and 10 participants with ataxia)e recordings were rated subjectively using the GRBASscale (G the grade or overall dysphonia severity Rroughness B breathiness A asthenia or weakness and Sstrain) Among the aforementioned traditional acousticmeasures CPP was found to produce the highest Spearmanrsquosrank-order correlation with ldquoGraderdquo voice quality attribute

Cushnie-Sparrow et al [8] investigated the effect of theLevodopa medication on the perceived quality of vowelsproduced by PD patients Sustained vowel recordings werecollected from 51 subjects impaired with PD in addition to11 healthy control individuals Measured acoustic metricsincluded jitter shimmer HNR CPP and the acoustic voicequality index (AVQI) Measurement of the perceived qualitywas obtained from a panel of 3 listeners )e authors foundthat the HNR resulted in the highest Pearsonrsquos correlationcoefficient of 055 with the averaged subjective qualityscores

In addition to these traditional measures a multitude ofother features has been extracted from vowel samples basedon nonlinear dynamic filterbank and spectrotemporal

modulation analyses (eg [13 14]) Examples of such fea-tures include the recurrence period density entropy (RPDE)Mel and Gammatone frequency cepstral coefficients(MFCCs and GFCCs) linear prediction (LP) based featuresand modulation spectrogram features In addition thesefeatures often need to be combined through a linear ornonlinear regressionmodel so that a single index of disorderseverity may be obtained A vast majority of past studiesinvestigating these features and feature mapping modelsfocused on their effectiveness in discriminating betweennormal and PD vowel samples (eg [14]) or in predicting theUnified Parkinsonrsquos Disease Rating Scale (UPDRS) [13] Tothe best of our knowledge no study has investigated theapplication of these features and their regression models inpredicting the perceived quality of PD vowels

)is work therefore aims to build a valid regressionmodel that assimilates relevant acoustic features for im-proved estimation of the perceived quality of PD vowelsUsing the previously collected subjective database byCushnie-Sparrow et al [8] the performance of severalacoustic features was assessed both individually and incombination Multivariate linear regression and supportvector regression techniques were utilized to assimilate thefeature sets both with and without feature reductiontechniques A final composite objective index was developedthat produced a statistically significant improvement inpredicting the perceived PD vowel quality ratings

2 Methods

21 Voice Recordings and Subjective Evaluation As men-tioned earlier subjective data collected by Cushnie-Sparrowet al [8] were used to develop and benchmark the perfor-mance of the objective metrics A brief description of thesubjective data collection procedure is given here for the sakeof completion Samples of the sustained vowel lsquoahrsquo werecollected from 51 PD subjects Salient demographic dataincludes (a) 39 male and 12 female subjects (b) age range of47 to 82 years (M 6578 SD 419) (c) diagnosis durationrange of 2 to 16 years (M 922 SD 419) and (d) Levo-dopa use duration range of 2 to 16 years (M 751 SD 391)[8]

PD patients were evaluated off and on the Levodopamedication In addition sustained vowel recordings werealso collected from 11 subjects who were nonimpaired withPD these recordings served as a control of the measurementprocess All recordings were collected using a high-qualityheadset microphone (DPA 4060) at a 44100Hz samplingrate and 16 bitssample quantization )e headset micro-phone was placed 6 cm horizontal from the middle of theupper lip philtrum and to the side of the mouth (approx-imately 45 degrees) Speech intensity was calibrated using a70 dB SPL reference and a sound level meter positioned15 cm from themouth of the PD patient A total of 113 vowelrecordings were collected through this procedure Two-second samples from the middle of each vowel recordingwere extracted for analysis and perceptual judgments ofeach segment were provided by 3 listeners (graduate stu-dents in the Speech Language Pathology program atWestern

2 )e Scientific World Journal

University))e listeners were instructed to judge the overallvoice quality of each vowel sample using a visual-analoguescale (VAS))e scale was 10 cm in length and the endpointdescriptors were ldquopoor voice qualityrdquo on the left and ldquobettervoice qualityrdquo on the right VAS score was recorded as thedistance from the left endpoint to the listenersrsquo mark )eaverage of the three listener ratings served as the overallquality rating of the vowel recordings To test intraraterreliability 20 of the vowel samples were randomly selectedand inserted into the presentation order in a randomfashion for a rerating More details can be found in [8]

22 Features and (eir Computation Subjective ratingsobtained through the procedure outlined in the previoussection were used to benchmark the performance of theobjective measures Prior to feature extraction the sustainedvowel recordings were decimated to a 16 kHz sample rate Inaddition the complete 113-sample database was divided into2 datasets )e first dataset contained 80 of the wholedataset or 91 samples to constitute the training dataset whilethe remaining 20 of the data or 22 voice samples made upthe test dataset

221 Filterbank-Based Features GFCC coefficients aremainly used in computational auditory sense analysis(CASA) studies to transform signals into time-frequency (T-F) domain to perform robust speech recognition [3 15] )erecorded signal was segmented into frames of 256 sampleswith a frame overlap of 100 samples Afterwards the powerspectrum of each frame was obtained after multiplying witha Hamming window )e equivalent rectangular bandwidth(ERB) filterbank was applied to the frame power spectra Inthis research 128 filters constituted the ERB filterbank andthe log filterbank energies were decorrelated using thediscrete cosine transform (DCT) [15] )e frame averagedGFCCs and their first-order time differences (ldquodeltardquo values)resulted in the final GFCC feature set that contained 60features

222 Modulation-Based Features Two features extractedfrom the envelope namely the speech-to-reverberationmodulation energy ratio (SRMR) and modulation area(ModA) served as the modulation-based features [9] )eenvelope of the waveform was extracted and filtered to anumber of filters [16 17] )e ratios of the energy in the lowband filters which are assumed to contain the speechmodulation energies and the high band filters which areassumed to contain the noise modulation energies representthe quality of the sustained vowel signal

In SRMR [16] the speech signal was processed through a23-channel Gammatone filterbank with center frequenciesranging from 125Hz to half the sampling rate Hilberttransform was then applied to the filterbank outputs toextract the temporal envelope in each channel )ese en-velopes had frequencies that ranged between 0 and 128HzAt this point each envelope was filtered into eight over-lapping modulation bands with center frequencies ranging

from 4 to 128Hz Finally SRMR was computed as a ratiobetween the energy stored in the first four filters whichcontain most of the speech energy and the last four filterswhich contain the background noise [16]

In ModA the speech signal was decomposed using only4 bandpass filters and filtered signals had Hilbert transformapplied to derive the band-specific temporal envelopes Eachenvelope was subsequently downsampled to 20Hz and thenprocessed through 13 octave filterbank with center fre-quencies ranging between 05 and 8Hz )e filterbankoutput energies were then used to derive the area under eachacoustic band and then those areas are averaged to producethe ModA metric [17]

223 Linear Prediction-Based Features )e LP-based fea-ture extraction methodology is presented in Low ComplexityQuality Assessment (LCQA) proposed by Grancharov et al[18] )e central idea of LCQA is to extract statistical featuresof the speech signal [18] Each speech recording was seg-mented into 20ms nonoverlapping frames an 18th order LPmodel was computed for each frame and a vector of featuresis extracted from each frame )is featuresrsquo vector incorpo-rates 10 features which are the spectral flatness the excitationvariance the signal variance the spectral centroid and thespectral dynamics for each frame in addition to the firstderivative of each of the aforementioned features [19] At thispoint the statistical properties of each one of the 10 featuresare calculated across all the frames these statistical featuresinclude mean variance skew and kurtosis [18] )is yields tothe formation of a vector of size 40 times 1 for each speech signal[15 19]

224 Recurrence Period Density Entropy (RPDE) In RPDEmeasurements the signal was first applied to a time delayembedding to recreate the phase space of a nonlinear dy-namic system [20] RPDE quantifies the percentage of thedynamics in the reconstructed phase space that are periodicor repeated exactly [20 21] Recurrence time (T) is the timethat the recurrent signal takes to turn back to the same point[20] It was previously shown that the deviation from theentropy calculated by the entropy H of the distribution ofthese recurrence periods is a good indication of general voicedisorders [20] RPDE has been used in [20] to classifydisordered voice and normal voice and its accuracy reached91)ese results led to incorporating RPDE in this study toassess the quality of Parkinsonian sustained vowels )eRPDE was computed using the voice analysis toolbox basedon the research by Tsanas et al [13]

225 Traditional Acoustic Measures )e traditionalacoustic measures of percent jitter absolute shimmer HNRand CPP was computed from the sustained vowel recordsusing the Praat software package (version 60) [22] )erecords were analyzed using a custom Praat script and thetraditional acoustic measures were extracted from the reportof voice characteristics returned by the script

)e Scientific World Journal 3

23 Feature Mapping While RPDE HNR and CPP aresingle numbers that represent the predicted speech qualitythe GFCC and LCQA are multidimensional feature vectorsMapping algorithms aim to generate a function that as-similates the multidimensional feature vectors to match thesubjective scores To express this mathematically we have[23]

y f(θX) + b (1)

where θ represents the parameters and functions associatedwith the feature mapper X is the feature matrix that has sizem times n m is the number of training samples n is the size ofthe feature vector y are the subjective scores correspondingto the training samples and b is the prediction errorCommonly used feature mappers include multivariate linearregression (LR) and support vector regression (SVR) [24]

24 Principal Component Analysis (PCA) PCA is used toreduce the dimensionality of the input features of the ma-chine learning algorithm and enhance the interpretation ofthe features [25] )is dimensionality reduction or featurereduction has to be done in a way that maintains the in-formation contained in the input features [25] PCA utilizesthe eigenvalues and the eigenvectors to come up with newfeatures that have smaller dimensionality but maximizes thevariance of the dataset [25] More details about PCA can befound in [25]

25 Feature Selection and Reduction A higher dimension-ality of the feature vector may cause overfitting In suchsituations extracted numbers of features for each metricmust be reduced before applying the machine learning al-gorithm to avoid overfitting To accomplish this goal thecorrelation between each single feature and the subjectivescores was obtained and then the features were rearrangedaccording to their correlation values from the highest to thelowest Subsequently a Monte Carlo algorithm was appliedto extract the maximum number of features that minimizedthe cost function for both the training and the test datasets)is algorithm took the rearranged featuresrsquo matrix and thesubjective scores vector as two inputs [26] At this point thedata was split into a training dataset and test dataset wherethe training dataset contained 80 of the full data while thetest dataset contained the remaining 20 of the dataset )ealgorithm applied linear regression to a subset of the datasetsto find which subset achieved the minimum mean squareerror (MSE) with the subjective scores

3 Results

31 Subjective Results Intrarater reliability of the perceptualjudgment of voice quality was assessed using the intraclasscorrelation coefficient (ICC) [8] Each rater was assessedusing average agreement in a two-way mixed model )eaverage ICC across all raters [8] which is considered to bemoderate intrarater reliability Interrater reliability acrossthe 3 subjective estimators was assessed using average

consistency in a two-way random model average [8] whichcan be interpreted as good interrater reliability

Paired sample t-tests showed that there were no statis-tically significant differences between PD vowel qualityratings on and off Levodopa In other words when the PDpatient cohort was considered as a whole the PDmedicationdid not have any influence on their vowel quality An in-teresting finding does emerge however when PD group isdivided into two groups those with poor perceived voicequality and those with good perceived voice quality in theoff-medication condition )ere was a significant im-provement in perceived vowel quality for the poor-qualitygroup with the administration of medication )e differ-ences among the two groups in terms of the off-medicationvoice quality and the improvement after medication areshown in Figure 1 It can be seen that patients who have lowsustained vowel quality ratings before taking Levodopa havea high improvement in voice quality after taking themedication On the other hand people who have high voicequality ratings before taking the medication have a statis-tically insignificant change in voice quality )ese resultshighlight the need for either subjective or objective as-sessment of PD voice quality in order to predict the ef-fectiveness of Levodopa medication on voice quality

32 Objective Results Figure 2 displays the sample spec-trograms associated with sustained vowel samples collectedfrom 2 subjects in the database Figure 3(b) displays thespectrogram of the normal control subject with a highsubjective quality score )is record had a relative jitter of029 a relative shimmer of 299 dB a CPP value of 11 dBand a HNR value of 228 dB Figure 32 displays the spec-trogram of a subject impaired with PD)is subject had beenoff Levodopa medication and had a low sustained vowelquality rating )is record of the PD subject has a relativejitter of 102 a relative shimmer of 1213 dB a CPP value of155 dB and a HNR value of 1423 dB

Detailed analyses revealed that jitter and shimmer hadpoor correlation values with the subjective scores of thequality of sustained vowel records and therefore they werenot considered to be reliable objective metrics of the qualityof Parkinsonian sustained vowels

Table 1 shows (a) the correlation values between theobjective scores and the subjective perceived quality ratingsusing different metrics and (b) standard deviation of pre-diction error (SDPE) given by SDPE 1113954σs

1 minus ρ2

1113968 where 1113954σs

is the standard deviation of the subjective speech qualityscores and ρ is the correlation coefficient between the trueand predicted quality scores [27] )e statistical significanceof the ρ parameter was computed as well and correlationcoefficients with significance values plt 005 and plt 001 aredenoted by lowast and lowastlowast respectively It must be noted here thatwhile high correlation coefficients between objective andsubjective measures are desirable a big difference betweenthe correlation coefficients for training and test datasets is anindication of overfitting

(1) Individual Feature Performance )e first five ob-jective metrics SRMR ModA CPP HNR and

4 )e Scientific World Journal

0 05 1 15 2ndash025

ndash02

ndash015

ndash01

ndash005

0

005

01

015

02

Am

plitu

de

Time (secs)

(a)

ndash004

ndash003

ndash002

ndash001

0

001

002

003

004

005

Am

plitu

de

0 05 1 15 2Time (secs)

(b)

Figure 2 Continued

0

20

40

60

80

100

Off medicationOn medication

Poor Better

Aver

aged

qua

lity

scor

es

Voice quality level before medication

Figure 1 Averaged perceptual overall vowel quality scores off and onmedication for the two PD groups those whose premedication qualityscore was below a threshold (labeled as ldquopoorrdquo quality level) and those with a score above the threshold (labeled as ldquobetterrdquo quality level))ethreshold was set as the lower 95 confidence interval for the mean vowel quality rating of the control subjects

)e Scientific World Journal 5

RPDE are solo features Consequently LR was ap-plied to the training dataset which contained 80 ofthe whole database and then the trained linearmodel was applied to the test dataset to benchmarkthe objective metric performance Both SRMR andModA had low statistically insignificant correlationvalues with the quality of sustained vowels for the

training and the test datasets An explanation of thatis the envelope of the high-quality sustained voweldoes not have a lot of variations while the envelopeof the low-quality sustained vowel contains highvariations (see Figures 3(a) and 3(b) respectively))is contradicts the way SRMR and ModA measurethe quality of running speech in which the variations

0 20 40 60 8002

03

04

05

06

07

08

RPD

E

Subjective scores

(a)

0 20 40 60 80 1005

10

15

20

25

30

Subjective scores

HN

R (d

B)

(b)

Figure 3 Scatter plots of subjective scores against recurrence period density entropy (RPDE) and harmonics-to-noise ratio (HNR) for thewhole dataset )e x-axis on these plots is the averaged subjective score across the whole database while the y-axis represents the acousticmeasure )e correlation coefficient between the objective and subjective measures is presented in the text (a) RPDE (b) HNR

0

1

2

3

4

5

6

7

8

ndash120

ndash110

ndash100

ndash90

ndash80

ndash70

ndash60

ndash50

05 1 15Time (secs)

Freq

uenc

y (k

Hz)

Pow

erfr

eque

ncy

(dB

Hz)

(c)

0

1

2

3

4

5

6

7

8

ndash120

ndash110

ndash100

ndash90

ndash80

ndash70

ndash60

ndash50

05 1 15Time (secs)

Freq

uenc

y (k

Hz)

Pow

erfr

eque

ncy

(dB

Hz)

(d)

Figure 2 Waveforms and spectrograms of selected sustained vowels recordings from subjects in the database (a c) )e waveform and thespectrogram for the sustained vowel ldquo∖a∖rdquo collected from a normal control subject (b d) the waveform and the spectrogram for ldquo∖a∖rdquosustained vowel collected from a PD subject who has been off his medication (a) Control subject waveform (b) PD subject waveform(c) Control subject high quality (d) Parkinsonian subject poor quality

6 )e Scientific World Journal

Tabl

e1Correlatio

ncoeffi

cientsandstandard

deviationof

predictio

nerror(SD

PE)o

fobjectiv

emetricswith

subjectiv

escoresInd

ices

thatareb

oldedin

acolumnares

tatisticallysim

ilarin

theirperformance

buta

resig

nificantly

bette

rthan

others

inthat

column

Sign

ificanceof

correlationvalues

indicatedbylowastplt005

andlowastlowast

plt001

Metric

Fullset

PCA

Redu

ced

Correlatio

n(training)

SDPE

(training)

Correlatio

n(test)

SDPE

(test)

Correlatio

n(training)

SDPE

(training)

Correlatio

n(test)

SDPE

(test)

Correlatio

n(training)

SPDE

(training)

Correlatio

n(test)

SDPE

(test)

SRMR

028lowastlowast

1726

016

1774

mdashmdash

mdashmdash

mdashmdash

mdashmdash

Mod

A040lowastlowast

1648

033

1697

mdashmdash

mdashmdash

mdashmdash

mdashmdash

CPP

029lowastlowast

1720

053lowast

1524

mdashmdash

mdashmdash

mdashmdash

mdashmdash

HNR

055lowastlowast

1501

074lowastlowast

1209

mdashmdash

mdashmdash

mdashmdash

mdashmdash

RPDE

080lowastlowast

1079

075lowastlowast

1198

mdashmdash

mdashmdash

mdashmdash

mdashmdash

GFC

C-LR

086lowastlowast

915

minus025

1744

065lowastlowast

1363

042

1635

056lowastlowast

1486

055lowastlowast

1505

GFC

C-SVR

060lowastlowast

1435

006

1798

060lowastlowast

1435

030

1719

054lowastlowast

1510

052lowast

1539

LCQA-LR

082lowastlowast

1027

046lowast

1600

076lowastlowast

1166

055lowastlowast

1505

075lowastlowast

1187

075lowastlowast

1192

LCQA-SVR

066lowastlowast

1348

066lowastlowast

1354

073lowastlowast

1226

053lowast

1528

066lowastlowast

1348

066lowastlowast

1528

Com

bined-

LR087lowastlowast

885

051lowast

1550

073lowastlowast

1226

070lowastlowast

1287

081lowastlowast

1052

080lowastlowast

1081

Com

bined-

SVR

075lowastlowast

1187

077lowastlowast

1150

071lowastlowast

1263

066lowastlowast

1354

071lowastlowast

1263

082lowastlowast

1031

PCAp

rincipal

compo

nent

analysis

RPDE

recurrence

period

density

entrop

ySR

MR

speech-to-reverberationmod

ulationratio

HNR

harm

onics-to-noise

ratio

CPP

cepstralpeak

prom

inenceM

odA

mod

ulationareaG

FCCG

ammaton

efrequencycepstral

coeffi

cientsL

CQAL

owCom

plexity

QualityAssessm

entandSV

Rsupp

ortv

ectorregressio

n

)e Scientific World Journal 7

of the envelope and the ratio between energies in thelow band and the energies in the high bands of theenvelope indicate the quality of the waveform )ecorrelation between HNR and the subjective scoreswas 055 for the training database and 074 for thetest database while the correlation values betweenthe subjective measurements and the CPP scoreswere 029 and 053 for the training and the testdatasets respectivelymdashall of which were statisticallysignificant RPDE was the highest among the single-feature objective metrics to have values of correlationwith the subjective scores that reached statisticallysignificant 080 and 075 values for the training andthe test datasets respectively It is noted that there isstill a difference between the correlation values of thetraining and the test datasets of RPDE which meansthat this metric has deficiency in predicting thequality for new (ie unseen) sustained vowel sam-ples Figure 3 shows the scatter plot for the RPDEand HNR measures against the subjective scores

(2) Objective Metrics with Multiple Features )e full setmetrics are the metrics that contained multiplefeatures and the whole set of features are trainedwithout any feature reduction Using machinelearning algorithms on the GFCC features did notyield a reliable metric to estimate the Parkinsonianvoice quality while applying SVR on the LCQAfeatures resulted in a LCQA-SVR metric that has a066 correlation value with the subjective scores

(3) Reduced Multiple Feature Objective Metrics Apply-ing PCA and the Monte Carlo feature selection andreduction method to the GFCC and LCQA featuresled to a reduction of the number of dimensions of theGFCCmetric from 60 to 3 features only It also led toreducing the number of LCQA features from 40 to16 It is noted that the metrics resulting from thefeature reduction method led to higher performancethan the PCA method )is enhanced the perfor-mance of most of the metrics Applying LR to theLCQA metric led to obtaining an objective metricthat has a statistically significant 075 correlationvalue with the subjective scores

(4) A Composite Objective Voice Quality Estimator Acomposite metric was derived by augmenting theHNR and CPP features with LCQA features andapplying SVR and LR to estimate the vowel qualityscores )e combined metric which included 42features resulted in predicted quality scores that hada 077 correlation value with the subjective qualityscores )is is noteworthy in that it is higher than allthe other multiple feature metrics

Afterwards the PCA method was applied to the featuresbefore training the model to estimate the vowel quality )enumber of dimensions was reduced to 23 features whichexplained 95 of the data variance Finally applying thefeature reduction method had greater improvement of theperformance more than using PCA Applying LR to the

reduced combined feature set resulted in a model that es-timated the quality of the vowels with a statistically sig-nificant 080 correlation value with the subjective scores Totest the statistical significance between the correlation valuesof the obtained scores from the combined metric comparedto the second-highest objective metric which is the reducedLCQA metric Steigerrsquos Z-test [28] was applied to measurethe statistical difference It was found that there is a statisticalenhancement when using the combined reduced metricinstead of the reduced LCQA metric Figure 4 shows thescatter plot of the subjective voice quality scores on the x-axis against the predicted quality scores by the compositemetric on the y-axis for the training test and full data

It is noted that the composite metric had the highestcorrelation for training and test datasets followed by theRPDE feature In order to further confirm this finding boththese metrics were trained repeatedly with different trainingdatasets (different samples selected randomly from thewhole dataset each time) and then applied to the corre-sponding test dataset )en the average correlation valuesfor the training and the test datasets were calculated andthey were found to be 075 for the RPDEmethod and 080 forthe combined reducedmethod)e difference between thesetwo correlation coefficients was statistically significant in-dicating the composite measure provided a better overallperformance that was robust to random partitioning of thedatabase

4 Discussion and Conclusion

In this paper the quality of the sustained vowels producedby patients with PD was predicted through objectiveacoustical analyses A previously collected vowel databaseby Cushnie-Sparrow et al [8] was utilized for this purpose)is database consisted of sustained vowel samples from 51PD patients before and after taking the Levodopa medi-cation along with vowel samples from 11 healthy controlsubjects which resulted in the formation of a database of113 vowels A panel of 3 listeners rated the perceivedquality of these vowel recordings [8] which were used intraining the objective models and assessing their accuracyVowel quality prediction features included GFCC LCQAHNR smoothed CPP and RPDE Machine learning al-gorithms SVR and LR were applied to these multidimen-sional features to estimate the quality objectively Some ofthe features mentioned above were blended to form acomposite objective metric that displayed significantlybetter performance than the other metrics Moreover PCAand feature reduction were applied to reduce the number ofinput features to machine learning algorithms to reduce theoverfitting and enhance the performance of the objectivemetrics

Initial investigation focused on individual features andparameters extracted from the vowel samples Although anumber of these individual features exhibited statisticallysignificant correlation values with auditory-perceptual rat-ings only a subset of the measures exhibited correlationcoefficients greater than 05 Of the commonly reported

8 )e Scientific World Journal

vowel acoustic measures HNR performed the best withPearson correlation coefficients of 055 and 074 for thetraining and test partitions respectively )e correlationcoefficients exhibited by HNR and CPP [8] but are lowerthan those reported by Jannetts and Lowit [12] It isworthwhile to note that Jannetts and Lowit [12] reportedSpearmanrsquos rank-ordered correlation between the percep-tual ratings and acoustic measures unlike Pearsonrsquos cor-relation coefficient reported here which can perhaps explainthe discrepancy Furthermore Jannetts and Lowit [12]employed the GRBAS scale and the auditory-perceptualratings were provided by an experienced clinician It isplausible that these methodological differences also con-tributed to the differences

Among the individual measures the RPDE parameterproduced the best performance )e correlation coefficientsof 08 and 075 with training and test partitions were sig-nificantly better than those reported by other individualmeasures RPDE has been previously employed for dis-criminating between normal and PD voices [14 20] and ourresults demonstrate that it is suitable for predicting theperceived quality of PD vowel samples

While GFCC was used in other studies to measure thequality of Parkinsonian speech and had a good performance[15 26] this was not the case for estimating the perceivedquality for Parkinsonian sustained vowels )e best per-formance for GFCC objective metric after feature reductionresulted in a 055 correlation coefficient between the

0 20 40 60 80 1000

20

40

60

80

100

Obj

ectiv

e sco

res

Subjective scores

(a)

0

20

40

60

80

100

Obj

ectiv

e sco

res

0 20 40 60 80 100Subjective scores

(b)

0

20

40

60

80

100

Obj

ectiv

e sco

res

0 20 40 60 80 100Subjective scores

(c)

Figure 4 Subjective scores versus objective scores using the reduced combined linear regression (LR) metric (a) shows the scatter plot forthe training data (b) shows the scatter plot for the test data and (c) depicts the scatter plot for the whole database

)e Scientific World Journal 9

subjective and the objective scores For the nonreduced fullset category applying SVR to the combination of the 40LCQA feature HNR and smoothed CPP was the best ob-jective metric of this category with a correlation value of 077for the test dataset )e difference between the training andthe test dataset was the minimum which meant that theeffect of overfitting is minimum Applying PCA to thefeatures led to the enhancement of most of the objectivemetrics It is noted that using LR and SVR on the PCAreduced combined metric yields to statistically similar re-sults Applying the feature reductionmethod to the objectivefeatures yielded a great enhancement in the performance ofthe objective metrics Applying LR and SVR to the reducedcombined metric yielded statistically similar results How-ever the metric with linear regression had a smaller dif-ference between the training and the test datasets whichmeans it is less prone to overfitting As a result the reducedcomposite metric with linear regression is considered to bethe best objective metric for estimating the quality of theParkinsonian sustained vowels

In summary a subset of acoustic measures includingHNR LCQA and RPDE exhibited a good correlation withauditory-perceptual voice assessments of the overall qualityof sustained vowels produced by a group of Parkinsonrsquospatients )e application of a regression model (LR andSVR) incorporating a subset of these acoustic featuresresulted in a statistically improved prediction of the per-ceived quality of the Parkinsonian vowels )e subjectiveratings used for benchmarking the objective models wereobtained from sustained vowels produced by PD patientsboth on and off Levodopa medication As such the clinicalimplications of the current study include the following (a)the derived model may serve as a surrogate for the subjectiveassessment of the effect of the Levodopa medication on thevoice quality of Parkinsonian subjects Since the evidenceshows that Levodopa improves the voice quality of Par-kinsonian patients only when their premedication voicequality is poor the derived model can potentially play aclinically relevant role in predicting the effectiveness ofLevodopamedication (b)More generally the derivedmodelcan potentially be applied for clinical assessment of theperceived quality of Parkinsonian vowels particularly formonitoring vowel quality over the course of any therapeuticintervention

Before closing a few limitations of our study must beacknowledged)e auditory-perceptual ratings used to trainand benchmark models were garnered from clinical grad-uate students with little experience Follow-up researchfocusing on model performance with auditory-perceptualratings from experienced clinicians is necessary While thederived model is promising for objective instrumental as-sessment of Parkinsonian vowel quality future research iswarranted to test its robustness and generalization capa-bilities and to further improve its performance One of theaspects that need to be addressed is the limited size of thecollected dataset used in this investigation )e size of thedataset needs to be increased to ensure more reliability andgeneralization capability of the proposed metrics Anotherarea for future research is to develop and evaluate more

advanced and complicated machine learning algorithmssuch as deep learning Applying deep learning emphasizesthe need for a larger dataset that needs to be collected topresent a more reliable and precise metric to estimate thesustained vowelsrsquo quality )e effect of gender on studiedacoustic variables is also of future research interest espe-cially on establishing the effectiveness of the derived modelin predicting male versus female PD patient vowel qualityFinally expanding the findings of this study to estimate thequality of continuous speech (as opposed to sustained vowel)will be of broad research interest

Data Availability

)e data used to support the findings of this study are re-stricted by the ethics board atWestern University in order toprotect patient privacy

Conflicts of Interest

)e authors declare that they have no conflicts of interest

References

[1] D Hirtz D J )urman K Gwinn-Hardy M MohamedA R Chaudhuri and R Zalutsky ldquoHow common are theldquocommonrdquo neurologic disordersrdquo Neurology vol 68 no 5pp 326ndash337 2007

[2] M Toft and Z K Wszolek ldquoOverview of the genetics ofParkinsonismrdquo in Parkinsonrsquos Disease pp 117ndash129 CRCPress Boca Raton FL USA 2012

[3] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective speech quality assessment of amplificationdevices for patients with Parkinsonrsquos diseaserdquo IEEE Trans-actions on Neural Systems and Rehabilitation Engineering2019

[4] R Pahwa and K E Lyons Handbook of Parkinsonrsquos DiseaseCRC Press Taylor amp Francis Group Boca Raton FL USA 5thedition 2013

[5] S Sapir L Ramig and C Fox ldquoSpeech therapy in thetreatment of Parkinsonrsquos diseaserdquo in Parkinsonrsquos Diseasepp 974ndash987 CRC Press Boca Raton FL USA 2012

[6] J L Goudreau and E Ahlskog ldquoSymptomatic treatment ofParkinsonrsquos disease levodopardquo in Parkinsonrsquos Diseasepp 847ndash863 CRC Press Boca Raton FL USA 2012

[7] L O Ramig C Fox and S Sapir ldquoSpeech treatment forParkinsonrsquos diseaserdquo Expert Review of Neurotherapeuticsvol 8 no 2 pp 297ndash309 2008

[8] D Cushnie-Sparrow S Adams A AbeyesekeraM Pieterman G Gilmore and M Jog ldquoVoice quality severityand responsiveness to levodopa in Parkinsonrsquos diseaserdquoJournal of Communication Disorders vol 76 pp 1ndash10 2018

[9] T H Falk V Parsa J F Santos et al ldquoObjective quality andintelligibility prediction for users of assistive listening devicesadvantages and limitations of existing toolsrdquo IEEE SignalProcessing Magazine vol 32 no 2 pp 114ndash124

[10] M Farrus J Hernando and P Ejarque ldquoJitter and shimmermeasurements for speaker recognitionrdquo in Proceedings of theEighth Annual Conference of the international Speech Com-munication Association Antwerp Belgium August 2007

[11] Y D Heman-Ackah D D Michael M M Baroody et alldquoCepstral peak prominence a more reliable measure of

10 )e Scientific World Journal

dysphoniardquo Annals of Otology Rhinology amp Laryngologyvol 112 no 4 pp 324ndash333 2003

[12] S Jannetts and A Lowit ldquoCepstral analysis of hypokinetic andataxic voices correlations with perceptual and other acousticmeasuresrdquo Journal of Voice vol 28 no 6 pp 673ndash680 2014

[13] A Tsanas M A Little P E McSharry J Spielman andL O Ramig ldquoNovel speech signal processing algorithms forhigh-accuracy classification of Parkinsonrsquos diseaserdquo IEEETransactions on Biomedical Engineering vol 59 no 5pp 1264ndash1271 2012

[14] S Arora L Baghai-Ravary and A Tsanas ldquoDeveloping a largescale population screening tool for the assessment of Par-kinsonrsquos disease using telephone-quality voicerdquo(e Journal ofthe Acoustical Society of America vol 145 no 5 pp 2871ndash2884 2019

[15] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective assessment of amplified parkinsonian speechqualityrdquo in Proceedings of the 2018 40th Annual InternationalConference of the IEEE Engineering in Medicine and BiologySociety (EMBC) pp 2084ndash2087 IEEE New Paltz NY USAOctober 2018

[16] T H Falk C Zheng and W-Y Chan ldquoA non-intrusivequality and intelligibility measure of reverberant and der-everberated speechrdquo IEEE Transactions on Audio Speech andLanguage Processing vol 18 no 7 pp 1766ndash1774

[17] F Chen O Hazrati and P C Loizou ldquoPredicting the in-telligibility of reverberant speech for cochlear implant lis-teners with a non-intrusive intelligibility measurerdquoBiomedical Signal Processing and Control vol 8 no 3pp 311ndash314 2013

[18] V Grancharov D Zhao J Lindblom and W Kleijn ldquoLow-complexity nonintrusive speech quality assessmentrdquo IEEETransactions on Audio Speech and Language Processingvol 14 no 6 pp 1948ndash1956

[19] H Salehi and V Parsa ldquoOn nonintrusive speech qualityestimation for hearing aidsrdquo in Proceedings of the 2015 IEEEWorkshop on Applications of Signal Processing to Audio andAcoustics (WASPAA) pp 1ndash5 IEEE New Paltz NY USAOctober 2015

[20] M A Little P E McSharry E J Hunter J Spielman andL O Ramig ldquoSuitability of dysphonia measurements fortelemonitoring of Parkinsonrsquos diseaserdquo IEEE Transactions onBiomedical Engineering vol 56 no 4 pp 1015ndash1022 2008

[21] M A Little P E McSharry S J Roberts D A Costello andI M Moroz ldquoExploiting nonlinear recurrence and fractalscaling properties for voice disorder detectionrdquo BiomedicalEngineering Online vol 6 no 1 p 23 2007

[22] P Boersma and D Weenink ldquoPRAAT doing phonetics bycomputer (version 51 05) (computer program)rdquo May 2009

[23] T Hastie R Tibshirani and J H Friedman ldquo)e elements ofstatistical learning data mining inference and predictionrdquoSpringer Series in Statistics Springer New York NY USA2nd edition 2009

[24] V N Vapnik ldquo)e nature of statistical learning theoryrdquoStatistics for Engineering and Information Science SpringerNew York NY USA 2nd edition 2000

[25] I T Jolliffe and J Cadima ldquoPrincipal component analysis areview and recent developmentsrdquo Philosophical Transactionsof the Royal Society A Mathematical Physical and EngineeringSciences vol 374 no 2065 Article ID 20150202 2016

[26] A Gaballah V Parsa M Andreetta and S Adams ldquoAs-sessment of amplified parkinsonian speech quality using deeplearningrdquo in Proceedings of the 2018 IEEE Canadian

Conference on Electrical amp Computer Engineering (CCECE)pp 1ndash4 IEEE Quebec Canada May 2018

[27] Y Hu and P C Loizou ldquoEvaluation of objective qualitymeasures for speech enhancementrdquo IEEE Transactions onAudio Speech and Language Processing vol 16 no 1pp 229ndash238 2008

[28] J H Steiger ldquoTests for comparing elements of a correlationmatrixrdquo Psychological Bulletin vol 87 no 2 pp 245ndash2511980

)e Scientific World Journal 11

Page 2: ImprovedEstimationofParkinsonianVowelQualitythrough ... · (ModA),served as themodulation-based features [9]. e envelope of the waveform was extracted and filtered to a numberoffilters[16,17].eratiosoftheenergyinthelow

investigated the effect of the Levodopa on the perceivedquality of vowels produced by PD patients Vowel sampleswere collected from 51 PD subjects before and after Levo-dopa administration and these samples were rated by threelisteners Results showed no statistically significant differ-ence in the perceived quality of the vowels produced by PDpatients before and after taking the Levodopa A more in-depth analysis revealed an interesting fact there was astatistically significant improvement in the vowel qualitywith the administration of Levodopa for those PD patientswhose off-medication vowel quality was rated as poor )isfinding motivates the need for the assessment of vowelsamples from PD patients so that the potential benefit fromLevodopa medication on their voice quality can beestimated

While subjective assessment is considered the goldstandard of voice quality evaluation it is not efficient interms of time and cost especially when multiple voicesamples need to be rated by a group of listeners [9] )isweighs in favour of objective instrumental assessment ofvoice quality [9] Traditional acoustic characterization ofvowel samples includes jitter shimmer harmonics-to-noiseratio (HNR) and cepstral peak prominence (CPP) Jitter isdefined as the cycle-to-cycle variation of the fundamentalfrequency while the relative jitter is the ratio between theabsolute jitter and the average fundamental frequency [10]Shimmer is defined as the variability of the peak-to-peakamplitude in decibels while relative shimmer is the ratiobetween the absolute shimmer and the average amplitude[10] HNR quantifies the relationship between the periodiccomponents (harmonics) and the aperiodic components(noise) of the signal [8] Finally CPP is defined as thedifference between the peak of the cepstrum and its linearregression function [11]

Few studies investigated the performance of traditionalacoustic measures in predicting the perceived quality of PDvowels Jannetts et al [12] collected recordings of the sus-tained phonation of a a sentence and normal conversationfrom 43 speakers with PD and 10 participants with ataxia)e recordings were rated subjectively using the GRBASscale (G the grade or overall dysphonia severity Rroughness B breathiness A asthenia or weakness and Sstrain) Among the aforementioned traditional acousticmeasures CPP was found to produce the highest Spearmanrsquosrank-order correlation with ldquoGraderdquo voice quality attribute

Cushnie-Sparrow et al [8] investigated the effect of theLevodopa medication on the perceived quality of vowelsproduced by PD patients Sustained vowel recordings werecollected from 51 subjects impaired with PD in addition to11 healthy control individuals Measured acoustic metricsincluded jitter shimmer HNR CPP and the acoustic voicequality index (AVQI) Measurement of the perceived qualitywas obtained from a panel of 3 listeners )e authors foundthat the HNR resulted in the highest Pearsonrsquos correlationcoefficient of 055 with the averaged subjective qualityscores

In addition to these traditional measures a multitude ofother features has been extracted from vowel samples basedon nonlinear dynamic filterbank and spectrotemporal

modulation analyses (eg [13 14]) Examples of such fea-tures include the recurrence period density entropy (RPDE)Mel and Gammatone frequency cepstral coefficients(MFCCs and GFCCs) linear prediction (LP) based featuresand modulation spectrogram features In addition thesefeatures often need to be combined through a linear ornonlinear regressionmodel so that a single index of disorderseverity may be obtained A vast majority of past studiesinvestigating these features and feature mapping modelsfocused on their effectiveness in discriminating betweennormal and PD vowel samples (eg [14]) or in predicting theUnified Parkinsonrsquos Disease Rating Scale (UPDRS) [13] Tothe best of our knowledge no study has investigated theapplication of these features and their regression models inpredicting the perceived quality of PD vowels

)is work therefore aims to build a valid regressionmodel that assimilates relevant acoustic features for im-proved estimation of the perceived quality of PD vowelsUsing the previously collected subjective database byCushnie-Sparrow et al [8] the performance of severalacoustic features was assessed both individually and incombination Multivariate linear regression and supportvector regression techniques were utilized to assimilate thefeature sets both with and without feature reductiontechniques A final composite objective index was developedthat produced a statistically significant improvement inpredicting the perceived PD vowel quality ratings

2 Methods

21 Voice Recordings and Subjective Evaluation As men-tioned earlier subjective data collected by Cushnie-Sparrowet al [8] were used to develop and benchmark the perfor-mance of the objective metrics A brief description of thesubjective data collection procedure is given here for the sakeof completion Samples of the sustained vowel lsquoahrsquo werecollected from 51 PD subjects Salient demographic dataincludes (a) 39 male and 12 female subjects (b) age range of47 to 82 years (M 6578 SD 419) (c) diagnosis durationrange of 2 to 16 years (M 922 SD 419) and (d) Levo-dopa use duration range of 2 to 16 years (M 751 SD 391)[8]

PD patients were evaluated off and on the Levodopamedication In addition sustained vowel recordings werealso collected from 11 subjects who were nonimpaired withPD these recordings served as a control of the measurementprocess All recordings were collected using a high-qualityheadset microphone (DPA 4060) at a 44100Hz samplingrate and 16 bitssample quantization )e headset micro-phone was placed 6 cm horizontal from the middle of theupper lip philtrum and to the side of the mouth (approx-imately 45 degrees) Speech intensity was calibrated using a70 dB SPL reference and a sound level meter positioned15 cm from themouth of the PD patient A total of 113 vowelrecordings were collected through this procedure Two-second samples from the middle of each vowel recordingwere extracted for analysis and perceptual judgments ofeach segment were provided by 3 listeners (graduate stu-dents in the Speech Language Pathology program atWestern

2 )e Scientific World Journal

University))e listeners were instructed to judge the overallvoice quality of each vowel sample using a visual-analoguescale (VAS))e scale was 10 cm in length and the endpointdescriptors were ldquopoor voice qualityrdquo on the left and ldquobettervoice qualityrdquo on the right VAS score was recorded as thedistance from the left endpoint to the listenersrsquo mark )eaverage of the three listener ratings served as the overallquality rating of the vowel recordings To test intraraterreliability 20 of the vowel samples were randomly selectedand inserted into the presentation order in a randomfashion for a rerating More details can be found in [8]

22 Features and (eir Computation Subjective ratingsobtained through the procedure outlined in the previoussection were used to benchmark the performance of theobjective measures Prior to feature extraction the sustainedvowel recordings were decimated to a 16 kHz sample rate Inaddition the complete 113-sample database was divided into2 datasets )e first dataset contained 80 of the wholedataset or 91 samples to constitute the training dataset whilethe remaining 20 of the data or 22 voice samples made upthe test dataset

221 Filterbank-Based Features GFCC coefficients aremainly used in computational auditory sense analysis(CASA) studies to transform signals into time-frequency (T-F) domain to perform robust speech recognition [3 15] )erecorded signal was segmented into frames of 256 sampleswith a frame overlap of 100 samples Afterwards the powerspectrum of each frame was obtained after multiplying witha Hamming window )e equivalent rectangular bandwidth(ERB) filterbank was applied to the frame power spectra Inthis research 128 filters constituted the ERB filterbank andthe log filterbank energies were decorrelated using thediscrete cosine transform (DCT) [15] )e frame averagedGFCCs and their first-order time differences (ldquodeltardquo values)resulted in the final GFCC feature set that contained 60features

222 Modulation-Based Features Two features extractedfrom the envelope namely the speech-to-reverberationmodulation energy ratio (SRMR) and modulation area(ModA) served as the modulation-based features [9] )eenvelope of the waveform was extracted and filtered to anumber of filters [16 17] )e ratios of the energy in the lowband filters which are assumed to contain the speechmodulation energies and the high band filters which areassumed to contain the noise modulation energies representthe quality of the sustained vowel signal

In SRMR [16] the speech signal was processed through a23-channel Gammatone filterbank with center frequenciesranging from 125Hz to half the sampling rate Hilberttransform was then applied to the filterbank outputs toextract the temporal envelope in each channel )ese en-velopes had frequencies that ranged between 0 and 128HzAt this point each envelope was filtered into eight over-lapping modulation bands with center frequencies ranging

from 4 to 128Hz Finally SRMR was computed as a ratiobetween the energy stored in the first four filters whichcontain most of the speech energy and the last four filterswhich contain the background noise [16]

In ModA the speech signal was decomposed using only4 bandpass filters and filtered signals had Hilbert transformapplied to derive the band-specific temporal envelopes Eachenvelope was subsequently downsampled to 20Hz and thenprocessed through 13 octave filterbank with center fre-quencies ranging between 05 and 8Hz )e filterbankoutput energies were then used to derive the area under eachacoustic band and then those areas are averaged to producethe ModA metric [17]

223 Linear Prediction-Based Features )e LP-based fea-ture extraction methodology is presented in Low ComplexityQuality Assessment (LCQA) proposed by Grancharov et al[18] )e central idea of LCQA is to extract statistical featuresof the speech signal [18] Each speech recording was seg-mented into 20ms nonoverlapping frames an 18th order LPmodel was computed for each frame and a vector of featuresis extracted from each frame )is featuresrsquo vector incorpo-rates 10 features which are the spectral flatness the excitationvariance the signal variance the spectral centroid and thespectral dynamics for each frame in addition to the firstderivative of each of the aforementioned features [19] At thispoint the statistical properties of each one of the 10 featuresare calculated across all the frames these statistical featuresinclude mean variance skew and kurtosis [18] )is yields tothe formation of a vector of size 40 times 1 for each speech signal[15 19]

224 Recurrence Period Density Entropy (RPDE) In RPDEmeasurements the signal was first applied to a time delayembedding to recreate the phase space of a nonlinear dy-namic system [20] RPDE quantifies the percentage of thedynamics in the reconstructed phase space that are periodicor repeated exactly [20 21] Recurrence time (T) is the timethat the recurrent signal takes to turn back to the same point[20] It was previously shown that the deviation from theentropy calculated by the entropy H of the distribution ofthese recurrence periods is a good indication of general voicedisorders [20] RPDE has been used in [20] to classifydisordered voice and normal voice and its accuracy reached91)ese results led to incorporating RPDE in this study toassess the quality of Parkinsonian sustained vowels )eRPDE was computed using the voice analysis toolbox basedon the research by Tsanas et al [13]

225 Traditional Acoustic Measures )e traditionalacoustic measures of percent jitter absolute shimmer HNRand CPP was computed from the sustained vowel recordsusing the Praat software package (version 60) [22] )erecords were analyzed using a custom Praat script and thetraditional acoustic measures were extracted from the reportof voice characteristics returned by the script

)e Scientific World Journal 3

23 Feature Mapping While RPDE HNR and CPP aresingle numbers that represent the predicted speech qualitythe GFCC and LCQA are multidimensional feature vectorsMapping algorithms aim to generate a function that as-similates the multidimensional feature vectors to match thesubjective scores To express this mathematically we have[23]

y f(θX) + b (1)

where θ represents the parameters and functions associatedwith the feature mapper X is the feature matrix that has sizem times n m is the number of training samples n is the size ofthe feature vector y are the subjective scores correspondingto the training samples and b is the prediction errorCommonly used feature mappers include multivariate linearregression (LR) and support vector regression (SVR) [24]

24 Principal Component Analysis (PCA) PCA is used toreduce the dimensionality of the input features of the ma-chine learning algorithm and enhance the interpretation ofthe features [25] )is dimensionality reduction or featurereduction has to be done in a way that maintains the in-formation contained in the input features [25] PCA utilizesthe eigenvalues and the eigenvectors to come up with newfeatures that have smaller dimensionality but maximizes thevariance of the dataset [25] More details about PCA can befound in [25]

25 Feature Selection and Reduction A higher dimension-ality of the feature vector may cause overfitting In suchsituations extracted numbers of features for each metricmust be reduced before applying the machine learning al-gorithm to avoid overfitting To accomplish this goal thecorrelation between each single feature and the subjectivescores was obtained and then the features were rearrangedaccording to their correlation values from the highest to thelowest Subsequently a Monte Carlo algorithm was appliedto extract the maximum number of features that minimizedthe cost function for both the training and the test datasets)is algorithm took the rearranged featuresrsquo matrix and thesubjective scores vector as two inputs [26] At this point thedata was split into a training dataset and test dataset wherethe training dataset contained 80 of the full data while thetest dataset contained the remaining 20 of the dataset )ealgorithm applied linear regression to a subset of the datasetsto find which subset achieved the minimum mean squareerror (MSE) with the subjective scores

3 Results

31 Subjective Results Intrarater reliability of the perceptualjudgment of voice quality was assessed using the intraclasscorrelation coefficient (ICC) [8] Each rater was assessedusing average agreement in a two-way mixed model )eaverage ICC across all raters [8] which is considered to bemoderate intrarater reliability Interrater reliability acrossthe 3 subjective estimators was assessed using average

consistency in a two-way random model average [8] whichcan be interpreted as good interrater reliability

Paired sample t-tests showed that there were no statis-tically significant differences between PD vowel qualityratings on and off Levodopa In other words when the PDpatient cohort was considered as a whole the PDmedicationdid not have any influence on their vowel quality An in-teresting finding does emerge however when PD group isdivided into two groups those with poor perceived voicequality and those with good perceived voice quality in theoff-medication condition )ere was a significant im-provement in perceived vowel quality for the poor-qualitygroup with the administration of medication )e differ-ences among the two groups in terms of the off-medicationvoice quality and the improvement after medication areshown in Figure 1 It can be seen that patients who have lowsustained vowel quality ratings before taking Levodopa havea high improvement in voice quality after taking themedication On the other hand people who have high voicequality ratings before taking the medication have a statis-tically insignificant change in voice quality )ese resultshighlight the need for either subjective or objective as-sessment of PD voice quality in order to predict the ef-fectiveness of Levodopa medication on voice quality

32 Objective Results Figure 2 displays the sample spec-trograms associated with sustained vowel samples collectedfrom 2 subjects in the database Figure 3(b) displays thespectrogram of the normal control subject with a highsubjective quality score )is record had a relative jitter of029 a relative shimmer of 299 dB a CPP value of 11 dBand a HNR value of 228 dB Figure 32 displays the spec-trogram of a subject impaired with PD)is subject had beenoff Levodopa medication and had a low sustained vowelquality rating )is record of the PD subject has a relativejitter of 102 a relative shimmer of 1213 dB a CPP value of155 dB and a HNR value of 1423 dB

Detailed analyses revealed that jitter and shimmer hadpoor correlation values with the subjective scores of thequality of sustained vowel records and therefore they werenot considered to be reliable objective metrics of the qualityof Parkinsonian sustained vowels

Table 1 shows (a) the correlation values between theobjective scores and the subjective perceived quality ratingsusing different metrics and (b) standard deviation of pre-diction error (SDPE) given by SDPE 1113954σs

1 minus ρ2

1113968 where 1113954σs

is the standard deviation of the subjective speech qualityscores and ρ is the correlation coefficient between the trueand predicted quality scores [27] )e statistical significanceof the ρ parameter was computed as well and correlationcoefficients with significance values plt 005 and plt 001 aredenoted by lowast and lowastlowast respectively It must be noted here thatwhile high correlation coefficients between objective andsubjective measures are desirable a big difference betweenthe correlation coefficients for training and test datasets is anindication of overfitting

(1) Individual Feature Performance )e first five ob-jective metrics SRMR ModA CPP HNR and

4 )e Scientific World Journal

0 05 1 15 2ndash025

ndash02

ndash015

ndash01

ndash005

0

005

01

015

02

Am

plitu

de

Time (secs)

(a)

ndash004

ndash003

ndash002

ndash001

0

001

002

003

004

005

Am

plitu

de

0 05 1 15 2Time (secs)

(b)

Figure 2 Continued

0

20

40

60

80

100

Off medicationOn medication

Poor Better

Aver

aged

qua

lity

scor

es

Voice quality level before medication

Figure 1 Averaged perceptual overall vowel quality scores off and onmedication for the two PD groups those whose premedication qualityscore was below a threshold (labeled as ldquopoorrdquo quality level) and those with a score above the threshold (labeled as ldquobetterrdquo quality level))ethreshold was set as the lower 95 confidence interval for the mean vowel quality rating of the control subjects

)e Scientific World Journal 5

RPDE are solo features Consequently LR was ap-plied to the training dataset which contained 80 ofthe whole database and then the trained linearmodel was applied to the test dataset to benchmarkthe objective metric performance Both SRMR andModA had low statistically insignificant correlationvalues with the quality of sustained vowels for the

training and the test datasets An explanation of thatis the envelope of the high-quality sustained voweldoes not have a lot of variations while the envelopeof the low-quality sustained vowel contains highvariations (see Figures 3(a) and 3(b) respectively))is contradicts the way SRMR and ModA measurethe quality of running speech in which the variations

0 20 40 60 8002

03

04

05

06

07

08

RPD

E

Subjective scores

(a)

0 20 40 60 80 1005

10

15

20

25

30

Subjective scores

HN

R (d

B)

(b)

Figure 3 Scatter plots of subjective scores against recurrence period density entropy (RPDE) and harmonics-to-noise ratio (HNR) for thewhole dataset )e x-axis on these plots is the averaged subjective score across the whole database while the y-axis represents the acousticmeasure )e correlation coefficient between the objective and subjective measures is presented in the text (a) RPDE (b) HNR

0

1

2

3

4

5

6

7

8

ndash120

ndash110

ndash100

ndash90

ndash80

ndash70

ndash60

ndash50

05 1 15Time (secs)

Freq

uenc

y (k

Hz)

Pow

erfr

eque

ncy

(dB

Hz)

(c)

0

1

2

3

4

5

6

7

8

ndash120

ndash110

ndash100

ndash90

ndash80

ndash70

ndash60

ndash50

05 1 15Time (secs)

Freq

uenc

y (k

Hz)

Pow

erfr

eque

ncy

(dB

Hz)

(d)

Figure 2 Waveforms and spectrograms of selected sustained vowels recordings from subjects in the database (a c) )e waveform and thespectrogram for the sustained vowel ldquo∖a∖rdquo collected from a normal control subject (b d) the waveform and the spectrogram for ldquo∖a∖rdquosustained vowel collected from a PD subject who has been off his medication (a) Control subject waveform (b) PD subject waveform(c) Control subject high quality (d) Parkinsonian subject poor quality

6 )e Scientific World Journal

Tabl

e1Correlatio

ncoeffi

cientsandstandard

deviationof

predictio

nerror(SD

PE)o

fobjectiv

emetricswith

subjectiv

escoresInd

ices

thatareb

oldedin

acolumnares

tatisticallysim

ilarin

theirperformance

buta

resig

nificantly

bette

rthan

others

inthat

column

Sign

ificanceof

correlationvalues

indicatedbylowastplt005

andlowastlowast

plt001

Metric

Fullset

PCA

Redu

ced

Correlatio

n(training)

SDPE

(training)

Correlatio

n(test)

SDPE

(test)

Correlatio

n(training)

SDPE

(training)

Correlatio

n(test)

SDPE

(test)

Correlatio

n(training)

SPDE

(training)

Correlatio

n(test)

SDPE

(test)

SRMR

028lowastlowast

1726

016

1774

mdashmdash

mdashmdash

mdashmdash

mdashmdash

Mod

A040lowastlowast

1648

033

1697

mdashmdash

mdashmdash

mdashmdash

mdashmdash

CPP

029lowastlowast

1720

053lowast

1524

mdashmdash

mdashmdash

mdashmdash

mdashmdash

HNR

055lowastlowast

1501

074lowastlowast

1209

mdashmdash

mdashmdash

mdashmdash

mdashmdash

RPDE

080lowastlowast

1079

075lowastlowast

1198

mdashmdash

mdashmdash

mdashmdash

mdashmdash

GFC

C-LR

086lowastlowast

915

minus025

1744

065lowastlowast

1363

042

1635

056lowastlowast

1486

055lowastlowast

1505

GFC

C-SVR

060lowastlowast

1435

006

1798

060lowastlowast

1435

030

1719

054lowastlowast

1510

052lowast

1539

LCQA-LR

082lowastlowast

1027

046lowast

1600

076lowastlowast

1166

055lowastlowast

1505

075lowastlowast

1187

075lowastlowast

1192

LCQA-SVR

066lowastlowast

1348

066lowastlowast

1354

073lowastlowast

1226

053lowast

1528

066lowastlowast

1348

066lowastlowast

1528

Com

bined-

LR087lowastlowast

885

051lowast

1550

073lowastlowast

1226

070lowastlowast

1287

081lowastlowast

1052

080lowastlowast

1081

Com

bined-

SVR

075lowastlowast

1187

077lowastlowast

1150

071lowastlowast

1263

066lowastlowast

1354

071lowastlowast

1263

082lowastlowast

1031

PCAp

rincipal

compo

nent

analysis

RPDE

recurrence

period

density

entrop

ySR

MR

speech-to-reverberationmod

ulationratio

HNR

harm

onics-to-noise

ratio

CPP

cepstralpeak

prom

inenceM

odA

mod

ulationareaG

FCCG

ammaton

efrequencycepstral

coeffi

cientsL

CQAL

owCom

plexity

QualityAssessm

entandSV

Rsupp

ortv

ectorregressio

n

)e Scientific World Journal 7

of the envelope and the ratio between energies in thelow band and the energies in the high bands of theenvelope indicate the quality of the waveform )ecorrelation between HNR and the subjective scoreswas 055 for the training database and 074 for thetest database while the correlation values betweenthe subjective measurements and the CPP scoreswere 029 and 053 for the training and the testdatasets respectivelymdashall of which were statisticallysignificant RPDE was the highest among the single-feature objective metrics to have values of correlationwith the subjective scores that reached statisticallysignificant 080 and 075 values for the training andthe test datasets respectively It is noted that there isstill a difference between the correlation values of thetraining and the test datasets of RPDE which meansthat this metric has deficiency in predicting thequality for new (ie unseen) sustained vowel sam-ples Figure 3 shows the scatter plot for the RPDEand HNR measures against the subjective scores

(2) Objective Metrics with Multiple Features )e full setmetrics are the metrics that contained multiplefeatures and the whole set of features are trainedwithout any feature reduction Using machinelearning algorithms on the GFCC features did notyield a reliable metric to estimate the Parkinsonianvoice quality while applying SVR on the LCQAfeatures resulted in a LCQA-SVR metric that has a066 correlation value with the subjective scores

(3) Reduced Multiple Feature Objective Metrics Apply-ing PCA and the Monte Carlo feature selection andreduction method to the GFCC and LCQA featuresled to a reduction of the number of dimensions of theGFCCmetric from 60 to 3 features only It also led toreducing the number of LCQA features from 40 to16 It is noted that the metrics resulting from thefeature reduction method led to higher performancethan the PCA method )is enhanced the perfor-mance of most of the metrics Applying LR to theLCQA metric led to obtaining an objective metricthat has a statistically significant 075 correlationvalue with the subjective scores

(4) A Composite Objective Voice Quality Estimator Acomposite metric was derived by augmenting theHNR and CPP features with LCQA features andapplying SVR and LR to estimate the vowel qualityscores )e combined metric which included 42features resulted in predicted quality scores that hada 077 correlation value with the subjective qualityscores )is is noteworthy in that it is higher than allthe other multiple feature metrics

Afterwards the PCA method was applied to the featuresbefore training the model to estimate the vowel quality )enumber of dimensions was reduced to 23 features whichexplained 95 of the data variance Finally applying thefeature reduction method had greater improvement of theperformance more than using PCA Applying LR to the

reduced combined feature set resulted in a model that es-timated the quality of the vowels with a statistically sig-nificant 080 correlation value with the subjective scores Totest the statistical significance between the correlation valuesof the obtained scores from the combined metric comparedto the second-highest objective metric which is the reducedLCQA metric Steigerrsquos Z-test [28] was applied to measurethe statistical difference It was found that there is a statisticalenhancement when using the combined reduced metricinstead of the reduced LCQA metric Figure 4 shows thescatter plot of the subjective voice quality scores on the x-axis against the predicted quality scores by the compositemetric on the y-axis for the training test and full data

It is noted that the composite metric had the highestcorrelation for training and test datasets followed by theRPDE feature In order to further confirm this finding boththese metrics were trained repeatedly with different trainingdatasets (different samples selected randomly from thewhole dataset each time) and then applied to the corre-sponding test dataset )en the average correlation valuesfor the training and the test datasets were calculated andthey were found to be 075 for the RPDEmethod and 080 forthe combined reducedmethod)e difference between thesetwo correlation coefficients was statistically significant in-dicating the composite measure provided a better overallperformance that was robust to random partitioning of thedatabase

4 Discussion and Conclusion

In this paper the quality of the sustained vowels producedby patients with PD was predicted through objectiveacoustical analyses A previously collected vowel databaseby Cushnie-Sparrow et al [8] was utilized for this purpose)is database consisted of sustained vowel samples from 51PD patients before and after taking the Levodopa medi-cation along with vowel samples from 11 healthy controlsubjects which resulted in the formation of a database of113 vowels A panel of 3 listeners rated the perceivedquality of these vowel recordings [8] which were used intraining the objective models and assessing their accuracyVowel quality prediction features included GFCC LCQAHNR smoothed CPP and RPDE Machine learning al-gorithms SVR and LR were applied to these multidimen-sional features to estimate the quality objectively Some ofthe features mentioned above were blended to form acomposite objective metric that displayed significantlybetter performance than the other metrics Moreover PCAand feature reduction were applied to reduce the number ofinput features to machine learning algorithms to reduce theoverfitting and enhance the performance of the objectivemetrics

Initial investigation focused on individual features andparameters extracted from the vowel samples Although anumber of these individual features exhibited statisticallysignificant correlation values with auditory-perceptual rat-ings only a subset of the measures exhibited correlationcoefficients greater than 05 Of the commonly reported

8 )e Scientific World Journal

vowel acoustic measures HNR performed the best withPearson correlation coefficients of 055 and 074 for thetraining and test partitions respectively )e correlationcoefficients exhibited by HNR and CPP [8] but are lowerthan those reported by Jannetts and Lowit [12] It isworthwhile to note that Jannetts and Lowit [12] reportedSpearmanrsquos rank-ordered correlation between the percep-tual ratings and acoustic measures unlike Pearsonrsquos cor-relation coefficient reported here which can perhaps explainthe discrepancy Furthermore Jannetts and Lowit [12]employed the GRBAS scale and the auditory-perceptualratings were provided by an experienced clinician It isplausible that these methodological differences also con-tributed to the differences

Among the individual measures the RPDE parameterproduced the best performance )e correlation coefficientsof 08 and 075 with training and test partitions were sig-nificantly better than those reported by other individualmeasures RPDE has been previously employed for dis-criminating between normal and PD voices [14 20] and ourresults demonstrate that it is suitable for predicting theperceived quality of PD vowel samples

While GFCC was used in other studies to measure thequality of Parkinsonian speech and had a good performance[15 26] this was not the case for estimating the perceivedquality for Parkinsonian sustained vowels )e best per-formance for GFCC objective metric after feature reductionresulted in a 055 correlation coefficient between the

0 20 40 60 80 1000

20

40

60

80

100

Obj

ectiv

e sco

res

Subjective scores

(a)

0

20

40

60

80

100

Obj

ectiv

e sco

res

0 20 40 60 80 100Subjective scores

(b)

0

20

40

60

80

100

Obj

ectiv

e sco

res

0 20 40 60 80 100Subjective scores

(c)

Figure 4 Subjective scores versus objective scores using the reduced combined linear regression (LR) metric (a) shows the scatter plot forthe training data (b) shows the scatter plot for the test data and (c) depicts the scatter plot for the whole database

)e Scientific World Journal 9

subjective and the objective scores For the nonreduced fullset category applying SVR to the combination of the 40LCQA feature HNR and smoothed CPP was the best ob-jective metric of this category with a correlation value of 077for the test dataset )e difference between the training andthe test dataset was the minimum which meant that theeffect of overfitting is minimum Applying PCA to thefeatures led to the enhancement of most of the objectivemetrics It is noted that using LR and SVR on the PCAreduced combined metric yields to statistically similar re-sults Applying the feature reductionmethod to the objectivefeatures yielded a great enhancement in the performance ofthe objective metrics Applying LR and SVR to the reducedcombined metric yielded statistically similar results How-ever the metric with linear regression had a smaller dif-ference between the training and the test datasets whichmeans it is less prone to overfitting As a result the reducedcomposite metric with linear regression is considered to bethe best objective metric for estimating the quality of theParkinsonian sustained vowels

In summary a subset of acoustic measures includingHNR LCQA and RPDE exhibited a good correlation withauditory-perceptual voice assessments of the overall qualityof sustained vowels produced by a group of Parkinsonrsquospatients )e application of a regression model (LR andSVR) incorporating a subset of these acoustic featuresresulted in a statistically improved prediction of the per-ceived quality of the Parkinsonian vowels )e subjectiveratings used for benchmarking the objective models wereobtained from sustained vowels produced by PD patientsboth on and off Levodopa medication As such the clinicalimplications of the current study include the following (a)the derived model may serve as a surrogate for the subjectiveassessment of the effect of the Levodopa medication on thevoice quality of Parkinsonian subjects Since the evidenceshows that Levodopa improves the voice quality of Par-kinsonian patients only when their premedication voicequality is poor the derived model can potentially play aclinically relevant role in predicting the effectiveness ofLevodopamedication (b)More generally the derivedmodelcan potentially be applied for clinical assessment of theperceived quality of Parkinsonian vowels particularly formonitoring vowel quality over the course of any therapeuticintervention

Before closing a few limitations of our study must beacknowledged)e auditory-perceptual ratings used to trainand benchmark models were garnered from clinical grad-uate students with little experience Follow-up researchfocusing on model performance with auditory-perceptualratings from experienced clinicians is necessary While thederived model is promising for objective instrumental as-sessment of Parkinsonian vowel quality future research iswarranted to test its robustness and generalization capa-bilities and to further improve its performance One of theaspects that need to be addressed is the limited size of thecollected dataset used in this investigation )e size of thedataset needs to be increased to ensure more reliability andgeneralization capability of the proposed metrics Anotherarea for future research is to develop and evaluate more

advanced and complicated machine learning algorithmssuch as deep learning Applying deep learning emphasizesthe need for a larger dataset that needs to be collected topresent a more reliable and precise metric to estimate thesustained vowelsrsquo quality )e effect of gender on studiedacoustic variables is also of future research interest espe-cially on establishing the effectiveness of the derived modelin predicting male versus female PD patient vowel qualityFinally expanding the findings of this study to estimate thequality of continuous speech (as opposed to sustained vowel)will be of broad research interest

Data Availability

)e data used to support the findings of this study are re-stricted by the ethics board atWestern University in order toprotect patient privacy

Conflicts of Interest

)e authors declare that they have no conflicts of interest

References

[1] D Hirtz D J )urman K Gwinn-Hardy M MohamedA R Chaudhuri and R Zalutsky ldquoHow common are theldquocommonrdquo neurologic disordersrdquo Neurology vol 68 no 5pp 326ndash337 2007

[2] M Toft and Z K Wszolek ldquoOverview of the genetics ofParkinsonismrdquo in Parkinsonrsquos Disease pp 117ndash129 CRCPress Boca Raton FL USA 2012

[3] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective speech quality assessment of amplificationdevices for patients with Parkinsonrsquos diseaserdquo IEEE Trans-actions on Neural Systems and Rehabilitation Engineering2019

[4] R Pahwa and K E Lyons Handbook of Parkinsonrsquos DiseaseCRC Press Taylor amp Francis Group Boca Raton FL USA 5thedition 2013

[5] S Sapir L Ramig and C Fox ldquoSpeech therapy in thetreatment of Parkinsonrsquos diseaserdquo in Parkinsonrsquos Diseasepp 974ndash987 CRC Press Boca Raton FL USA 2012

[6] J L Goudreau and E Ahlskog ldquoSymptomatic treatment ofParkinsonrsquos disease levodopardquo in Parkinsonrsquos Diseasepp 847ndash863 CRC Press Boca Raton FL USA 2012

[7] L O Ramig C Fox and S Sapir ldquoSpeech treatment forParkinsonrsquos diseaserdquo Expert Review of Neurotherapeuticsvol 8 no 2 pp 297ndash309 2008

[8] D Cushnie-Sparrow S Adams A AbeyesekeraM Pieterman G Gilmore and M Jog ldquoVoice quality severityand responsiveness to levodopa in Parkinsonrsquos diseaserdquoJournal of Communication Disorders vol 76 pp 1ndash10 2018

[9] T H Falk V Parsa J F Santos et al ldquoObjective quality andintelligibility prediction for users of assistive listening devicesadvantages and limitations of existing toolsrdquo IEEE SignalProcessing Magazine vol 32 no 2 pp 114ndash124

[10] M Farrus J Hernando and P Ejarque ldquoJitter and shimmermeasurements for speaker recognitionrdquo in Proceedings of theEighth Annual Conference of the international Speech Com-munication Association Antwerp Belgium August 2007

[11] Y D Heman-Ackah D D Michael M M Baroody et alldquoCepstral peak prominence a more reliable measure of

10 )e Scientific World Journal

dysphoniardquo Annals of Otology Rhinology amp Laryngologyvol 112 no 4 pp 324ndash333 2003

[12] S Jannetts and A Lowit ldquoCepstral analysis of hypokinetic andataxic voices correlations with perceptual and other acousticmeasuresrdquo Journal of Voice vol 28 no 6 pp 673ndash680 2014

[13] A Tsanas M A Little P E McSharry J Spielman andL O Ramig ldquoNovel speech signal processing algorithms forhigh-accuracy classification of Parkinsonrsquos diseaserdquo IEEETransactions on Biomedical Engineering vol 59 no 5pp 1264ndash1271 2012

[14] S Arora L Baghai-Ravary and A Tsanas ldquoDeveloping a largescale population screening tool for the assessment of Par-kinsonrsquos disease using telephone-quality voicerdquo(e Journal ofthe Acoustical Society of America vol 145 no 5 pp 2871ndash2884 2019

[15] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective assessment of amplified parkinsonian speechqualityrdquo in Proceedings of the 2018 40th Annual InternationalConference of the IEEE Engineering in Medicine and BiologySociety (EMBC) pp 2084ndash2087 IEEE New Paltz NY USAOctober 2018

[16] T H Falk C Zheng and W-Y Chan ldquoA non-intrusivequality and intelligibility measure of reverberant and der-everberated speechrdquo IEEE Transactions on Audio Speech andLanguage Processing vol 18 no 7 pp 1766ndash1774

[17] F Chen O Hazrati and P C Loizou ldquoPredicting the in-telligibility of reverberant speech for cochlear implant lis-teners with a non-intrusive intelligibility measurerdquoBiomedical Signal Processing and Control vol 8 no 3pp 311ndash314 2013

[18] V Grancharov D Zhao J Lindblom and W Kleijn ldquoLow-complexity nonintrusive speech quality assessmentrdquo IEEETransactions on Audio Speech and Language Processingvol 14 no 6 pp 1948ndash1956

[19] H Salehi and V Parsa ldquoOn nonintrusive speech qualityestimation for hearing aidsrdquo in Proceedings of the 2015 IEEEWorkshop on Applications of Signal Processing to Audio andAcoustics (WASPAA) pp 1ndash5 IEEE New Paltz NY USAOctober 2015

[20] M A Little P E McSharry E J Hunter J Spielman andL O Ramig ldquoSuitability of dysphonia measurements fortelemonitoring of Parkinsonrsquos diseaserdquo IEEE Transactions onBiomedical Engineering vol 56 no 4 pp 1015ndash1022 2008

[21] M A Little P E McSharry S J Roberts D A Costello andI M Moroz ldquoExploiting nonlinear recurrence and fractalscaling properties for voice disorder detectionrdquo BiomedicalEngineering Online vol 6 no 1 p 23 2007

[22] P Boersma and D Weenink ldquoPRAAT doing phonetics bycomputer (version 51 05) (computer program)rdquo May 2009

[23] T Hastie R Tibshirani and J H Friedman ldquo)e elements ofstatistical learning data mining inference and predictionrdquoSpringer Series in Statistics Springer New York NY USA2nd edition 2009

[24] V N Vapnik ldquo)e nature of statistical learning theoryrdquoStatistics for Engineering and Information Science SpringerNew York NY USA 2nd edition 2000

[25] I T Jolliffe and J Cadima ldquoPrincipal component analysis areview and recent developmentsrdquo Philosophical Transactionsof the Royal Society A Mathematical Physical and EngineeringSciences vol 374 no 2065 Article ID 20150202 2016

[26] A Gaballah V Parsa M Andreetta and S Adams ldquoAs-sessment of amplified parkinsonian speech quality using deeplearningrdquo in Proceedings of the 2018 IEEE Canadian

Conference on Electrical amp Computer Engineering (CCECE)pp 1ndash4 IEEE Quebec Canada May 2018

[27] Y Hu and P C Loizou ldquoEvaluation of objective qualitymeasures for speech enhancementrdquo IEEE Transactions onAudio Speech and Language Processing vol 16 no 1pp 229ndash238 2008

[28] J H Steiger ldquoTests for comparing elements of a correlationmatrixrdquo Psychological Bulletin vol 87 no 2 pp 245ndash2511980

)e Scientific World Journal 11

Page 3: ImprovedEstimationofParkinsonianVowelQualitythrough ... · (ModA),served as themodulation-based features [9]. e envelope of the waveform was extracted and filtered to a numberoffilters[16,17].eratiosoftheenergyinthelow

University))e listeners were instructed to judge the overallvoice quality of each vowel sample using a visual-analoguescale (VAS))e scale was 10 cm in length and the endpointdescriptors were ldquopoor voice qualityrdquo on the left and ldquobettervoice qualityrdquo on the right VAS score was recorded as thedistance from the left endpoint to the listenersrsquo mark )eaverage of the three listener ratings served as the overallquality rating of the vowel recordings To test intraraterreliability 20 of the vowel samples were randomly selectedand inserted into the presentation order in a randomfashion for a rerating More details can be found in [8]

22 Features and (eir Computation Subjective ratingsobtained through the procedure outlined in the previoussection were used to benchmark the performance of theobjective measures Prior to feature extraction the sustainedvowel recordings were decimated to a 16 kHz sample rate Inaddition the complete 113-sample database was divided into2 datasets )e first dataset contained 80 of the wholedataset or 91 samples to constitute the training dataset whilethe remaining 20 of the data or 22 voice samples made upthe test dataset

221 Filterbank-Based Features GFCC coefficients aremainly used in computational auditory sense analysis(CASA) studies to transform signals into time-frequency (T-F) domain to perform robust speech recognition [3 15] )erecorded signal was segmented into frames of 256 sampleswith a frame overlap of 100 samples Afterwards the powerspectrum of each frame was obtained after multiplying witha Hamming window )e equivalent rectangular bandwidth(ERB) filterbank was applied to the frame power spectra Inthis research 128 filters constituted the ERB filterbank andthe log filterbank energies were decorrelated using thediscrete cosine transform (DCT) [15] )e frame averagedGFCCs and their first-order time differences (ldquodeltardquo values)resulted in the final GFCC feature set that contained 60features

222 Modulation-Based Features Two features extractedfrom the envelope namely the speech-to-reverberationmodulation energy ratio (SRMR) and modulation area(ModA) served as the modulation-based features [9] )eenvelope of the waveform was extracted and filtered to anumber of filters [16 17] )e ratios of the energy in the lowband filters which are assumed to contain the speechmodulation energies and the high band filters which areassumed to contain the noise modulation energies representthe quality of the sustained vowel signal

In SRMR [16] the speech signal was processed through a23-channel Gammatone filterbank with center frequenciesranging from 125Hz to half the sampling rate Hilberttransform was then applied to the filterbank outputs toextract the temporal envelope in each channel )ese en-velopes had frequencies that ranged between 0 and 128HzAt this point each envelope was filtered into eight over-lapping modulation bands with center frequencies ranging

from 4 to 128Hz Finally SRMR was computed as a ratiobetween the energy stored in the first four filters whichcontain most of the speech energy and the last four filterswhich contain the background noise [16]

In ModA the speech signal was decomposed using only4 bandpass filters and filtered signals had Hilbert transformapplied to derive the band-specific temporal envelopes Eachenvelope was subsequently downsampled to 20Hz and thenprocessed through 13 octave filterbank with center fre-quencies ranging between 05 and 8Hz )e filterbankoutput energies were then used to derive the area under eachacoustic band and then those areas are averaged to producethe ModA metric [17]

223 Linear Prediction-Based Features )e LP-based fea-ture extraction methodology is presented in Low ComplexityQuality Assessment (LCQA) proposed by Grancharov et al[18] )e central idea of LCQA is to extract statistical featuresof the speech signal [18] Each speech recording was seg-mented into 20ms nonoverlapping frames an 18th order LPmodel was computed for each frame and a vector of featuresis extracted from each frame )is featuresrsquo vector incorpo-rates 10 features which are the spectral flatness the excitationvariance the signal variance the spectral centroid and thespectral dynamics for each frame in addition to the firstderivative of each of the aforementioned features [19] At thispoint the statistical properties of each one of the 10 featuresare calculated across all the frames these statistical featuresinclude mean variance skew and kurtosis [18] )is yields tothe formation of a vector of size 40 times 1 for each speech signal[15 19]

224 Recurrence Period Density Entropy (RPDE) In RPDEmeasurements the signal was first applied to a time delayembedding to recreate the phase space of a nonlinear dy-namic system [20] RPDE quantifies the percentage of thedynamics in the reconstructed phase space that are periodicor repeated exactly [20 21] Recurrence time (T) is the timethat the recurrent signal takes to turn back to the same point[20] It was previously shown that the deviation from theentropy calculated by the entropy H of the distribution ofthese recurrence periods is a good indication of general voicedisorders [20] RPDE has been used in [20] to classifydisordered voice and normal voice and its accuracy reached91)ese results led to incorporating RPDE in this study toassess the quality of Parkinsonian sustained vowels )eRPDE was computed using the voice analysis toolbox basedon the research by Tsanas et al [13]

225 Traditional Acoustic Measures )e traditionalacoustic measures of percent jitter absolute shimmer HNRand CPP was computed from the sustained vowel recordsusing the Praat software package (version 60) [22] )erecords were analyzed using a custom Praat script and thetraditional acoustic measures were extracted from the reportof voice characteristics returned by the script

)e Scientific World Journal 3

23 Feature Mapping While RPDE HNR and CPP aresingle numbers that represent the predicted speech qualitythe GFCC and LCQA are multidimensional feature vectorsMapping algorithms aim to generate a function that as-similates the multidimensional feature vectors to match thesubjective scores To express this mathematically we have[23]

y f(θX) + b (1)

where θ represents the parameters and functions associatedwith the feature mapper X is the feature matrix that has sizem times n m is the number of training samples n is the size ofthe feature vector y are the subjective scores correspondingto the training samples and b is the prediction errorCommonly used feature mappers include multivariate linearregression (LR) and support vector regression (SVR) [24]

24 Principal Component Analysis (PCA) PCA is used toreduce the dimensionality of the input features of the ma-chine learning algorithm and enhance the interpretation ofthe features [25] )is dimensionality reduction or featurereduction has to be done in a way that maintains the in-formation contained in the input features [25] PCA utilizesthe eigenvalues and the eigenvectors to come up with newfeatures that have smaller dimensionality but maximizes thevariance of the dataset [25] More details about PCA can befound in [25]

25 Feature Selection and Reduction A higher dimension-ality of the feature vector may cause overfitting In suchsituations extracted numbers of features for each metricmust be reduced before applying the machine learning al-gorithm to avoid overfitting To accomplish this goal thecorrelation between each single feature and the subjectivescores was obtained and then the features were rearrangedaccording to their correlation values from the highest to thelowest Subsequently a Monte Carlo algorithm was appliedto extract the maximum number of features that minimizedthe cost function for both the training and the test datasets)is algorithm took the rearranged featuresrsquo matrix and thesubjective scores vector as two inputs [26] At this point thedata was split into a training dataset and test dataset wherethe training dataset contained 80 of the full data while thetest dataset contained the remaining 20 of the dataset )ealgorithm applied linear regression to a subset of the datasetsto find which subset achieved the minimum mean squareerror (MSE) with the subjective scores

3 Results

31 Subjective Results Intrarater reliability of the perceptualjudgment of voice quality was assessed using the intraclasscorrelation coefficient (ICC) [8] Each rater was assessedusing average agreement in a two-way mixed model )eaverage ICC across all raters [8] which is considered to bemoderate intrarater reliability Interrater reliability acrossthe 3 subjective estimators was assessed using average

consistency in a two-way random model average [8] whichcan be interpreted as good interrater reliability

Paired sample t-tests showed that there were no statis-tically significant differences between PD vowel qualityratings on and off Levodopa In other words when the PDpatient cohort was considered as a whole the PDmedicationdid not have any influence on their vowel quality An in-teresting finding does emerge however when PD group isdivided into two groups those with poor perceived voicequality and those with good perceived voice quality in theoff-medication condition )ere was a significant im-provement in perceived vowel quality for the poor-qualitygroup with the administration of medication )e differ-ences among the two groups in terms of the off-medicationvoice quality and the improvement after medication areshown in Figure 1 It can be seen that patients who have lowsustained vowel quality ratings before taking Levodopa havea high improvement in voice quality after taking themedication On the other hand people who have high voicequality ratings before taking the medication have a statis-tically insignificant change in voice quality )ese resultshighlight the need for either subjective or objective as-sessment of PD voice quality in order to predict the ef-fectiveness of Levodopa medication on voice quality

32 Objective Results Figure 2 displays the sample spec-trograms associated with sustained vowel samples collectedfrom 2 subjects in the database Figure 3(b) displays thespectrogram of the normal control subject with a highsubjective quality score )is record had a relative jitter of029 a relative shimmer of 299 dB a CPP value of 11 dBand a HNR value of 228 dB Figure 32 displays the spec-trogram of a subject impaired with PD)is subject had beenoff Levodopa medication and had a low sustained vowelquality rating )is record of the PD subject has a relativejitter of 102 a relative shimmer of 1213 dB a CPP value of155 dB and a HNR value of 1423 dB

Detailed analyses revealed that jitter and shimmer hadpoor correlation values with the subjective scores of thequality of sustained vowel records and therefore they werenot considered to be reliable objective metrics of the qualityof Parkinsonian sustained vowels

Table 1 shows (a) the correlation values between theobjective scores and the subjective perceived quality ratingsusing different metrics and (b) standard deviation of pre-diction error (SDPE) given by SDPE 1113954σs

1 minus ρ2

1113968 where 1113954σs

is the standard deviation of the subjective speech qualityscores and ρ is the correlation coefficient between the trueand predicted quality scores [27] )e statistical significanceof the ρ parameter was computed as well and correlationcoefficients with significance values plt 005 and plt 001 aredenoted by lowast and lowastlowast respectively It must be noted here thatwhile high correlation coefficients between objective andsubjective measures are desirable a big difference betweenthe correlation coefficients for training and test datasets is anindication of overfitting

(1) Individual Feature Performance )e first five ob-jective metrics SRMR ModA CPP HNR and

4 )e Scientific World Journal

0 05 1 15 2ndash025

ndash02

ndash015

ndash01

ndash005

0

005

01

015

02

Am

plitu

de

Time (secs)

(a)

ndash004

ndash003

ndash002

ndash001

0

001

002

003

004

005

Am

plitu

de

0 05 1 15 2Time (secs)

(b)

Figure 2 Continued

0

20

40

60

80

100

Off medicationOn medication

Poor Better

Aver

aged

qua

lity

scor

es

Voice quality level before medication

Figure 1 Averaged perceptual overall vowel quality scores off and onmedication for the two PD groups those whose premedication qualityscore was below a threshold (labeled as ldquopoorrdquo quality level) and those with a score above the threshold (labeled as ldquobetterrdquo quality level))ethreshold was set as the lower 95 confidence interval for the mean vowel quality rating of the control subjects

)e Scientific World Journal 5

RPDE are solo features Consequently LR was ap-plied to the training dataset which contained 80 ofthe whole database and then the trained linearmodel was applied to the test dataset to benchmarkthe objective metric performance Both SRMR andModA had low statistically insignificant correlationvalues with the quality of sustained vowels for the

training and the test datasets An explanation of thatis the envelope of the high-quality sustained voweldoes not have a lot of variations while the envelopeof the low-quality sustained vowel contains highvariations (see Figures 3(a) and 3(b) respectively))is contradicts the way SRMR and ModA measurethe quality of running speech in which the variations

0 20 40 60 8002

03

04

05

06

07

08

RPD

E

Subjective scores

(a)

0 20 40 60 80 1005

10

15

20

25

30

Subjective scores

HN

R (d

B)

(b)

Figure 3 Scatter plots of subjective scores against recurrence period density entropy (RPDE) and harmonics-to-noise ratio (HNR) for thewhole dataset )e x-axis on these plots is the averaged subjective score across the whole database while the y-axis represents the acousticmeasure )e correlation coefficient between the objective and subjective measures is presented in the text (a) RPDE (b) HNR

0

1

2

3

4

5

6

7

8

ndash120

ndash110

ndash100

ndash90

ndash80

ndash70

ndash60

ndash50

05 1 15Time (secs)

Freq

uenc

y (k

Hz)

Pow

erfr

eque

ncy

(dB

Hz)

(c)

0

1

2

3

4

5

6

7

8

ndash120

ndash110

ndash100

ndash90

ndash80

ndash70

ndash60

ndash50

05 1 15Time (secs)

Freq

uenc

y (k

Hz)

Pow

erfr

eque

ncy

(dB

Hz)

(d)

Figure 2 Waveforms and spectrograms of selected sustained vowels recordings from subjects in the database (a c) )e waveform and thespectrogram for the sustained vowel ldquo∖a∖rdquo collected from a normal control subject (b d) the waveform and the spectrogram for ldquo∖a∖rdquosustained vowel collected from a PD subject who has been off his medication (a) Control subject waveform (b) PD subject waveform(c) Control subject high quality (d) Parkinsonian subject poor quality

6 )e Scientific World Journal

Tabl

e1Correlatio

ncoeffi

cientsandstandard

deviationof

predictio

nerror(SD

PE)o

fobjectiv

emetricswith

subjectiv

escoresInd

ices

thatareb

oldedin

acolumnares

tatisticallysim

ilarin

theirperformance

buta

resig

nificantly

bette

rthan

others

inthat

column

Sign

ificanceof

correlationvalues

indicatedbylowastplt005

andlowastlowast

plt001

Metric

Fullset

PCA

Redu

ced

Correlatio

n(training)

SDPE

(training)

Correlatio

n(test)

SDPE

(test)

Correlatio

n(training)

SDPE

(training)

Correlatio

n(test)

SDPE

(test)

Correlatio

n(training)

SPDE

(training)

Correlatio

n(test)

SDPE

(test)

SRMR

028lowastlowast

1726

016

1774

mdashmdash

mdashmdash

mdashmdash

mdashmdash

Mod

A040lowastlowast

1648

033

1697

mdashmdash

mdashmdash

mdashmdash

mdashmdash

CPP

029lowastlowast

1720

053lowast

1524

mdashmdash

mdashmdash

mdashmdash

mdashmdash

HNR

055lowastlowast

1501

074lowastlowast

1209

mdashmdash

mdashmdash

mdashmdash

mdashmdash

RPDE

080lowastlowast

1079

075lowastlowast

1198

mdashmdash

mdashmdash

mdashmdash

mdashmdash

GFC

C-LR

086lowastlowast

915

minus025

1744

065lowastlowast

1363

042

1635

056lowastlowast

1486

055lowastlowast

1505

GFC

C-SVR

060lowastlowast

1435

006

1798

060lowastlowast

1435

030

1719

054lowastlowast

1510

052lowast

1539

LCQA-LR

082lowastlowast

1027

046lowast

1600

076lowastlowast

1166

055lowastlowast

1505

075lowastlowast

1187

075lowastlowast

1192

LCQA-SVR

066lowastlowast

1348

066lowastlowast

1354

073lowastlowast

1226

053lowast

1528

066lowastlowast

1348

066lowastlowast

1528

Com

bined-

LR087lowastlowast

885

051lowast

1550

073lowastlowast

1226

070lowastlowast

1287

081lowastlowast

1052

080lowastlowast

1081

Com

bined-

SVR

075lowastlowast

1187

077lowastlowast

1150

071lowastlowast

1263

066lowastlowast

1354

071lowastlowast

1263

082lowastlowast

1031

PCAp

rincipal

compo

nent

analysis

RPDE

recurrence

period

density

entrop

ySR

MR

speech-to-reverberationmod

ulationratio

HNR

harm

onics-to-noise

ratio

CPP

cepstralpeak

prom

inenceM

odA

mod

ulationareaG

FCCG

ammaton

efrequencycepstral

coeffi

cientsL

CQAL

owCom

plexity

QualityAssessm

entandSV

Rsupp

ortv

ectorregressio

n

)e Scientific World Journal 7

of the envelope and the ratio between energies in thelow band and the energies in the high bands of theenvelope indicate the quality of the waveform )ecorrelation between HNR and the subjective scoreswas 055 for the training database and 074 for thetest database while the correlation values betweenthe subjective measurements and the CPP scoreswere 029 and 053 for the training and the testdatasets respectivelymdashall of which were statisticallysignificant RPDE was the highest among the single-feature objective metrics to have values of correlationwith the subjective scores that reached statisticallysignificant 080 and 075 values for the training andthe test datasets respectively It is noted that there isstill a difference between the correlation values of thetraining and the test datasets of RPDE which meansthat this metric has deficiency in predicting thequality for new (ie unseen) sustained vowel sam-ples Figure 3 shows the scatter plot for the RPDEand HNR measures against the subjective scores

(2) Objective Metrics with Multiple Features )e full setmetrics are the metrics that contained multiplefeatures and the whole set of features are trainedwithout any feature reduction Using machinelearning algorithms on the GFCC features did notyield a reliable metric to estimate the Parkinsonianvoice quality while applying SVR on the LCQAfeatures resulted in a LCQA-SVR metric that has a066 correlation value with the subjective scores

(3) Reduced Multiple Feature Objective Metrics Apply-ing PCA and the Monte Carlo feature selection andreduction method to the GFCC and LCQA featuresled to a reduction of the number of dimensions of theGFCCmetric from 60 to 3 features only It also led toreducing the number of LCQA features from 40 to16 It is noted that the metrics resulting from thefeature reduction method led to higher performancethan the PCA method )is enhanced the perfor-mance of most of the metrics Applying LR to theLCQA metric led to obtaining an objective metricthat has a statistically significant 075 correlationvalue with the subjective scores

(4) A Composite Objective Voice Quality Estimator Acomposite metric was derived by augmenting theHNR and CPP features with LCQA features andapplying SVR and LR to estimate the vowel qualityscores )e combined metric which included 42features resulted in predicted quality scores that hada 077 correlation value with the subjective qualityscores )is is noteworthy in that it is higher than allthe other multiple feature metrics

Afterwards the PCA method was applied to the featuresbefore training the model to estimate the vowel quality )enumber of dimensions was reduced to 23 features whichexplained 95 of the data variance Finally applying thefeature reduction method had greater improvement of theperformance more than using PCA Applying LR to the

reduced combined feature set resulted in a model that es-timated the quality of the vowels with a statistically sig-nificant 080 correlation value with the subjective scores Totest the statistical significance between the correlation valuesof the obtained scores from the combined metric comparedto the second-highest objective metric which is the reducedLCQA metric Steigerrsquos Z-test [28] was applied to measurethe statistical difference It was found that there is a statisticalenhancement when using the combined reduced metricinstead of the reduced LCQA metric Figure 4 shows thescatter plot of the subjective voice quality scores on the x-axis against the predicted quality scores by the compositemetric on the y-axis for the training test and full data

It is noted that the composite metric had the highestcorrelation for training and test datasets followed by theRPDE feature In order to further confirm this finding boththese metrics were trained repeatedly with different trainingdatasets (different samples selected randomly from thewhole dataset each time) and then applied to the corre-sponding test dataset )en the average correlation valuesfor the training and the test datasets were calculated andthey were found to be 075 for the RPDEmethod and 080 forthe combined reducedmethod)e difference between thesetwo correlation coefficients was statistically significant in-dicating the composite measure provided a better overallperformance that was robust to random partitioning of thedatabase

4 Discussion and Conclusion

In this paper the quality of the sustained vowels producedby patients with PD was predicted through objectiveacoustical analyses A previously collected vowel databaseby Cushnie-Sparrow et al [8] was utilized for this purpose)is database consisted of sustained vowel samples from 51PD patients before and after taking the Levodopa medi-cation along with vowel samples from 11 healthy controlsubjects which resulted in the formation of a database of113 vowels A panel of 3 listeners rated the perceivedquality of these vowel recordings [8] which were used intraining the objective models and assessing their accuracyVowel quality prediction features included GFCC LCQAHNR smoothed CPP and RPDE Machine learning al-gorithms SVR and LR were applied to these multidimen-sional features to estimate the quality objectively Some ofthe features mentioned above were blended to form acomposite objective metric that displayed significantlybetter performance than the other metrics Moreover PCAand feature reduction were applied to reduce the number ofinput features to machine learning algorithms to reduce theoverfitting and enhance the performance of the objectivemetrics

Initial investigation focused on individual features andparameters extracted from the vowel samples Although anumber of these individual features exhibited statisticallysignificant correlation values with auditory-perceptual rat-ings only a subset of the measures exhibited correlationcoefficients greater than 05 Of the commonly reported

8 )e Scientific World Journal

vowel acoustic measures HNR performed the best withPearson correlation coefficients of 055 and 074 for thetraining and test partitions respectively )e correlationcoefficients exhibited by HNR and CPP [8] but are lowerthan those reported by Jannetts and Lowit [12] It isworthwhile to note that Jannetts and Lowit [12] reportedSpearmanrsquos rank-ordered correlation between the percep-tual ratings and acoustic measures unlike Pearsonrsquos cor-relation coefficient reported here which can perhaps explainthe discrepancy Furthermore Jannetts and Lowit [12]employed the GRBAS scale and the auditory-perceptualratings were provided by an experienced clinician It isplausible that these methodological differences also con-tributed to the differences

Among the individual measures the RPDE parameterproduced the best performance )e correlation coefficientsof 08 and 075 with training and test partitions were sig-nificantly better than those reported by other individualmeasures RPDE has been previously employed for dis-criminating between normal and PD voices [14 20] and ourresults demonstrate that it is suitable for predicting theperceived quality of PD vowel samples

While GFCC was used in other studies to measure thequality of Parkinsonian speech and had a good performance[15 26] this was not the case for estimating the perceivedquality for Parkinsonian sustained vowels )e best per-formance for GFCC objective metric after feature reductionresulted in a 055 correlation coefficient between the

0 20 40 60 80 1000

20

40

60

80

100

Obj

ectiv

e sco

res

Subjective scores

(a)

0

20

40

60

80

100

Obj

ectiv

e sco

res

0 20 40 60 80 100Subjective scores

(b)

0

20

40

60

80

100

Obj

ectiv

e sco

res

0 20 40 60 80 100Subjective scores

(c)

Figure 4 Subjective scores versus objective scores using the reduced combined linear regression (LR) metric (a) shows the scatter plot forthe training data (b) shows the scatter plot for the test data and (c) depicts the scatter plot for the whole database

)e Scientific World Journal 9

subjective and the objective scores For the nonreduced fullset category applying SVR to the combination of the 40LCQA feature HNR and smoothed CPP was the best ob-jective metric of this category with a correlation value of 077for the test dataset )e difference between the training andthe test dataset was the minimum which meant that theeffect of overfitting is minimum Applying PCA to thefeatures led to the enhancement of most of the objectivemetrics It is noted that using LR and SVR on the PCAreduced combined metric yields to statistically similar re-sults Applying the feature reductionmethod to the objectivefeatures yielded a great enhancement in the performance ofthe objective metrics Applying LR and SVR to the reducedcombined metric yielded statistically similar results How-ever the metric with linear regression had a smaller dif-ference between the training and the test datasets whichmeans it is less prone to overfitting As a result the reducedcomposite metric with linear regression is considered to bethe best objective metric for estimating the quality of theParkinsonian sustained vowels

In summary a subset of acoustic measures includingHNR LCQA and RPDE exhibited a good correlation withauditory-perceptual voice assessments of the overall qualityof sustained vowels produced by a group of Parkinsonrsquospatients )e application of a regression model (LR andSVR) incorporating a subset of these acoustic featuresresulted in a statistically improved prediction of the per-ceived quality of the Parkinsonian vowels )e subjectiveratings used for benchmarking the objective models wereobtained from sustained vowels produced by PD patientsboth on and off Levodopa medication As such the clinicalimplications of the current study include the following (a)the derived model may serve as a surrogate for the subjectiveassessment of the effect of the Levodopa medication on thevoice quality of Parkinsonian subjects Since the evidenceshows that Levodopa improves the voice quality of Par-kinsonian patients only when their premedication voicequality is poor the derived model can potentially play aclinically relevant role in predicting the effectiveness ofLevodopamedication (b)More generally the derivedmodelcan potentially be applied for clinical assessment of theperceived quality of Parkinsonian vowels particularly formonitoring vowel quality over the course of any therapeuticintervention

Before closing a few limitations of our study must beacknowledged)e auditory-perceptual ratings used to trainand benchmark models were garnered from clinical grad-uate students with little experience Follow-up researchfocusing on model performance with auditory-perceptualratings from experienced clinicians is necessary While thederived model is promising for objective instrumental as-sessment of Parkinsonian vowel quality future research iswarranted to test its robustness and generalization capa-bilities and to further improve its performance One of theaspects that need to be addressed is the limited size of thecollected dataset used in this investigation )e size of thedataset needs to be increased to ensure more reliability andgeneralization capability of the proposed metrics Anotherarea for future research is to develop and evaluate more

advanced and complicated machine learning algorithmssuch as deep learning Applying deep learning emphasizesthe need for a larger dataset that needs to be collected topresent a more reliable and precise metric to estimate thesustained vowelsrsquo quality )e effect of gender on studiedacoustic variables is also of future research interest espe-cially on establishing the effectiveness of the derived modelin predicting male versus female PD patient vowel qualityFinally expanding the findings of this study to estimate thequality of continuous speech (as opposed to sustained vowel)will be of broad research interest

Data Availability

)e data used to support the findings of this study are re-stricted by the ethics board atWestern University in order toprotect patient privacy

Conflicts of Interest

)e authors declare that they have no conflicts of interest

References

[1] D Hirtz D J )urman K Gwinn-Hardy M MohamedA R Chaudhuri and R Zalutsky ldquoHow common are theldquocommonrdquo neurologic disordersrdquo Neurology vol 68 no 5pp 326ndash337 2007

[2] M Toft and Z K Wszolek ldquoOverview of the genetics ofParkinsonismrdquo in Parkinsonrsquos Disease pp 117ndash129 CRCPress Boca Raton FL USA 2012

[3] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective speech quality assessment of amplificationdevices for patients with Parkinsonrsquos diseaserdquo IEEE Trans-actions on Neural Systems and Rehabilitation Engineering2019

[4] R Pahwa and K E Lyons Handbook of Parkinsonrsquos DiseaseCRC Press Taylor amp Francis Group Boca Raton FL USA 5thedition 2013

[5] S Sapir L Ramig and C Fox ldquoSpeech therapy in thetreatment of Parkinsonrsquos diseaserdquo in Parkinsonrsquos Diseasepp 974ndash987 CRC Press Boca Raton FL USA 2012

[6] J L Goudreau and E Ahlskog ldquoSymptomatic treatment ofParkinsonrsquos disease levodopardquo in Parkinsonrsquos Diseasepp 847ndash863 CRC Press Boca Raton FL USA 2012

[7] L O Ramig C Fox and S Sapir ldquoSpeech treatment forParkinsonrsquos diseaserdquo Expert Review of Neurotherapeuticsvol 8 no 2 pp 297ndash309 2008

[8] D Cushnie-Sparrow S Adams A AbeyesekeraM Pieterman G Gilmore and M Jog ldquoVoice quality severityand responsiveness to levodopa in Parkinsonrsquos diseaserdquoJournal of Communication Disorders vol 76 pp 1ndash10 2018

[9] T H Falk V Parsa J F Santos et al ldquoObjective quality andintelligibility prediction for users of assistive listening devicesadvantages and limitations of existing toolsrdquo IEEE SignalProcessing Magazine vol 32 no 2 pp 114ndash124

[10] M Farrus J Hernando and P Ejarque ldquoJitter and shimmermeasurements for speaker recognitionrdquo in Proceedings of theEighth Annual Conference of the international Speech Com-munication Association Antwerp Belgium August 2007

[11] Y D Heman-Ackah D D Michael M M Baroody et alldquoCepstral peak prominence a more reliable measure of

10 )e Scientific World Journal

dysphoniardquo Annals of Otology Rhinology amp Laryngologyvol 112 no 4 pp 324ndash333 2003

[12] S Jannetts and A Lowit ldquoCepstral analysis of hypokinetic andataxic voices correlations with perceptual and other acousticmeasuresrdquo Journal of Voice vol 28 no 6 pp 673ndash680 2014

[13] A Tsanas M A Little P E McSharry J Spielman andL O Ramig ldquoNovel speech signal processing algorithms forhigh-accuracy classification of Parkinsonrsquos diseaserdquo IEEETransactions on Biomedical Engineering vol 59 no 5pp 1264ndash1271 2012

[14] S Arora L Baghai-Ravary and A Tsanas ldquoDeveloping a largescale population screening tool for the assessment of Par-kinsonrsquos disease using telephone-quality voicerdquo(e Journal ofthe Acoustical Society of America vol 145 no 5 pp 2871ndash2884 2019

[15] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective assessment of amplified parkinsonian speechqualityrdquo in Proceedings of the 2018 40th Annual InternationalConference of the IEEE Engineering in Medicine and BiologySociety (EMBC) pp 2084ndash2087 IEEE New Paltz NY USAOctober 2018

[16] T H Falk C Zheng and W-Y Chan ldquoA non-intrusivequality and intelligibility measure of reverberant and der-everberated speechrdquo IEEE Transactions on Audio Speech andLanguage Processing vol 18 no 7 pp 1766ndash1774

[17] F Chen O Hazrati and P C Loizou ldquoPredicting the in-telligibility of reverberant speech for cochlear implant lis-teners with a non-intrusive intelligibility measurerdquoBiomedical Signal Processing and Control vol 8 no 3pp 311ndash314 2013

[18] V Grancharov D Zhao J Lindblom and W Kleijn ldquoLow-complexity nonintrusive speech quality assessmentrdquo IEEETransactions on Audio Speech and Language Processingvol 14 no 6 pp 1948ndash1956

[19] H Salehi and V Parsa ldquoOn nonintrusive speech qualityestimation for hearing aidsrdquo in Proceedings of the 2015 IEEEWorkshop on Applications of Signal Processing to Audio andAcoustics (WASPAA) pp 1ndash5 IEEE New Paltz NY USAOctober 2015

[20] M A Little P E McSharry E J Hunter J Spielman andL O Ramig ldquoSuitability of dysphonia measurements fortelemonitoring of Parkinsonrsquos diseaserdquo IEEE Transactions onBiomedical Engineering vol 56 no 4 pp 1015ndash1022 2008

[21] M A Little P E McSharry S J Roberts D A Costello andI M Moroz ldquoExploiting nonlinear recurrence and fractalscaling properties for voice disorder detectionrdquo BiomedicalEngineering Online vol 6 no 1 p 23 2007

[22] P Boersma and D Weenink ldquoPRAAT doing phonetics bycomputer (version 51 05) (computer program)rdquo May 2009

[23] T Hastie R Tibshirani and J H Friedman ldquo)e elements ofstatistical learning data mining inference and predictionrdquoSpringer Series in Statistics Springer New York NY USA2nd edition 2009

[24] V N Vapnik ldquo)e nature of statistical learning theoryrdquoStatistics for Engineering and Information Science SpringerNew York NY USA 2nd edition 2000

[25] I T Jolliffe and J Cadima ldquoPrincipal component analysis areview and recent developmentsrdquo Philosophical Transactionsof the Royal Society A Mathematical Physical and EngineeringSciences vol 374 no 2065 Article ID 20150202 2016

[26] A Gaballah V Parsa M Andreetta and S Adams ldquoAs-sessment of amplified parkinsonian speech quality using deeplearningrdquo in Proceedings of the 2018 IEEE Canadian

Conference on Electrical amp Computer Engineering (CCECE)pp 1ndash4 IEEE Quebec Canada May 2018

[27] Y Hu and P C Loizou ldquoEvaluation of objective qualitymeasures for speech enhancementrdquo IEEE Transactions onAudio Speech and Language Processing vol 16 no 1pp 229ndash238 2008

[28] J H Steiger ldquoTests for comparing elements of a correlationmatrixrdquo Psychological Bulletin vol 87 no 2 pp 245ndash2511980

)e Scientific World Journal 11

Page 4: ImprovedEstimationofParkinsonianVowelQualitythrough ... · (ModA),served as themodulation-based features [9]. e envelope of the waveform was extracted and filtered to a numberoffilters[16,17].eratiosoftheenergyinthelow

23 Feature Mapping While RPDE HNR and CPP aresingle numbers that represent the predicted speech qualitythe GFCC and LCQA are multidimensional feature vectorsMapping algorithms aim to generate a function that as-similates the multidimensional feature vectors to match thesubjective scores To express this mathematically we have[23]

y f(θX) + b (1)

where θ represents the parameters and functions associatedwith the feature mapper X is the feature matrix that has sizem times n m is the number of training samples n is the size ofthe feature vector y are the subjective scores correspondingto the training samples and b is the prediction errorCommonly used feature mappers include multivariate linearregression (LR) and support vector regression (SVR) [24]

24 Principal Component Analysis (PCA) PCA is used toreduce the dimensionality of the input features of the ma-chine learning algorithm and enhance the interpretation ofthe features [25] )is dimensionality reduction or featurereduction has to be done in a way that maintains the in-formation contained in the input features [25] PCA utilizesthe eigenvalues and the eigenvectors to come up with newfeatures that have smaller dimensionality but maximizes thevariance of the dataset [25] More details about PCA can befound in [25]

25 Feature Selection and Reduction A higher dimension-ality of the feature vector may cause overfitting In suchsituations extracted numbers of features for each metricmust be reduced before applying the machine learning al-gorithm to avoid overfitting To accomplish this goal thecorrelation between each single feature and the subjectivescores was obtained and then the features were rearrangedaccording to their correlation values from the highest to thelowest Subsequently a Monte Carlo algorithm was appliedto extract the maximum number of features that minimizedthe cost function for both the training and the test datasets)is algorithm took the rearranged featuresrsquo matrix and thesubjective scores vector as two inputs [26] At this point thedata was split into a training dataset and test dataset wherethe training dataset contained 80 of the full data while thetest dataset contained the remaining 20 of the dataset )ealgorithm applied linear regression to a subset of the datasetsto find which subset achieved the minimum mean squareerror (MSE) with the subjective scores

3 Results

31 Subjective Results Intrarater reliability of the perceptualjudgment of voice quality was assessed using the intraclasscorrelation coefficient (ICC) [8] Each rater was assessedusing average agreement in a two-way mixed model )eaverage ICC across all raters [8] which is considered to bemoderate intrarater reliability Interrater reliability acrossthe 3 subjective estimators was assessed using average

consistency in a two-way random model average [8] whichcan be interpreted as good interrater reliability

Paired sample t-tests showed that there were no statis-tically significant differences between PD vowel qualityratings on and off Levodopa In other words when the PDpatient cohort was considered as a whole the PDmedicationdid not have any influence on their vowel quality An in-teresting finding does emerge however when PD group isdivided into two groups those with poor perceived voicequality and those with good perceived voice quality in theoff-medication condition )ere was a significant im-provement in perceived vowel quality for the poor-qualitygroup with the administration of medication )e differ-ences among the two groups in terms of the off-medicationvoice quality and the improvement after medication areshown in Figure 1 It can be seen that patients who have lowsustained vowel quality ratings before taking Levodopa havea high improvement in voice quality after taking themedication On the other hand people who have high voicequality ratings before taking the medication have a statis-tically insignificant change in voice quality )ese resultshighlight the need for either subjective or objective as-sessment of PD voice quality in order to predict the ef-fectiveness of Levodopa medication on voice quality

32 Objective Results Figure 2 displays the sample spec-trograms associated with sustained vowel samples collectedfrom 2 subjects in the database Figure 3(b) displays thespectrogram of the normal control subject with a highsubjective quality score )is record had a relative jitter of029 a relative shimmer of 299 dB a CPP value of 11 dBand a HNR value of 228 dB Figure 32 displays the spec-trogram of a subject impaired with PD)is subject had beenoff Levodopa medication and had a low sustained vowelquality rating )is record of the PD subject has a relativejitter of 102 a relative shimmer of 1213 dB a CPP value of155 dB and a HNR value of 1423 dB

Detailed analyses revealed that jitter and shimmer hadpoor correlation values with the subjective scores of thequality of sustained vowel records and therefore they werenot considered to be reliable objective metrics of the qualityof Parkinsonian sustained vowels

Table 1 shows (a) the correlation values between theobjective scores and the subjective perceived quality ratingsusing different metrics and (b) standard deviation of pre-diction error (SDPE) given by SDPE 1113954σs

1 minus ρ2

1113968 where 1113954σs

is the standard deviation of the subjective speech qualityscores and ρ is the correlation coefficient between the trueand predicted quality scores [27] )e statistical significanceof the ρ parameter was computed as well and correlationcoefficients with significance values plt 005 and plt 001 aredenoted by lowast and lowastlowast respectively It must be noted here thatwhile high correlation coefficients between objective andsubjective measures are desirable a big difference betweenthe correlation coefficients for training and test datasets is anindication of overfitting

(1) Individual Feature Performance )e first five ob-jective metrics SRMR ModA CPP HNR and

4 )e Scientific World Journal

0 05 1 15 2ndash025

ndash02

ndash015

ndash01

ndash005

0

005

01

015

02

Am

plitu

de

Time (secs)

(a)

ndash004

ndash003

ndash002

ndash001

0

001

002

003

004

005

Am

plitu

de

0 05 1 15 2Time (secs)

(b)

Figure 2 Continued

0

20

40

60

80

100

Off medicationOn medication

Poor Better

Aver

aged

qua

lity

scor

es

Voice quality level before medication

Figure 1 Averaged perceptual overall vowel quality scores off and onmedication for the two PD groups those whose premedication qualityscore was below a threshold (labeled as ldquopoorrdquo quality level) and those with a score above the threshold (labeled as ldquobetterrdquo quality level))ethreshold was set as the lower 95 confidence interval for the mean vowel quality rating of the control subjects

)e Scientific World Journal 5

RPDE are solo features Consequently LR was ap-plied to the training dataset which contained 80 ofthe whole database and then the trained linearmodel was applied to the test dataset to benchmarkthe objective metric performance Both SRMR andModA had low statistically insignificant correlationvalues with the quality of sustained vowels for the

training and the test datasets An explanation of thatis the envelope of the high-quality sustained voweldoes not have a lot of variations while the envelopeof the low-quality sustained vowel contains highvariations (see Figures 3(a) and 3(b) respectively))is contradicts the way SRMR and ModA measurethe quality of running speech in which the variations

0 20 40 60 8002

03

04

05

06

07

08

RPD

E

Subjective scores

(a)

0 20 40 60 80 1005

10

15

20

25

30

Subjective scores

HN

R (d

B)

(b)

Figure 3 Scatter plots of subjective scores against recurrence period density entropy (RPDE) and harmonics-to-noise ratio (HNR) for thewhole dataset )e x-axis on these plots is the averaged subjective score across the whole database while the y-axis represents the acousticmeasure )e correlation coefficient between the objective and subjective measures is presented in the text (a) RPDE (b) HNR

0

1

2

3

4

5

6

7

8

ndash120

ndash110

ndash100

ndash90

ndash80

ndash70

ndash60

ndash50

05 1 15Time (secs)

Freq

uenc

y (k

Hz)

Pow

erfr

eque

ncy

(dB

Hz)

(c)

0

1

2

3

4

5

6

7

8

ndash120

ndash110

ndash100

ndash90

ndash80

ndash70

ndash60

ndash50

05 1 15Time (secs)

Freq

uenc

y (k

Hz)

Pow

erfr

eque

ncy

(dB

Hz)

(d)

Figure 2 Waveforms and spectrograms of selected sustained vowels recordings from subjects in the database (a c) )e waveform and thespectrogram for the sustained vowel ldquo∖a∖rdquo collected from a normal control subject (b d) the waveform and the spectrogram for ldquo∖a∖rdquosustained vowel collected from a PD subject who has been off his medication (a) Control subject waveform (b) PD subject waveform(c) Control subject high quality (d) Parkinsonian subject poor quality

6 )e Scientific World Journal

Tabl

e1Correlatio

ncoeffi

cientsandstandard

deviationof

predictio

nerror(SD

PE)o

fobjectiv

emetricswith

subjectiv

escoresInd

ices

thatareb

oldedin

acolumnares

tatisticallysim

ilarin

theirperformance

buta

resig

nificantly

bette

rthan

others

inthat

column

Sign

ificanceof

correlationvalues

indicatedbylowastplt005

andlowastlowast

plt001

Metric

Fullset

PCA

Redu

ced

Correlatio

n(training)

SDPE

(training)

Correlatio

n(test)

SDPE

(test)

Correlatio

n(training)

SDPE

(training)

Correlatio

n(test)

SDPE

(test)

Correlatio

n(training)

SPDE

(training)

Correlatio

n(test)

SDPE

(test)

SRMR

028lowastlowast

1726

016

1774

mdashmdash

mdashmdash

mdashmdash

mdashmdash

Mod

A040lowastlowast

1648

033

1697

mdashmdash

mdashmdash

mdashmdash

mdashmdash

CPP

029lowastlowast

1720

053lowast

1524

mdashmdash

mdashmdash

mdashmdash

mdashmdash

HNR

055lowastlowast

1501

074lowastlowast

1209

mdashmdash

mdashmdash

mdashmdash

mdashmdash

RPDE

080lowastlowast

1079

075lowastlowast

1198

mdashmdash

mdashmdash

mdashmdash

mdashmdash

GFC

C-LR

086lowastlowast

915

minus025

1744

065lowastlowast

1363

042

1635

056lowastlowast

1486

055lowastlowast

1505

GFC

C-SVR

060lowastlowast

1435

006

1798

060lowastlowast

1435

030

1719

054lowastlowast

1510

052lowast

1539

LCQA-LR

082lowastlowast

1027

046lowast

1600

076lowastlowast

1166

055lowastlowast

1505

075lowastlowast

1187

075lowastlowast

1192

LCQA-SVR

066lowastlowast

1348

066lowastlowast

1354

073lowastlowast

1226

053lowast

1528

066lowastlowast

1348

066lowastlowast

1528

Com

bined-

LR087lowastlowast

885

051lowast

1550

073lowastlowast

1226

070lowastlowast

1287

081lowastlowast

1052

080lowastlowast

1081

Com

bined-

SVR

075lowastlowast

1187

077lowastlowast

1150

071lowastlowast

1263

066lowastlowast

1354

071lowastlowast

1263

082lowastlowast

1031

PCAp

rincipal

compo

nent

analysis

RPDE

recurrence

period

density

entrop

ySR

MR

speech-to-reverberationmod

ulationratio

HNR

harm

onics-to-noise

ratio

CPP

cepstralpeak

prom

inenceM

odA

mod

ulationareaG

FCCG

ammaton

efrequencycepstral

coeffi

cientsL

CQAL

owCom

plexity

QualityAssessm

entandSV

Rsupp

ortv

ectorregressio

n

)e Scientific World Journal 7

of the envelope and the ratio between energies in thelow band and the energies in the high bands of theenvelope indicate the quality of the waveform )ecorrelation between HNR and the subjective scoreswas 055 for the training database and 074 for thetest database while the correlation values betweenthe subjective measurements and the CPP scoreswere 029 and 053 for the training and the testdatasets respectivelymdashall of which were statisticallysignificant RPDE was the highest among the single-feature objective metrics to have values of correlationwith the subjective scores that reached statisticallysignificant 080 and 075 values for the training andthe test datasets respectively It is noted that there isstill a difference between the correlation values of thetraining and the test datasets of RPDE which meansthat this metric has deficiency in predicting thequality for new (ie unseen) sustained vowel sam-ples Figure 3 shows the scatter plot for the RPDEand HNR measures against the subjective scores

(2) Objective Metrics with Multiple Features )e full setmetrics are the metrics that contained multiplefeatures and the whole set of features are trainedwithout any feature reduction Using machinelearning algorithms on the GFCC features did notyield a reliable metric to estimate the Parkinsonianvoice quality while applying SVR on the LCQAfeatures resulted in a LCQA-SVR metric that has a066 correlation value with the subjective scores

(3) Reduced Multiple Feature Objective Metrics Apply-ing PCA and the Monte Carlo feature selection andreduction method to the GFCC and LCQA featuresled to a reduction of the number of dimensions of theGFCCmetric from 60 to 3 features only It also led toreducing the number of LCQA features from 40 to16 It is noted that the metrics resulting from thefeature reduction method led to higher performancethan the PCA method )is enhanced the perfor-mance of most of the metrics Applying LR to theLCQA metric led to obtaining an objective metricthat has a statistically significant 075 correlationvalue with the subjective scores

(4) A Composite Objective Voice Quality Estimator Acomposite metric was derived by augmenting theHNR and CPP features with LCQA features andapplying SVR and LR to estimate the vowel qualityscores )e combined metric which included 42features resulted in predicted quality scores that hada 077 correlation value with the subjective qualityscores )is is noteworthy in that it is higher than allthe other multiple feature metrics

Afterwards the PCA method was applied to the featuresbefore training the model to estimate the vowel quality )enumber of dimensions was reduced to 23 features whichexplained 95 of the data variance Finally applying thefeature reduction method had greater improvement of theperformance more than using PCA Applying LR to the

reduced combined feature set resulted in a model that es-timated the quality of the vowels with a statistically sig-nificant 080 correlation value with the subjective scores Totest the statistical significance between the correlation valuesof the obtained scores from the combined metric comparedto the second-highest objective metric which is the reducedLCQA metric Steigerrsquos Z-test [28] was applied to measurethe statistical difference It was found that there is a statisticalenhancement when using the combined reduced metricinstead of the reduced LCQA metric Figure 4 shows thescatter plot of the subjective voice quality scores on the x-axis against the predicted quality scores by the compositemetric on the y-axis for the training test and full data

It is noted that the composite metric had the highestcorrelation for training and test datasets followed by theRPDE feature In order to further confirm this finding boththese metrics were trained repeatedly with different trainingdatasets (different samples selected randomly from thewhole dataset each time) and then applied to the corre-sponding test dataset )en the average correlation valuesfor the training and the test datasets were calculated andthey were found to be 075 for the RPDEmethod and 080 forthe combined reducedmethod)e difference between thesetwo correlation coefficients was statistically significant in-dicating the composite measure provided a better overallperformance that was robust to random partitioning of thedatabase

4 Discussion and Conclusion

In this paper the quality of the sustained vowels producedby patients with PD was predicted through objectiveacoustical analyses A previously collected vowel databaseby Cushnie-Sparrow et al [8] was utilized for this purpose)is database consisted of sustained vowel samples from 51PD patients before and after taking the Levodopa medi-cation along with vowel samples from 11 healthy controlsubjects which resulted in the formation of a database of113 vowels A panel of 3 listeners rated the perceivedquality of these vowel recordings [8] which were used intraining the objective models and assessing their accuracyVowel quality prediction features included GFCC LCQAHNR smoothed CPP and RPDE Machine learning al-gorithms SVR and LR were applied to these multidimen-sional features to estimate the quality objectively Some ofthe features mentioned above were blended to form acomposite objective metric that displayed significantlybetter performance than the other metrics Moreover PCAand feature reduction were applied to reduce the number ofinput features to machine learning algorithms to reduce theoverfitting and enhance the performance of the objectivemetrics

Initial investigation focused on individual features andparameters extracted from the vowel samples Although anumber of these individual features exhibited statisticallysignificant correlation values with auditory-perceptual rat-ings only a subset of the measures exhibited correlationcoefficients greater than 05 Of the commonly reported

8 )e Scientific World Journal

vowel acoustic measures HNR performed the best withPearson correlation coefficients of 055 and 074 for thetraining and test partitions respectively )e correlationcoefficients exhibited by HNR and CPP [8] but are lowerthan those reported by Jannetts and Lowit [12] It isworthwhile to note that Jannetts and Lowit [12] reportedSpearmanrsquos rank-ordered correlation between the percep-tual ratings and acoustic measures unlike Pearsonrsquos cor-relation coefficient reported here which can perhaps explainthe discrepancy Furthermore Jannetts and Lowit [12]employed the GRBAS scale and the auditory-perceptualratings were provided by an experienced clinician It isplausible that these methodological differences also con-tributed to the differences

Among the individual measures the RPDE parameterproduced the best performance )e correlation coefficientsof 08 and 075 with training and test partitions were sig-nificantly better than those reported by other individualmeasures RPDE has been previously employed for dis-criminating between normal and PD voices [14 20] and ourresults demonstrate that it is suitable for predicting theperceived quality of PD vowel samples

While GFCC was used in other studies to measure thequality of Parkinsonian speech and had a good performance[15 26] this was not the case for estimating the perceivedquality for Parkinsonian sustained vowels )e best per-formance for GFCC objective metric after feature reductionresulted in a 055 correlation coefficient between the

0 20 40 60 80 1000

20

40

60

80

100

Obj

ectiv

e sco

res

Subjective scores

(a)

0

20

40

60

80

100

Obj

ectiv

e sco

res

0 20 40 60 80 100Subjective scores

(b)

0

20

40

60

80

100

Obj

ectiv

e sco

res

0 20 40 60 80 100Subjective scores

(c)

Figure 4 Subjective scores versus objective scores using the reduced combined linear regression (LR) metric (a) shows the scatter plot forthe training data (b) shows the scatter plot for the test data and (c) depicts the scatter plot for the whole database

)e Scientific World Journal 9

subjective and the objective scores For the nonreduced fullset category applying SVR to the combination of the 40LCQA feature HNR and smoothed CPP was the best ob-jective metric of this category with a correlation value of 077for the test dataset )e difference between the training andthe test dataset was the minimum which meant that theeffect of overfitting is minimum Applying PCA to thefeatures led to the enhancement of most of the objectivemetrics It is noted that using LR and SVR on the PCAreduced combined metric yields to statistically similar re-sults Applying the feature reductionmethod to the objectivefeatures yielded a great enhancement in the performance ofthe objective metrics Applying LR and SVR to the reducedcombined metric yielded statistically similar results How-ever the metric with linear regression had a smaller dif-ference between the training and the test datasets whichmeans it is less prone to overfitting As a result the reducedcomposite metric with linear regression is considered to bethe best objective metric for estimating the quality of theParkinsonian sustained vowels

In summary a subset of acoustic measures includingHNR LCQA and RPDE exhibited a good correlation withauditory-perceptual voice assessments of the overall qualityof sustained vowels produced by a group of Parkinsonrsquospatients )e application of a regression model (LR andSVR) incorporating a subset of these acoustic featuresresulted in a statistically improved prediction of the per-ceived quality of the Parkinsonian vowels )e subjectiveratings used for benchmarking the objective models wereobtained from sustained vowels produced by PD patientsboth on and off Levodopa medication As such the clinicalimplications of the current study include the following (a)the derived model may serve as a surrogate for the subjectiveassessment of the effect of the Levodopa medication on thevoice quality of Parkinsonian subjects Since the evidenceshows that Levodopa improves the voice quality of Par-kinsonian patients only when their premedication voicequality is poor the derived model can potentially play aclinically relevant role in predicting the effectiveness ofLevodopamedication (b)More generally the derivedmodelcan potentially be applied for clinical assessment of theperceived quality of Parkinsonian vowels particularly formonitoring vowel quality over the course of any therapeuticintervention

Before closing a few limitations of our study must beacknowledged)e auditory-perceptual ratings used to trainand benchmark models were garnered from clinical grad-uate students with little experience Follow-up researchfocusing on model performance with auditory-perceptualratings from experienced clinicians is necessary While thederived model is promising for objective instrumental as-sessment of Parkinsonian vowel quality future research iswarranted to test its robustness and generalization capa-bilities and to further improve its performance One of theaspects that need to be addressed is the limited size of thecollected dataset used in this investigation )e size of thedataset needs to be increased to ensure more reliability andgeneralization capability of the proposed metrics Anotherarea for future research is to develop and evaluate more

advanced and complicated machine learning algorithmssuch as deep learning Applying deep learning emphasizesthe need for a larger dataset that needs to be collected topresent a more reliable and precise metric to estimate thesustained vowelsrsquo quality )e effect of gender on studiedacoustic variables is also of future research interest espe-cially on establishing the effectiveness of the derived modelin predicting male versus female PD patient vowel qualityFinally expanding the findings of this study to estimate thequality of continuous speech (as opposed to sustained vowel)will be of broad research interest

Data Availability

)e data used to support the findings of this study are re-stricted by the ethics board atWestern University in order toprotect patient privacy

Conflicts of Interest

)e authors declare that they have no conflicts of interest

References

[1] D Hirtz D J )urman K Gwinn-Hardy M MohamedA R Chaudhuri and R Zalutsky ldquoHow common are theldquocommonrdquo neurologic disordersrdquo Neurology vol 68 no 5pp 326ndash337 2007

[2] M Toft and Z K Wszolek ldquoOverview of the genetics ofParkinsonismrdquo in Parkinsonrsquos Disease pp 117ndash129 CRCPress Boca Raton FL USA 2012

[3] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective speech quality assessment of amplificationdevices for patients with Parkinsonrsquos diseaserdquo IEEE Trans-actions on Neural Systems and Rehabilitation Engineering2019

[4] R Pahwa and K E Lyons Handbook of Parkinsonrsquos DiseaseCRC Press Taylor amp Francis Group Boca Raton FL USA 5thedition 2013

[5] S Sapir L Ramig and C Fox ldquoSpeech therapy in thetreatment of Parkinsonrsquos diseaserdquo in Parkinsonrsquos Diseasepp 974ndash987 CRC Press Boca Raton FL USA 2012

[6] J L Goudreau and E Ahlskog ldquoSymptomatic treatment ofParkinsonrsquos disease levodopardquo in Parkinsonrsquos Diseasepp 847ndash863 CRC Press Boca Raton FL USA 2012

[7] L O Ramig C Fox and S Sapir ldquoSpeech treatment forParkinsonrsquos diseaserdquo Expert Review of Neurotherapeuticsvol 8 no 2 pp 297ndash309 2008

[8] D Cushnie-Sparrow S Adams A AbeyesekeraM Pieterman G Gilmore and M Jog ldquoVoice quality severityand responsiveness to levodopa in Parkinsonrsquos diseaserdquoJournal of Communication Disorders vol 76 pp 1ndash10 2018

[9] T H Falk V Parsa J F Santos et al ldquoObjective quality andintelligibility prediction for users of assistive listening devicesadvantages and limitations of existing toolsrdquo IEEE SignalProcessing Magazine vol 32 no 2 pp 114ndash124

[10] M Farrus J Hernando and P Ejarque ldquoJitter and shimmermeasurements for speaker recognitionrdquo in Proceedings of theEighth Annual Conference of the international Speech Com-munication Association Antwerp Belgium August 2007

[11] Y D Heman-Ackah D D Michael M M Baroody et alldquoCepstral peak prominence a more reliable measure of

10 )e Scientific World Journal

dysphoniardquo Annals of Otology Rhinology amp Laryngologyvol 112 no 4 pp 324ndash333 2003

[12] S Jannetts and A Lowit ldquoCepstral analysis of hypokinetic andataxic voices correlations with perceptual and other acousticmeasuresrdquo Journal of Voice vol 28 no 6 pp 673ndash680 2014

[13] A Tsanas M A Little P E McSharry J Spielman andL O Ramig ldquoNovel speech signal processing algorithms forhigh-accuracy classification of Parkinsonrsquos diseaserdquo IEEETransactions on Biomedical Engineering vol 59 no 5pp 1264ndash1271 2012

[14] S Arora L Baghai-Ravary and A Tsanas ldquoDeveloping a largescale population screening tool for the assessment of Par-kinsonrsquos disease using telephone-quality voicerdquo(e Journal ofthe Acoustical Society of America vol 145 no 5 pp 2871ndash2884 2019

[15] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective assessment of amplified parkinsonian speechqualityrdquo in Proceedings of the 2018 40th Annual InternationalConference of the IEEE Engineering in Medicine and BiologySociety (EMBC) pp 2084ndash2087 IEEE New Paltz NY USAOctober 2018

[16] T H Falk C Zheng and W-Y Chan ldquoA non-intrusivequality and intelligibility measure of reverberant and der-everberated speechrdquo IEEE Transactions on Audio Speech andLanguage Processing vol 18 no 7 pp 1766ndash1774

[17] F Chen O Hazrati and P C Loizou ldquoPredicting the in-telligibility of reverberant speech for cochlear implant lis-teners with a non-intrusive intelligibility measurerdquoBiomedical Signal Processing and Control vol 8 no 3pp 311ndash314 2013

[18] V Grancharov D Zhao J Lindblom and W Kleijn ldquoLow-complexity nonintrusive speech quality assessmentrdquo IEEETransactions on Audio Speech and Language Processingvol 14 no 6 pp 1948ndash1956

[19] H Salehi and V Parsa ldquoOn nonintrusive speech qualityestimation for hearing aidsrdquo in Proceedings of the 2015 IEEEWorkshop on Applications of Signal Processing to Audio andAcoustics (WASPAA) pp 1ndash5 IEEE New Paltz NY USAOctober 2015

[20] M A Little P E McSharry E J Hunter J Spielman andL O Ramig ldquoSuitability of dysphonia measurements fortelemonitoring of Parkinsonrsquos diseaserdquo IEEE Transactions onBiomedical Engineering vol 56 no 4 pp 1015ndash1022 2008

[21] M A Little P E McSharry S J Roberts D A Costello andI M Moroz ldquoExploiting nonlinear recurrence and fractalscaling properties for voice disorder detectionrdquo BiomedicalEngineering Online vol 6 no 1 p 23 2007

[22] P Boersma and D Weenink ldquoPRAAT doing phonetics bycomputer (version 51 05) (computer program)rdquo May 2009

[23] T Hastie R Tibshirani and J H Friedman ldquo)e elements ofstatistical learning data mining inference and predictionrdquoSpringer Series in Statistics Springer New York NY USA2nd edition 2009

[24] V N Vapnik ldquo)e nature of statistical learning theoryrdquoStatistics for Engineering and Information Science SpringerNew York NY USA 2nd edition 2000

[25] I T Jolliffe and J Cadima ldquoPrincipal component analysis areview and recent developmentsrdquo Philosophical Transactionsof the Royal Society A Mathematical Physical and EngineeringSciences vol 374 no 2065 Article ID 20150202 2016

[26] A Gaballah V Parsa M Andreetta and S Adams ldquoAs-sessment of amplified parkinsonian speech quality using deeplearningrdquo in Proceedings of the 2018 IEEE Canadian

Conference on Electrical amp Computer Engineering (CCECE)pp 1ndash4 IEEE Quebec Canada May 2018

[27] Y Hu and P C Loizou ldquoEvaluation of objective qualitymeasures for speech enhancementrdquo IEEE Transactions onAudio Speech and Language Processing vol 16 no 1pp 229ndash238 2008

[28] J H Steiger ldquoTests for comparing elements of a correlationmatrixrdquo Psychological Bulletin vol 87 no 2 pp 245ndash2511980

)e Scientific World Journal 11

Page 5: ImprovedEstimationofParkinsonianVowelQualitythrough ... · (ModA),served as themodulation-based features [9]. e envelope of the waveform was extracted and filtered to a numberoffilters[16,17].eratiosoftheenergyinthelow

0 05 1 15 2ndash025

ndash02

ndash015

ndash01

ndash005

0

005

01

015

02

Am

plitu

de

Time (secs)

(a)

ndash004

ndash003

ndash002

ndash001

0

001

002

003

004

005

Am

plitu

de

0 05 1 15 2Time (secs)

(b)

Figure 2 Continued

0

20

40

60

80

100

Off medicationOn medication

Poor Better

Aver

aged

qua

lity

scor

es

Voice quality level before medication

Figure 1 Averaged perceptual overall vowel quality scores off and onmedication for the two PD groups those whose premedication qualityscore was below a threshold (labeled as ldquopoorrdquo quality level) and those with a score above the threshold (labeled as ldquobetterrdquo quality level))ethreshold was set as the lower 95 confidence interval for the mean vowel quality rating of the control subjects

)e Scientific World Journal 5

RPDE are solo features Consequently LR was ap-plied to the training dataset which contained 80 ofthe whole database and then the trained linearmodel was applied to the test dataset to benchmarkthe objective metric performance Both SRMR andModA had low statistically insignificant correlationvalues with the quality of sustained vowels for the

training and the test datasets An explanation of thatis the envelope of the high-quality sustained voweldoes not have a lot of variations while the envelopeof the low-quality sustained vowel contains highvariations (see Figures 3(a) and 3(b) respectively))is contradicts the way SRMR and ModA measurethe quality of running speech in which the variations

0 20 40 60 8002

03

04

05

06

07

08

RPD

E

Subjective scores

(a)

0 20 40 60 80 1005

10

15

20

25

30

Subjective scores

HN

R (d

B)

(b)

Figure 3 Scatter plots of subjective scores against recurrence period density entropy (RPDE) and harmonics-to-noise ratio (HNR) for thewhole dataset )e x-axis on these plots is the averaged subjective score across the whole database while the y-axis represents the acousticmeasure )e correlation coefficient between the objective and subjective measures is presented in the text (a) RPDE (b) HNR

0

1

2

3

4

5

6

7

8

ndash120

ndash110

ndash100

ndash90

ndash80

ndash70

ndash60

ndash50

05 1 15Time (secs)

Freq

uenc

y (k

Hz)

Pow

erfr

eque

ncy

(dB

Hz)

(c)

0

1

2

3

4

5

6

7

8

ndash120

ndash110

ndash100

ndash90

ndash80

ndash70

ndash60

ndash50

05 1 15Time (secs)

Freq

uenc

y (k

Hz)

Pow

erfr

eque

ncy

(dB

Hz)

(d)

Figure 2 Waveforms and spectrograms of selected sustained vowels recordings from subjects in the database (a c) )e waveform and thespectrogram for the sustained vowel ldquo∖a∖rdquo collected from a normal control subject (b d) the waveform and the spectrogram for ldquo∖a∖rdquosustained vowel collected from a PD subject who has been off his medication (a) Control subject waveform (b) PD subject waveform(c) Control subject high quality (d) Parkinsonian subject poor quality

6 )e Scientific World Journal

Tabl

e1Correlatio

ncoeffi

cientsandstandard

deviationof

predictio

nerror(SD

PE)o

fobjectiv

emetricswith

subjectiv

escoresInd

ices

thatareb

oldedin

acolumnares

tatisticallysim

ilarin

theirperformance

buta

resig

nificantly

bette

rthan

others

inthat

column

Sign

ificanceof

correlationvalues

indicatedbylowastplt005

andlowastlowast

plt001

Metric

Fullset

PCA

Redu

ced

Correlatio

n(training)

SDPE

(training)

Correlatio

n(test)

SDPE

(test)

Correlatio

n(training)

SDPE

(training)

Correlatio

n(test)

SDPE

(test)

Correlatio

n(training)

SPDE

(training)

Correlatio

n(test)

SDPE

(test)

SRMR

028lowastlowast

1726

016

1774

mdashmdash

mdashmdash

mdashmdash

mdashmdash

Mod

A040lowastlowast

1648

033

1697

mdashmdash

mdashmdash

mdashmdash

mdashmdash

CPP

029lowastlowast

1720

053lowast

1524

mdashmdash

mdashmdash

mdashmdash

mdashmdash

HNR

055lowastlowast

1501

074lowastlowast

1209

mdashmdash

mdashmdash

mdashmdash

mdashmdash

RPDE

080lowastlowast

1079

075lowastlowast

1198

mdashmdash

mdashmdash

mdashmdash

mdashmdash

GFC

C-LR

086lowastlowast

915

minus025

1744

065lowastlowast

1363

042

1635

056lowastlowast

1486

055lowastlowast

1505

GFC

C-SVR

060lowastlowast

1435

006

1798

060lowastlowast

1435

030

1719

054lowastlowast

1510

052lowast

1539

LCQA-LR

082lowastlowast

1027

046lowast

1600

076lowastlowast

1166

055lowastlowast

1505

075lowastlowast

1187

075lowastlowast

1192

LCQA-SVR

066lowastlowast

1348

066lowastlowast

1354

073lowastlowast

1226

053lowast

1528

066lowastlowast

1348

066lowastlowast

1528

Com

bined-

LR087lowastlowast

885

051lowast

1550

073lowastlowast

1226

070lowastlowast

1287

081lowastlowast

1052

080lowastlowast

1081

Com

bined-

SVR

075lowastlowast

1187

077lowastlowast

1150

071lowastlowast

1263

066lowastlowast

1354

071lowastlowast

1263

082lowastlowast

1031

PCAp

rincipal

compo

nent

analysis

RPDE

recurrence

period

density

entrop

ySR

MR

speech-to-reverberationmod

ulationratio

HNR

harm

onics-to-noise

ratio

CPP

cepstralpeak

prom

inenceM

odA

mod

ulationareaG

FCCG

ammaton

efrequencycepstral

coeffi

cientsL

CQAL

owCom

plexity

QualityAssessm

entandSV

Rsupp

ortv

ectorregressio

n

)e Scientific World Journal 7

of the envelope and the ratio between energies in thelow band and the energies in the high bands of theenvelope indicate the quality of the waveform )ecorrelation between HNR and the subjective scoreswas 055 for the training database and 074 for thetest database while the correlation values betweenthe subjective measurements and the CPP scoreswere 029 and 053 for the training and the testdatasets respectivelymdashall of which were statisticallysignificant RPDE was the highest among the single-feature objective metrics to have values of correlationwith the subjective scores that reached statisticallysignificant 080 and 075 values for the training andthe test datasets respectively It is noted that there isstill a difference between the correlation values of thetraining and the test datasets of RPDE which meansthat this metric has deficiency in predicting thequality for new (ie unseen) sustained vowel sam-ples Figure 3 shows the scatter plot for the RPDEand HNR measures against the subjective scores

(2) Objective Metrics with Multiple Features )e full setmetrics are the metrics that contained multiplefeatures and the whole set of features are trainedwithout any feature reduction Using machinelearning algorithms on the GFCC features did notyield a reliable metric to estimate the Parkinsonianvoice quality while applying SVR on the LCQAfeatures resulted in a LCQA-SVR metric that has a066 correlation value with the subjective scores

(3) Reduced Multiple Feature Objective Metrics Apply-ing PCA and the Monte Carlo feature selection andreduction method to the GFCC and LCQA featuresled to a reduction of the number of dimensions of theGFCCmetric from 60 to 3 features only It also led toreducing the number of LCQA features from 40 to16 It is noted that the metrics resulting from thefeature reduction method led to higher performancethan the PCA method )is enhanced the perfor-mance of most of the metrics Applying LR to theLCQA metric led to obtaining an objective metricthat has a statistically significant 075 correlationvalue with the subjective scores

(4) A Composite Objective Voice Quality Estimator Acomposite metric was derived by augmenting theHNR and CPP features with LCQA features andapplying SVR and LR to estimate the vowel qualityscores )e combined metric which included 42features resulted in predicted quality scores that hada 077 correlation value with the subjective qualityscores )is is noteworthy in that it is higher than allthe other multiple feature metrics

Afterwards the PCA method was applied to the featuresbefore training the model to estimate the vowel quality )enumber of dimensions was reduced to 23 features whichexplained 95 of the data variance Finally applying thefeature reduction method had greater improvement of theperformance more than using PCA Applying LR to the

reduced combined feature set resulted in a model that es-timated the quality of the vowels with a statistically sig-nificant 080 correlation value with the subjective scores Totest the statistical significance between the correlation valuesof the obtained scores from the combined metric comparedto the second-highest objective metric which is the reducedLCQA metric Steigerrsquos Z-test [28] was applied to measurethe statistical difference It was found that there is a statisticalenhancement when using the combined reduced metricinstead of the reduced LCQA metric Figure 4 shows thescatter plot of the subjective voice quality scores on the x-axis against the predicted quality scores by the compositemetric on the y-axis for the training test and full data

It is noted that the composite metric had the highestcorrelation for training and test datasets followed by theRPDE feature In order to further confirm this finding boththese metrics were trained repeatedly with different trainingdatasets (different samples selected randomly from thewhole dataset each time) and then applied to the corre-sponding test dataset )en the average correlation valuesfor the training and the test datasets were calculated andthey were found to be 075 for the RPDEmethod and 080 forthe combined reducedmethod)e difference between thesetwo correlation coefficients was statistically significant in-dicating the composite measure provided a better overallperformance that was robust to random partitioning of thedatabase

4 Discussion and Conclusion

In this paper the quality of the sustained vowels producedby patients with PD was predicted through objectiveacoustical analyses A previously collected vowel databaseby Cushnie-Sparrow et al [8] was utilized for this purpose)is database consisted of sustained vowel samples from 51PD patients before and after taking the Levodopa medi-cation along with vowel samples from 11 healthy controlsubjects which resulted in the formation of a database of113 vowels A panel of 3 listeners rated the perceivedquality of these vowel recordings [8] which were used intraining the objective models and assessing their accuracyVowel quality prediction features included GFCC LCQAHNR smoothed CPP and RPDE Machine learning al-gorithms SVR and LR were applied to these multidimen-sional features to estimate the quality objectively Some ofthe features mentioned above were blended to form acomposite objective metric that displayed significantlybetter performance than the other metrics Moreover PCAand feature reduction were applied to reduce the number ofinput features to machine learning algorithms to reduce theoverfitting and enhance the performance of the objectivemetrics

Initial investigation focused on individual features andparameters extracted from the vowel samples Although anumber of these individual features exhibited statisticallysignificant correlation values with auditory-perceptual rat-ings only a subset of the measures exhibited correlationcoefficients greater than 05 Of the commonly reported

8 )e Scientific World Journal

vowel acoustic measures HNR performed the best withPearson correlation coefficients of 055 and 074 for thetraining and test partitions respectively )e correlationcoefficients exhibited by HNR and CPP [8] but are lowerthan those reported by Jannetts and Lowit [12] It isworthwhile to note that Jannetts and Lowit [12] reportedSpearmanrsquos rank-ordered correlation between the percep-tual ratings and acoustic measures unlike Pearsonrsquos cor-relation coefficient reported here which can perhaps explainthe discrepancy Furthermore Jannetts and Lowit [12]employed the GRBAS scale and the auditory-perceptualratings were provided by an experienced clinician It isplausible that these methodological differences also con-tributed to the differences

Among the individual measures the RPDE parameterproduced the best performance )e correlation coefficientsof 08 and 075 with training and test partitions were sig-nificantly better than those reported by other individualmeasures RPDE has been previously employed for dis-criminating between normal and PD voices [14 20] and ourresults demonstrate that it is suitable for predicting theperceived quality of PD vowel samples

While GFCC was used in other studies to measure thequality of Parkinsonian speech and had a good performance[15 26] this was not the case for estimating the perceivedquality for Parkinsonian sustained vowels )e best per-formance for GFCC objective metric after feature reductionresulted in a 055 correlation coefficient between the

0 20 40 60 80 1000

20

40

60

80

100

Obj

ectiv

e sco

res

Subjective scores

(a)

0

20

40

60

80

100

Obj

ectiv

e sco

res

0 20 40 60 80 100Subjective scores

(b)

0

20

40

60

80

100

Obj

ectiv

e sco

res

0 20 40 60 80 100Subjective scores

(c)

Figure 4 Subjective scores versus objective scores using the reduced combined linear regression (LR) metric (a) shows the scatter plot forthe training data (b) shows the scatter plot for the test data and (c) depicts the scatter plot for the whole database

)e Scientific World Journal 9

subjective and the objective scores For the nonreduced fullset category applying SVR to the combination of the 40LCQA feature HNR and smoothed CPP was the best ob-jective metric of this category with a correlation value of 077for the test dataset )e difference between the training andthe test dataset was the minimum which meant that theeffect of overfitting is minimum Applying PCA to thefeatures led to the enhancement of most of the objectivemetrics It is noted that using LR and SVR on the PCAreduced combined metric yields to statistically similar re-sults Applying the feature reductionmethod to the objectivefeatures yielded a great enhancement in the performance ofthe objective metrics Applying LR and SVR to the reducedcombined metric yielded statistically similar results How-ever the metric with linear regression had a smaller dif-ference between the training and the test datasets whichmeans it is less prone to overfitting As a result the reducedcomposite metric with linear regression is considered to bethe best objective metric for estimating the quality of theParkinsonian sustained vowels

In summary a subset of acoustic measures includingHNR LCQA and RPDE exhibited a good correlation withauditory-perceptual voice assessments of the overall qualityof sustained vowels produced by a group of Parkinsonrsquospatients )e application of a regression model (LR andSVR) incorporating a subset of these acoustic featuresresulted in a statistically improved prediction of the per-ceived quality of the Parkinsonian vowels )e subjectiveratings used for benchmarking the objective models wereobtained from sustained vowels produced by PD patientsboth on and off Levodopa medication As such the clinicalimplications of the current study include the following (a)the derived model may serve as a surrogate for the subjectiveassessment of the effect of the Levodopa medication on thevoice quality of Parkinsonian subjects Since the evidenceshows that Levodopa improves the voice quality of Par-kinsonian patients only when their premedication voicequality is poor the derived model can potentially play aclinically relevant role in predicting the effectiveness ofLevodopamedication (b)More generally the derivedmodelcan potentially be applied for clinical assessment of theperceived quality of Parkinsonian vowels particularly formonitoring vowel quality over the course of any therapeuticintervention

Before closing a few limitations of our study must beacknowledged)e auditory-perceptual ratings used to trainand benchmark models were garnered from clinical grad-uate students with little experience Follow-up researchfocusing on model performance with auditory-perceptualratings from experienced clinicians is necessary While thederived model is promising for objective instrumental as-sessment of Parkinsonian vowel quality future research iswarranted to test its robustness and generalization capa-bilities and to further improve its performance One of theaspects that need to be addressed is the limited size of thecollected dataset used in this investigation )e size of thedataset needs to be increased to ensure more reliability andgeneralization capability of the proposed metrics Anotherarea for future research is to develop and evaluate more

advanced and complicated machine learning algorithmssuch as deep learning Applying deep learning emphasizesthe need for a larger dataset that needs to be collected topresent a more reliable and precise metric to estimate thesustained vowelsrsquo quality )e effect of gender on studiedacoustic variables is also of future research interest espe-cially on establishing the effectiveness of the derived modelin predicting male versus female PD patient vowel qualityFinally expanding the findings of this study to estimate thequality of continuous speech (as opposed to sustained vowel)will be of broad research interest

Data Availability

)e data used to support the findings of this study are re-stricted by the ethics board atWestern University in order toprotect patient privacy

Conflicts of Interest

)e authors declare that they have no conflicts of interest

References

[1] D Hirtz D J )urman K Gwinn-Hardy M MohamedA R Chaudhuri and R Zalutsky ldquoHow common are theldquocommonrdquo neurologic disordersrdquo Neurology vol 68 no 5pp 326ndash337 2007

[2] M Toft and Z K Wszolek ldquoOverview of the genetics ofParkinsonismrdquo in Parkinsonrsquos Disease pp 117ndash129 CRCPress Boca Raton FL USA 2012

[3] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective speech quality assessment of amplificationdevices for patients with Parkinsonrsquos diseaserdquo IEEE Trans-actions on Neural Systems and Rehabilitation Engineering2019

[4] R Pahwa and K E Lyons Handbook of Parkinsonrsquos DiseaseCRC Press Taylor amp Francis Group Boca Raton FL USA 5thedition 2013

[5] S Sapir L Ramig and C Fox ldquoSpeech therapy in thetreatment of Parkinsonrsquos diseaserdquo in Parkinsonrsquos Diseasepp 974ndash987 CRC Press Boca Raton FL USA 2012

[6] J L Goudreau and E Ahlskog ldquoSymptomatic treatment ofParkinsonrsquos disease levodopardquo in Parkinsonrsquos Diseasepp 847ndash863 CRC Press Boca Raton FL USA 2012

[7] L O Ramig C Fox and S Sapir ldquoSpeech treatment forParkinsonrsquos diseaserdquo Expert Review of Neurotherapeuticsvol 8 no 2 pp 297ndash309 2008

[8] D Cushnie-Sparrow S Adams A AbeyesekeraM Pieterman G Gilmore and M Jog ldquoVoice quality severityand responsiveness to levodopa in Parkinsonrsquos diseaserdquoJournal of Communication Disorders vol 76 pp 1ndash10 2018

[9] T H Falk V Parsa J F Santos et al ldquoObjective quality andintelligibility prediction for users of assistive listening devicesadvantages and limitations of existing toolsrdquo IEEE SignalProcessing Magazine vol 32 no 2 pp 114ndash124

[10] M Farrus J Hernando and P Ejarque ldquoJitter and shimmermeasurements for speaker recognitionrdquo in Proceedings of theEighth Annual Conference of the international Speech Com-munication Association Antwerp Belgium August 2007

[11] Y D Heman-Ackah D D Michael M M Baroody et alldquoCepstral peak prominence a more reliable measure of

10 )e Scientific World Journal

dysphoniardquo Annals of Otology Rhinology amp Laryngologyvol 112 no 4 pp 324ndash333 2003

[12] S Jannetts and A Lowit ldquoCepstral analysis of hypokinetic andataxic voices correlations with perceptual and other acousticmeasuresrdquo Journal of Voice vol 28 no 6 pp 673ndash680 2014

[13] A Tsanas M A Little P E McSharry J Spielman andL O Ramig ldquoNovel speech signal processing algorithms forhigh-accuracy classification of Parkinsonrsquos diseaserdquo IEEETransactions on Biomedical Engineering vol 59 no 5pp 1264ndash1271 2012

[14] S Arora L Baghai-Ravary and A Tsanas ldquoDeveloping a largescale population screening tool for the assessment of Par-kinsonrsquos disease using telephone-quality voicerdquo(e Journal ofthe Acoustical Society of America vol 145 no 5 pp 2871ndash2884 2019

[15] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective assessment of amplified parkinsonian speechqualityrdquo in Proceedings of the 2018 40th Annual InternationalConference of the IEEE Engineering in Medicine and BiologySociety (EMBC) pp 2084ndash2087 IEEE New Paltz NY USAOctober 2018

[16] T H Falk C Zheng and W-Y Chan ldquoA non-intrusivequality and intelligibility measure of reverberant and der-everberated speechrdquo IEEE Transactions on Audio Speech andLanguage Processing vol 18 no 7 pp 1766ndash1774

[17] F Chen O Hazrati and P C Loizou ldquoPredicting the in-telligibility of reverberant speech for cochlear implant lis-teners with a non-intrusive intelligibility measurerdquoBiomedical Signal Processing and Control vol 8 no 3pp 311ndash314 2013

[18] V Grancharov D Zhao J Lindblom and W Kleijn ldquoLow-complexity nonintrusive speech quality assessmentrdquo IEEETransactions on Audio Speech and Language Processingvol 14 no 6 pp 1948ndash1956

[19] H Salehi and V Parsa ldquoOn nonintrusive speech qualityestimation for hearing aidsrdquo in Proceedings of the 2015 IEEEWorkshop on Applications of Signal Processing to Audio andAcoustics (WASPAA) pp 1ndash5 IEEE New Paltz NY USAOctober 2015

[20] M A Little P E McSharry E J Hunter J Spielman andL O Ramig ldquoSuitability of dysphonia measurements fortelemonitoring of Parkinsonrsquos diseaserdquo IEEE Transactions onBiomedical Engineering vol 56 no 4 pp 1015ndash1022 2008

[21] M A Little P E McSharry S J Roberts D A Costello andI M Moroz ldquoExploiting nonlinear recurrence and fractalscaling properties for voice disorder detectionrdquo BiomedicalEngineering Online vol 6 no 1 p 23 2007

[22] P Boersma and D Weenink ldquoPRAAT doing phonetics bycomputer (version 51 05) (computer program)rdquo May 2009

[23] T Hastie R Tibshirani and J H Friedman ldquo)e elements ofstatistical learning data mining inference and predictionrdquoSpringer Series in Statistics Springer New York NY USA2nd edition 2009

[24] V N Vapnik ldquo)e nature of statistical learning theoryrdquoStatistics for Engineering and Information Science SpringerNew York NY USA 2nd edition 2000

[25] I T Jolliffe and J Cadima ldquoPrincipal component analysis areview and recent developmentsrdquo Philosophical Transactionsof the Royal Society A Mathematical Physical and EngineeringSciences vol 374 no 2065 Article ID 20150202 2016

[26] A Gaballah V Parsa M Andreetta and S Adams ldquoAs-sessment of amplified parkinsonian speech quality using deeplearningrdquo in Proceedings of the 2018 IEEE Canadian

Conference on Electrical amp Computer Engineering (CCECE)pp 1ndash4 IEEE Quebec Canada May 2018

[27] Y Hu and P C Loizou ldquoEvaluation of objective qualitymeasures for speech enhancementrdquo IEEE Transactions onAudio Speech and Language Processing vol 16 no 1pp 229ndash238 2008

[28] J H Steiger ldquoTests for comparing elements of a correlationmatrixrdquo Psychological Bulletin vol 87 no 2 pp 245ndash2511980

)e Scientific World Journal 11

Page 6: ImprovedEstimationofParkinsonianVowelQualitythrough ... · (ModA),served as themodulation-based features [9]. e envelope of the waveform was extracted and filtered to a numberoffilters[16,17].eratiosoftheenergyinthelow

RPDE are solo features Consequently LR was ap-plied to the training dataset which contained 80 ofthe whole database and then the trained linearmodel was applied to the test dataset to benchmarkthe objective metric performance Both SRMR andModA had low statistically insignificant correlationvalues with the quality of sustained vowels for the

training and the test datasets An explanation of thatis the envelope of the high-quality sustained voweldoes not have a lot of variations while the envelopeof the low-quality sustained vowel contains highvariations (see Figures 3(a) and 3(b) respectively))is contradicts the way SRMR and ModA measurethe quality of running speech in which the variations

0 20 40 60 8002

03

04

05

06

07

08

RPD

E

Subjective scores

(a)

0 20 40 60 80 1005

10

15

20

25

30

Subjective scores

HN

R (d

B)

(b)

Figure 3 Scatter plots of subjective scores against recurrence period density entropy (RPDE) and harmonics-to-noise ratio (HNR) for thewhole dataset )e x-axis on these plots is the averaged subjective score across the whole database while the y-axis represents the acousticmeasure )e correlation coefficient between the objective and subjective measures is presented in the text (a) RPDE (b) HNR

0

1

2

3

4

5

6

7

8

ndash120

ndash110

ndash100

ndash90

ndash80

ndash70

ndash60

ndash50

05 1 15Time (secs)

Freq

uenc

y (k

Hz)

Pow

erfr

eque

ncy

(dB

Hz)

(c)

0

1

2

3

4

5

6

7

8

ndash120

ndash110

ndash100

ndash90

ndash80

ndash70

ndash60

ndash50

05 1 15Time (secs)

Freq

uenc

y (k

Hz)

Pow

erfr

eque

ncy

(dB

Hz)

(d)

Figure 2 Waveforms and spectrograms of selected sustained vowels recordings from subjects in the database (a c) )e waveform and thespectrogram for the sustained vowel ldquo∖a∖rdquo collected from a normal control subject (b d) the waveform and the spectrogram for ldquo∖a∖rdquosustained vowel collected from a PD subject who has been off his medication (a) Control subject waveform (b) PD subject waveform(c) Control subject high quality (d) Parkinsonian subject poor quality

6 )e Scientific World Journal

Tabl

e1Correlatio

ncoeffi

cientsandstandard

deviationof

predictio

nerror(SD

PE)o

fobjectiv

emetricswith

subjectiv

escoresInd

ices

thatareb

oldedin

acolumnares

tatisticallysim

ilarin

theirperformance

buta

resig

nificantly

bette

rthan

others

inthat

column

Sign

ificanceof

correlationvalues

indicatedbylowastplt005

andlowastlowast

plt001

Metric

Fullset

PCA

Redu

ced

Correlatio

n(training)

SDPE

(training)

Correlatio

n(test)

SDPE

(test)

Correlatio

n(training)

SDPE

(training)

Correlatio

n(test)

SDPE

(test)

Correlatio

n(training)

SPDE

(training)

Correlatio

n(test)

SDPE

(test)

SRMR

028lowastlowast

1726

016

1774

mdashmdash

mdashmdash

mdashmdash

mdashmdash

Mod

A040lowastlowast

1648

033

1697

mdashmdash

mdashmdash

mdashmdash

mdashmdash

CPP

029lowastlowast

1720

053lowast

1524

mdashmdash

mdashmdash

mdashmdash

mdashmdash

HNR

055lowastlowast

1501

074lowastlowast

1209

mdashmdash

mdashmdash

mdashmdash

mdashmdash

RPDE

080lowastlowast

1079

075lowastlowast

1198

mdashmdash

mdashmdash

mdashmdash

mdashmdash

GFC

C-LR

086lowastlowast

915

minus025

1744

065lowastlowast

1363

042

1635

056lowastlowast

1486

055lowastlowast

1505

GFC

C-SVR

060lowastlowast

1435

006

1798

060lowastlowast

1435

030

1719

054lowastlowast

1510

052lowast

1539

LCQA-LR

082lowastlowast

1027

046lowast

1600

076lowastlowast

1166

055lowastlowast

1505

075lowastlowast

1187

075lowastlowast

1192

LCQA-SVR

066lowastlowast

1348

066lowastlowast

1354

073lowastlowast

1226

053lowast

1528

066lowastlowast

1348

066lowastlowast

1528

Com

bined-

LR087lowastlowast

885

051lowast

1550

073lowastlowast

1226

070lowastlowast

1287

081lowastlowast

1052

080lowastlowast

1081

Com

bined-

SVR

075lowastlowast

1187

077lowastlowast

1150

071lowastlowast

1263

066lowastlowast

1354

071lowastlowast

1263

082lowastlowast

1031

PCAp

rincipal

compo

nent

analysis

RPDE

recurrence

period

density

entrop

ySR

MR

speech-to-reverberationmod

ulationratio

HNR

harm

onics-to-noise

ratio

CPP

cepstralpeak

prom

inenceM

odA

mod

ulationareaG

FCCG

ammaton

efrequencycepstral

coeffi

cientsL

CQAL

owCom

plexity

QualityAssessm

entandSV

Rsupp

ortv

ectorregressio

n

)e Scientific World Journal 7

of the envelope and the ratio between energies in thelow band and the energies in the high bands of theenvelope indicate the quality of the waveform )ecorrelation between HNR and the subjective scoreswas 055 for the training database and 074 for thetest database while the correlation values betweenthe subjective measurements and the CPP scoreswere 029 and 053 for the training and the testdatasets respectivelymdashall of which were statisticallysignificant RPDE was the highest among the single-feature objective metrics to have values of correlationwith the subjective scores that reached statisticallysignificant 080 and 075 values for the training andthe test datasets respectively It is noted that there isstill a difference between the correlation values of thetraining and the test datasets of RPDE which meansthat this metric has deficiency in predicting thequality for new (ie unseen) sustained vowel sam-ples Figure 3 shows the scatter plot for the RPDEand HNR measures against the subjective scores

(2) Objective Metrics with Multiple Features )e full setmetrics are the metrics that contained multiplefeatures and the whole set of features are trainedwithout any feature reduction Using machinelearning algorithms on the GFCC features did notyield a reliable metric to estimate the Parkinsonianvoice quality while applying SVR on the LCQAfeatures resulted in a LCQA-SVR metric that has a066 correlation value with the subjective scores

(3) Reduced Multiple Feature Objective Metrics Apply-ing PCA and the Monte Carlo feature selection andreduction method to the GFCC and LCQA featuresled to a reduction of the number of dimensions of theGFCCmetric from 60 to 3 features only It also led toreducing the number of LCQA features from 40 to16 It is noted that the metrics resulting from thefeature reduction method led to higher performancethan the PCA method )is enhanced the perfor-mance of most of the metrics Applying LR to theLCQA metric led to obtaining an objective metricthat has a statistically significant 075 correlationvalue with the subjective scores

(4) A Composite Objective Voice Quality Estimator Acomposite metric was derived by augmenting theHNR and CPP features with LCQA features andapplying SVR and LR to estimate the vowel qualityscores )e combined metric which included 42features resulted in predicted quality scores that hada 077 correlation value with the subjective qualityscores )is is noteworthy in that it is higher than allthe other multiple feature metrics

Afterwards the PCA method was applied to the featuresbefore training the model to estimate the vowel quality )enumber of dimensions was reduced to 23 features whichexplained 95 of the data variance Finally applying thefeature reduction method had greater improvement of theperformance more than using PCA Applying LR to the

reduced combined feature set resulted in a model that es-timated the quality of the vowels with a statistically sig-nificant 080 correlation value with the subjective scores Totest the statistical significance between the correlation valuesof the obtained scores from the combined metric comparedto the second-highest objective metric which is the reducedLCQA metric Steigerrsquos Z-test [28] was applied to measurethe statistical difference It was found that there is a statisticalenhancement when using the combined reduced metricinstead of the reduced LCQA metric Figure 4 shows thescatter plot of the subjective voice quality scores on the x-axis against the predicted quality scores by the compositemetric on the y-axis for the training test and full data

It is noted that the composite metric had the highestcorrelation for training and test datasets followed by theRPDE feature In order to further confirm this finding boththese metrics were trained repeatedly with different trainingdatasets (different samples selected randomly from thewhole dataset each time) and then applied to the corre-sponding test dataset )en the average correlation valuesfor the training and the test datasets were calculated andthey were found to be 075 for the RPDEmethod and 080 forthe combined reducedmethod)e difference between thesetwo correlation coefficients was statistically significant in-dicating the composite measure provided a better overallperformance that was robust to random partitioning of thedatabase

4 Discussion and Conclusion

In this paper the quality of the sustained vowels producedby patients with PD was predicted through objectiveacoustical analyses A previously collected vowel databaseby Cushnie-Sparrow et al [8] was utilized for this purpose)is database consisted of sustained vowel samples from 51PD patients before and after taking the Levodopa medi-cation along with vowel samples from 11 healthy controlsubjects which resulted in the formation of a database of113 vowels A panel of 3 listeners rated the perceivedquality of these vowel recordings [8] which were used intraining the objective models and assessing their accuracyVowel quality prediction features included GFCC LCQAHNR smoothed CPP and RPDE Machine learning al-gorithms SVR and LR were applied to these multidimen-sional features to estimate the quality objectively Some ofthe features mentioned above were blended to form acomposite objective metric that displayed significantlybetter performance than the other metrics Moreover PCAand feature reduction were applied to reduce the number ofinput features to machine learning algorithms to reduce theoverfitting and enhance the performance of the objectivemetrics

Initial investigation focused on individual features andparameters extracted from the vowel samples Although anumber of these individual features exhibited statisticallysignificant correlation values with auditory-perceptual rat-ings only a subset of the measures exhibited correlationcoefficients greater than 05 Of the commonly reported

8 )e Scientific World Journal

vowel acoustic measures HNR performed the best withPearson correlation coefficients of 055 and 074 for thetraining and test partitions respectively )e correlationcoefficients exhibited by HNR and CPP [8] but are lowerthan those reported by Jannetts and Lowit [12] It isworthwhile to note that Jannetts and Lowit [12] reportedSpearmanrsquos rank-ordered correlation between the percep-tual ratings and acoustic measures unlike Pearsonrsquos cor-relation coefficient reported here which can perhaps explainthe discrepancy Furthermore Jannetts and Lowit [12]employed the GRBAS scale and the auditory-perceptualratings were provided by an experienced clinician It isplausible that these methodological differences also con-tributed to the differences

Among the individual measures the RPDE parameterproduced the best performance )e correlation coefficientsof 08 and 075 with training and test partitions were sig-nificantly better than those reported by other individualmeasures RPDE has been previously employed for dis-criminating between normal and PD voices [14 20] and ourresults demonstrate that it is suitable for predicting theperceived quality of PD vowel samples

While GFCC was used in other studies to measure thequality of Parkinsonian speech and had a good performance[15 26] this was not the case for estimating the perceivedquality for Parkinsonian sustained vowels )e best per-formance for GFCC objective metric after feature reductionresulted in a 055 correlation coefficient between the

0 20 40 60 80 1000

20

40

60

80

100

Obj

ectiv

e sco

res

Subjective scores

(a)

0

20

40

60

80

100

Obj

ectiv

e sco

res

0 20 40 60 80 100Subjective scores

(b)

0

20

40

60

80

100

Obj

ectiv

e sco

res

0 20 40 60 80 100Subjective scores

(c)

Figure 4 Subjective scores versus objective scores using the reduced combined linear regression (LR) metric (a) shows the scatter plot forthe training data (b) shows the scatter plot for the test data and (c) depicts the scatter plot for the whole database

)e Scientific World Journal 9

subjective and the objective scores For the nonreduced fullset category applying SVR to the combination of the 40LCQA feature HNR and smoothed CPP was the best ob-jective metric of this category with a correlation value of 077for the test dataset )e difference between the training andthe test dataset was the minimum which meant that theeffect of overfitting is minimum Applying PCA to thefeatures led to the enhancement of most of the objectivemetrics It is noted that using LR and SVR on the PCAreduced combined metric yields to statistically similar re-sults Applying the feature reductionmethod to the objectivefeatures yielded a great enhancement in the performance ofthe objective metrics Applying LR and SVR to the reducedcombined metric yielded statistically similar results How-ever the metric with linear regression had a smaller dif-ference between the training and the test datasets whichmeans it is less prone to overfitting As a result the reducedcomposite metric with linear regression is considered to bethe best objective metric for estimating the quality of theParkinsonian sustained vowels

In summary a subset of acoustic measures includingHNR LCQA and RPDE exhibited a good correlation withauditory-perceptual voice assessments of the overall qualityof sustained vowels produced by a group of Parkinsonrsquospatients )e application of a regression model (LR andSVR) incorporating a subset of these acoustic featuresresulted in a statistically improved prediction of the per-ceived quality of the Parkinsonian vowels )e subjectiveratings used for benchmarking the objective models wereobtained from sustained vowels produced by PD patientsboth on and off Levodopa medication As such the clinicalimplications of the current study include the following (a)the derived model may serve as a surrogate for the subjectiveassessment of the effect of the Levodopa medication on thevoice quality of Parkinsonian subjects Since the evidenceshows that Levodopa improves the voice quality of Par-kinsonian patients only when their premedication voicequality is poor the derived model can potentially play aclinically relevant role in predicting the effectiveness ofLevodopamedication (b)More generally the derivedmodelcan potentially be applied for clinical assessment of theperceived quality of Parkinsonian vowels particularly formonitoring vowel quality over the course of any therapeuticintervention

Before closing a few limitations of our study must beacknowledged)e auditory-perceptual ratings used to trainand benchmark models were garnered from clinical grad-uate students with little experience Follow-up researchfocusing on model performance with auditory-perceptualratings from experienced clinicians is necessary While thederived model is promising for objective instrumental as-sessment of Parkinsonian vowel quality future research iswarranted to test its robustness and generalization capa-bilities and to further improve its performance One of theaspects that need to be addressed is the limited size of thecollected dataset used in this investigation )e size of thedataset needs to be increased to ensure more reliability andgeneralization capability of the proposed metrics Anotherarea for future research is to develop and evaluate more

advanced and complicated machine learning algorithmssuch as deep learning Applying deep learning emphasizesthe need for a larger dataset that needs to be collected topresent a more reliable and precise metric to estimate thesustained vowelsrsquo quality )e effect of gender on studiedacoustic variables is also of future research interest espe-cially on establishing the effectiveness of the derived modelin predicting male versus female PD patient vowel qualityFinally expanding the findings of this study to estimate thequality of continuous speech (as opposed to sustained vowel)will be of broad research interest

Data Availability

)e data used to support the findings of this study are re-stricted by the ethics board atWestern University in order toprotect patient privacy

Conflicts of Interest

)e authors declare that they have no conflicts of interest

References

[1] D Hirtz D J )urman K Gwinn-Hardy M MohamedA R Chaudhuri and R Zalutsky ldquoHow common are theldquocommonrdquo neurologic disordersrdquo Neurology vol 68 no 5pp 326ndash337 2007

[2] M Toft and Z K Wszolek ldquoOverview of the genetics ofParkinsonismrdquo in Parkinsonrsquos Disease pp 117ndash129 CRCPress Boca Raton FL USA 2012

[3] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective speech quality assessment of amplificationdevices for patients with Parkinsonrsquos diseaserdquo IEEE Trans-actions on Neural Systems and Rehabilitation Engineering2019

[4] R Pahwa and K E Lyons Handbook of Parkinsonrsquos DiseaseCRC Press Taylor amp Francis Group Boca Raton FL USA 5thedition 2013

[5] S Sapir L Ramig and C Fox ldquoSpeech therapy in thetreatment of Parkinsonrsquos diseaserdquo in Parkinsonrsquos Diseasepp 974ndash987 CRC Press Boca Raton FL USA 2012

[6] J L Goudreau and E Ahlskog ldquoSymptomatic treatment ofParkinsonrsquos disease levodopardquo in Parkinsonrsquos Diseasepp 847ndash863 CRC Press Boca Raton FL USA 2012

[7] L O Ramig C Fox and S Sapir ldquoSpeech treatment forParkinsonrsquos diseaserdquo Expert Review of Neurotherapeuticsvol 8 no 2 pp 297ndash309 2008

[8] D Cushnie-Sparrow S Adams A AbeyesekeraM Pieterman G Gilmore and M Jog ldquoVoice quality severityand responsiveness to levodopa in Parkinsonrsquos diseaserdquoJournal of Communication Disorders vol 76 pp 1ndash10 2018

[9] T H Falk V Parsa J F Santos et al ldquoObjective quality andintelligibility prediction for users of assistive listening devicesadvantages and limitations of existing toolsrdquo IEEE SignalProcessing Magazine vol 32 no 2 pp 114ndash124

[10] M Farrus J Hernando and P Ejarque ldquoJitter and shimmermeasurements for speaker recognitionrdquo in Proceedings of theEighth Annual Conference of the international Speech Com-munication Association Antwerp Belgium August 2007

[11] Y D Heman-Ackah D D Michael M M Baroody et alldquoCepstral peak prominence a more reliable measure of

10 )e Scientific World Journal

dysphoniardquo Annals of Otology Rhinology amp Laryngologyvol 112 no 4 pp 324ndash333 2003

[12] S Jannetts and A Lowit ldquoCepstral analysis of hypokinetic andataxic voices correlations with perceptual and other acousticmeasuresrdquo Journal of Voice vol 28 no 6 pp 673ndash680 2014

[13] A Tsanas M A Little P E McSharry J Spielman andL O Ramig ldquoNovel speech signal processing algorithms forhigh-accuracy classification of Parkinsonrsquos diseaserdquo IEEETransactions on Biomedical Engineering vol 59 no 5pp 1264ndash1271 2012

[14] S Arora L Baghai-Ravary and A Tsanas ldquoDeveloping a largescale population screening tool for the assessment of Par-kinsonrsquos disease using telephone-quality voicerdquo(e Journal ofthe Acoustical Society of America vol 145 no 5 pp 2871ndash2884 2019

[15] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective assessment of amplified parkinsonian speechqualityrdquo in Proceedings of the 2018 40th Annual InternationalConference of the IEEE Engineering in Medicine and BiologySociety (EMBC) pp 2084ndash2087 IEEE New Paltz NY USAOctober 2018

[16] T H Falk C Zheng and W-Y Chan ldquoA non-intrusivequality and intelligibility measure of reverberant and der-everberated speechrdquo IEEE Transactions on Audio Speech andLanguage Processing vol 18 no 7 pp 1766ndash1774

[17] F Chen O Hazrati and P C Loizou ldquoPredicting the in-telligibility of reverberant speech for cochlear implant lis-teners with a non-intrusive intelligibility measurerdquoBiomedical Signal Processing and Control vol 8 no 3pp 311ndash314 2013

[18] V Grancharov D Zhao J Lindblom and W Kleijn ldquoLow-complexity nonintrusive speech quality assessmentrdquo IEEETransactions on Audio Speech and Language Processingvol 14 no 6 pp 1948ndash1956

[19] H Salehi and V Parsa ldquoOn nonintrusive speech qualityestimation for hearing aidsrdquo in Proceedings of the 2015 IEEEWorkshop on Applications of Signal Processing to Audio andAcoustics (WASPAA) pp 1ndash5 IEEE New Paltz NY USAOctober 2015

[20] M A Little P E McSharry E J Hunter J Spielman andL O Ramig ldquoSuitability of dysphonia measurements fortelemonitoring of Parkinsonrsquos diseaserdquo IEEE Transactions onBiomedical Engineering vol 56 no 4 pp 1015ndash1022 2008

[21] M A Little P E McSharry S J Roberts D A Costello andI M Moroz ldquoExploiting nonlinear recurrence and fractalscaling properties for voice disorder detectionrdquo BiomedicalEngineering Online vol 6 no 1 p 23 2007

[22] P Boersma and D Weenink ldquoPRAAT doing phonetics bycomputer (version 51 05) (computer program)rdquo May 2009

[23] T Hastie R Tibshirani and J H Friedman ldquo)e elements ofstatistical learning data mining inference and predictionrdquoSpringer Series in Statistics Springer New York NY USA2nd edition 2009

[24] V N Vapnik ldquo)e nature of statistical learning theoryrdquoStatistics for Engineering and Information Science SpringerNew York NY USA 2nd edition 2000

[25] I T Jolliffe and J Cadima ldquoPrincipal component analysis areview and recent developmentsrdquo Philosophical Transactionsof the Royal Society A Mathematical Physical and EngineeringSciences vol 374 no 2065 Article ID 20150202 2016

[26] A Gaballah V Parsa M Andreetta and S Adams ldquoAs-sessment of amplified parkinsonian speech quality using deeplearningrdquo in Proceedings of the 2018 IEEE Canadian

Conference on Electrical amp Computer Engineering (CCECE)pp 1ndash4 IEEE Quebec Canada May 2018

[27] Y Hu and P C Loizou ldquoEvaluation of objective qualitymeasures for speech enhancementrdquo IEEE Transactions onAudio Speech and Language Processing vol 16 no 1pp 229ndash238 2008

[28] J H Steiger ldquoTests for comparing elements of a correlationmatrixrdquo Psychological Bulletin vol 87 no 2 pp 245ndash2511980

)e Scientific World Journal 11

Page 7: ImprovedEstimationofParkinsonianVowelQualitythrough ... · (ModA),served as themodulation-based features [9]. e envelope of the waveform was extracted and filtered to a numberoffilters[16,17].eratiosoftheenergyinthelow

Tabl

e1Correlatio

ncoeffi

cientsandstandard

deviationof

predictio

nerror(SD

PE)o

fobjectiv

emetricswith

subjectiv

escoresInd

ices

thatareb

oldedin

acolumnares

tatisticallysim

ilarin

theirperformance

buta

resig

nificantly

bette

rthan

others

inthat

column

Sign

ificanceof

correlationvalues

indicatedbylowastplt005

andlowastlowast

plt001

Metric

Fullset

PCA

Redu

ced

Correlatio

n(training)

SDPE

(training)

Correlatio

n(test)

SDPE

(test)

Correlatio

n(training)

SDPE

(training)

Correlatio

n(test)

SDPE

(test)

Correlatio

n(training)

SPDE

(training)

Correlatio

n(test)

SDPE

(test)

SRMR

028lowastlowast

1726

016

1774

mdashmdash

mdashmdash

mdashmdash

mdashmdash

Mod

A040lowastlowast

1648

033

1697

mdashmdash

mdashmdash

mdashmdash

mdashmdash

CPP

029lowastlowast

1720

053lowast

1524

mdashmdash

mdashmdash

mdashmdash

mdashmdash

HNR

055lowastlowast

1501

074lowastlowast

1209

mdashmdash

mdashmdash

mdashmdash

mdashmdash

RPDE

080lowastlowast

1079

075lowastlowast

1198

mdashmdash

mdashmdash

mdashmdash

mdashmdash

GFC

C-LR

086lowastlowast

915

minus025

1744

065lowastlowast

1363

042

1635

056lowastlowast

1486

055lowastlowast

1505

GFC

C-SVR

060lowastlowast

1435

006

1798

060lowastlowast

1435

030

1719

054lowastlowast

1510

052lowast

1539

LCQA-LR

082lowastlowast

1027

046lowast

1600

076lowastlowast

1166

055lowastlowast

1505

075lowastlowast

1187

075lowastlowast

1192

LCQA-SVR

066lowastlowast

1348

066lowastlowast

1354

073lowastlowast

1226

053lowast

1528

066lowastlowast

1348

066lowastlowast

1528

Com

bined-

LR087lowastlowast

885

051lowast

1550

073lowastlowast

1226

070lowastlowast

1287

081lowastlowast

1052

080lowastlowast

1081

Com

bined-

SVR

075lowastlowast

1187

077lowastlowast

1150

071lowastlowast

1263

066lowastlowast

1354

071lowastlowast

1263

082lowastlowast

1031

PCAp

rincipal

compo

nent

analysis

RPDE

recurrence

period

density

entrop

ySR

MR

speech-to-reverberationmod

ulationratio

HNR

harm

onics-to-noise

ratio

CPP

cepstralpeak

prom

inenceM

odA

mod

ulationareaG

FCCG

ammaton

efrequencycepstral

coeffi

cientsL

CQAL

owCom

plexity

QualityAssessm

entandSV

Rsupp

ortv

ectorregressio

n

)e Scientific World Journal 7

of the envelope and the ratio between energies in thelow band and the energies in the high bands of theenvelope indicate the quality of the waveform )ecorrelation between HNR and the subjective scoreswas 055 for the training database and 074 for thetest database while the correlation values betweenthe subjective measurements and the CPP scoreswere 029 and 053 for the training and the testdatasets respectivelymdashall of which were statisticallysignificant RPDE was the highest among the single-feature objective metrics to have values of correlationwith the subjective scores that reached statisticallysignificant 080 and 075 values for the training andthe test datasets respectively It is noted that there isstill a difference between the correlation values of thetraining and the test datasets of RPDE which meansthat this metric has deficiency in predicting thequality for new (ie unseen) sustained vowel sam-ples Figure 3 shows the scatter plot for the RPDEand HNR measures against the subjective scores

(2) Objective Metrics with Multiple Features )e full setmetrics are the metrics that contained multiplefeatures and the whole set of features are trainedwithout any feature reduction Using machinelearning algorithms on the GFCC features did notyield a reliable metric to estimate the Parkinsonianvoice quality while applying SVR on the LCQAfeatures resulted in a LCQA-SVR metric that has a066 correlation value with the subjective scores

(3) Reduced Multiple Feature Objective Metrics Apply-ing PCA and the Monte Carlo feature selection andreduction method to the GFCC and LCQA featuresled to a reduction of the number of dimensions of theGFCCmetric from 60 to 3 features only It also led toreducing the number of LCQA features from 40 to16 It is noted that the metrics resulting from thefeature reduction method led to higher performancethan the PCA method )is enhanced the perfor-mance of most of the metrics Applying LR to theLCQA metric led to obtaining an objective metricthat has a statistically significant 075 correlationvalue with the subjective scores

(4) A Composite Objective Voice Quality Estimator Acomposite metric was derived by augmenting theHNR and CPP features with LCQA features andapplying SVR and LR to estimate the vowel qualityscores )e combined metric which included 42features resulted in predicted quality scores that hada 077 correlation value with the subjective qualityscores )is is noteworthy in that it is higher than allthe other multiple feature metrics

Afterwards the PCA method was applied to the featuresbefore training the model to estimate the vowel quality )enumber of dimensions was reduced to 23 features whichexplained 95 of the data variance Finally applying thefeature reduction method had greater improvement of theperformance more than using PCA Applying LR to the

reduced combined feature set resulted in a model that es-timated the quality of the vowels with a statistically sig-nificant 080 correlation value with the subjective scores Totest the statistical significance between the correlation valuesof the obtained scores from the combined metric comparedto the second-highest objective metric which is the reducedLCQA metric Steigerrsquos Z-test [28] was applied to measurethe statistical difference It was found that there is a statisticalenhancement when using the combined reduced metricinstead of the reduced LCQA metric Figure 4 shows thescatter plot of the subjective voice quality scores on the x-axis against the predicted quality scores by the compositemetric on the y-axis for the training test and full data

It is noted that the composite metric had the highestcorrelation for training and test datasets followed by theRPDE feature In order to further confirm this finding boththese metrics were trained repeatedly with different trainingdatasets (different samples selected randomly from thewhole dataset each time) and then applied to the corre-sponding test dataset )en the average correlation valuesfor the training and the test datasets were calculated andthey were found to be 075 for the RPDEmethod and 080 forthe combined reducedmethod)e difference between thesetwo correlation coefficients was statistically significant in-dicating the composite measure provided a better overallperformance that was robust to random partitioning of thedatabase

4 Discussion and Conclusion

In this paper the quality of the sustained vowels producedby patients with PD was predicted through objectiveacoustical analyses A previously collected vowel databaseby Cushnie-Sparrow et al [8] was utilized for this purpose)is database consisted of sustained vowel samples from 51PD patients before and after taking the Levodopa medi-cation along with vowel samples from 11 healthy controlsubjects which resulted in the formation of a database of113 vowels A panel of 3 listeners rated the perceivedquality of these vowel recordings [8] which were used intraining the objective models and assessing their accuracyVowel quality prediction features included GFCC LCQAHNR smoothed CPP and RPDE Machine learning al-gorithms SVR and LR were applied to these multidimen-sional features to estimate the quality objectively Some ofthe features mentioned above were blended to form acomposite objective metric that displayed significantlybetter performance than the other metrics Moreover PCAand feature reduction were applied to reduce the number ofinput features to machine learning algorithms to reduce theoverfitting and enhance the performance of the objectivemetrics

Initial investigation focused on individual features andparameters extracted from the vowel samples Although anumber of these individual features exhibited statisticallysignificant correlation values with auditory-perceptual rat-ings only a subset of the measures exhibited correlationcoefficients greater than 05 Of the commonly reported

8 )e Scientific World Journal

vowel acoustic measures HNR performed the best withPearson correlation coefficients of 055 and 074 for thetraining and test partitions respectively )e correlationcoefficients exhibited by HNR and CPP [8] but are lowerthan those reported by Jannetts and Lowit [12] It isworthwhile to note that Jannetts and Lowit [12] reportedSpearmanrsquos rank-ordered correlation between the percep-tual ratings and acoustic measures unlike Pearsonrsquos cor-relation coefficient reported here which can perhaps explainthe discrepancy Furthermore Jannetts and Lowit [12]employed the GRBAS scale and the auditory-perceptualratings were provided by an experienced clinician It isplausible that these methodological differences also con-tributed to the differences

Among the individual measures the RPDE parameterproduced the best performance )e correlation coefficientsof 08 and 075 with training and test partitions were sig-nificantly better than those reported by other individualmeasures RPDE has been previously employed for dis-criminating between normal and PD voices [14 20] and ourresults demonstrate that it is suitable for predicting theperceived quality of PD vowel samples

While GFCC was used in other studies to measure thequality of Parkinsonian speech and had a good performance[15 26] this was not the case for estimating the perceivedquality for Parkinsonian sustained vowels )e best per-formance for GFCC objective metric after feature reductionresulted in a 055 correlation coefficient between the

0 20 40 60 80 1000

20

40

60

80

100

Obj

ectiv

e sco

res

Subjective scores

(a)

0

20

40

60

80

100

Obj

ectiv

e sco

res

0 20 40 60 80 100Subjective scores

(b)

0

20

40

60

80

100

Obj

ectiv

e sco

res

0 20 40 60 80 100Subjective scores

(c)

Figure 4 Subjective scores versus objective scores using the reduced combined linear regression (LR) metric (a) shows the scatter plot forthe training data (b) shows the scatter plot for the test data and (c) depicts the scatter plot for the whole database

)e Scientific World Journal 9

subjective and the objective scores For the nonreduced fullset category applying SVR to the combination of the 40LCQA feature HNR and smoothed CPP was the best ob-jective metric of this category with a correlation value of 077for the test dataset )e difference between the training andthe test dataset was the minimum which meant that theeffect of overfitting is minimum Applying PCA to thefeatures led to the enhancement of most of the objectivemetrics It is noted that using LR and SVR on the PCAreduced combined metric yields to statistically similar re-sults Applying the feature reductionmethod to the objectivefeatures yielded a great enhancement in the performance ofthe objective metrics Applying LR and SVR to the reducedcombined metric yielded statistically similar results How-ever the metric with linear regression had a smaller dif-ference between the training and the test datasets whichmeans it is less prone to overfitting As a result the reducedcomposite metric with linear regression is considered to bethe best objective metric for estimating the quality of theParkinsonian sustained vowels

In summary a subset of acoustic measures includingHNR LCQA and RPDE exhibited a good correlation withauditory-perceptual voice assessments of the overall qualityof sustained vowels produced by a group of Parkinsonrsquospatients )e application of a regression model (LR andSVR) incorporating a subset of these acoustic featuresresulted in a statistically improved prediction of the per-ceived quality of the Parkinsonian vowels )e subjectiveratings used for benchmarking the objective models wereobtained from sustained vowels produced by PD patientsboth on and off Levodopa medication As such the clinicalimplications of the current study include the following (a)the derived model may serve as a surrogate for the subjectiveassessment of the effect of the Levodopa medication on thevoice quality of Parkinsonian subjects Since the evidenceshows that Levodopa improves the voice quality of Par-kinsonian patients only when their premedication voicequality is poor the derived model can potentially play aclinically relevant role in predicting the effectiveness ofLevodopamedication (b)More generally the derivedmodelcan potentially be applied for clinical assessment of theperceived quality of Parkinsonian vowels particularly formonitoring vowel quality over the course of any therapeuticintervention

Before closing a few limitations of our study must beacknowledged)e auditory-perceptual ratings used to trainand benchmark models were garnered from clinical grad-uate students with little experience Follow-up researchfocusing on model performance with auditory-perceptualratings from experienced clinicians is necessary While thederived model is promising for objective instrumental as-sessment of Parkinsonian vowel quality future research iswarranted to test its robustness and generalization capa-bilities and to further improve its performance One of theaspects that need to be addressed is the limited size of thecollected dataset used in this investigation )e size of thedataset needs to be increased to ensure more reliability andgeneralization capability of the proposed metrics Anotherarea for future research is to develop and evaluate more

advanced and complicated machine learning algorithmssuch as deep learning Applying deep learning emphasizesthe need for a larger dataset that needs to be collected topresent a more reliable and precise metric to estimate thesustained vowelsrsquo quality )e effect of gender on studiedacoustic variables is also of future research interest espe-cially on establishing the effectiveness of the derived modelin predicting male versus female PD patient vowel qualityFinally expanding the findings of this study to estimate thequality of continuous speech (as opposed to sustained vowel)will be of broad research interest

Data Availability

)e data used to support the findings of this study are re-stricted by the ethics board atWestern University in order toprotect patient privacy

Conflicts of Interest

)e authors declare that they have no conflicts of interest

References

[1] D Hirtz D J )urman K Gwinn-Hardy M MohamedA R Chaudhuri and R Zalutsky ldquoHow common are theldquocommonrdquo neurologic disordersrdquo Neurology vol 68 no 5pp 326ndash337 2007

[2] M Toft and Z K Wszolek ldquoOverview of the genetics ofParkinsonismrdquo in Parkinsonrsquos Disease pp 117ndash129 CRCPress Boca Raton FL USA 2012

[3] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective speech quality assessment of amplificationdevices for patients with Parkinsonrsquos diseaserdquo IEEE Trans-actions on Neural Systems and Rehabilitation Engineering2019

[4] R Pahwa and K E Lyons Handbook of Parkinsonrsquos DiseaseCRC Press Taylor amp Francis Group Boca Raton FL USA 5thedition 2013

[5] S Sapir L Ramig and C Fox ldquoSpeech therapy in thetreatment of Parkinsonrsquos diseaserdquo in Parkinsonrsquos Diseasepp 974ndash987 CRC Press Boca Raton FL USA 2012

[6] J L Goudreau and E Ahlskog ldquoSymptomatic treatment ofParkinsonrsquos disease levodopardquo in Parkinsonrsquos Diseasepp 847ndash863 CRC Press Boca Raton FL USA 2012

[7] L O Ramig C Fox and S Sapir ldquoSpeech treatment forParkinsonrsquos diseaserdquo Expert Review of Neurotherapeuticsvol 8 no 2 pp 297ndash309 2008

[8] D Cushnie-Sparrow S Adams A AbeyesekeraM Pieterman G Gilmore and M Jog ldquoVoice quality severityand responsiveness to levodopa in Parkinsonrsquos diseaserdquoJournal of Communication Disorders vol 76 pp 1ndash10 2018

[9] T H Falk V Parsa J F Santos et al ldquoObjective quality andintelligibility prediction for users of assistive listening devicesadvantages and limitations of existing toolsrdquo IEEE SignalProcessing Magazine vol 32 no 2 pp 114ndash124

[10] M Farrus J Hernando and P Ejarque ldquoJitter and shimmermeasurements for speaker recognitionrdquo in Proceedings of theEighth Annual Conference of the international Speech Com-munication Association Antwerp Belgium August 2007

[11] Y D Heman-Ackah D D Michael M M Baroody et alldquoCepstral peak prominence a more reliable measure of

10 )e Scientific World Journal

dysphoniardquo Annals of Otology Rhinology amp Laryngologyvol 112 no 4 pp 324ndash333 2003

[12] S Jannetts and A Lowit ldquoCepstral analysis of hypokinetic andataxic voices correlations with perceptual and other acousticmeasuresrdquo Journal of Voice vol 28 no 6 pp 673ndash680 2014

[13] A Tsanas M A Little P E McSharry J Spielman andL O Ramig ldquoNovel speech signal processing algorithms forhigh-accuracy classification of Parkinsonrsquos diseaserdquo IEEETransactions on Biomedical Engineering vol 59 no 5pp 1264ndash1271 2012

[14] S Arora L Baghai-Ravary and A Tsanas ldquoDeveloping a largescale population screening tool for the assessment of Par-kinsonrsquos disease using telephone-quality voicerdquo(e Journal ofthe Acoustical Society of America vol 145 no 5 pp 2871ndash2884 2019

[15] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective assessment of amplified parkinsonian speechqualityrdquo in Proceedings of the 2018 40th Annual InternationalConference of the IEEE Engineering in Medicine and BiologySociety (EMBC) pp 2084ndash2087 IEEE New Paltz NY USAOctober 2018

[16] T H Falk C Zheng and W-Y Chan ldquoA non-intrusivequality and intelligibility measure of reverberant and der-everberated speechrdquo IEEE Transactions on Audio Speech andLanguage Processing vol 18 no 7 pp 1766ndash1774

[17] F Chen O Hazrati and P C Loizou ldquoPredicting the in-telligibility of reverberant speech for cochlear implant lis-teners with a non-intrusive intelligibility measurerdquoBiomedical Signal Processing and Control vol 8 no 3pp 311ndash314 2013

[18] V Grancharov D Zhao J Lindblom and W Kleijn ldquoLow-complexity nonintrusive speech quality assessmentrdquo IEEETransactions on Audio Speech and Language Processingvol 14 no 6 pp 1948ndash1956

[19] H Salehi and V Parsa ldquoOn nonintrusive speech qualityestimation for hearing aidsrdquo in Proceedings of the 2015 IEEEWorkshop on Applications of Signal Processing to Audio andAcoustics (WASPAA) pp 1ndash5 IEEE New Paltz NY USAOctober 2015

[20] M A Little P E McSharry E J Hunter J Spielman andL O Ramig ldquoSuitability of dysphonia measurements fortelemonitoring of Parkinsonrsquos diseaserdquo IEEE Transactions onBiomedical Engineering vol 56 no 4 pp 1015ndash1022 2008

[21] M A Little P E McSharry S J Roberts D A Costello andI M Moroz ldquoExploiting nonlinear recurrence and fractalscaling properties for voice disorder detectionrdquo BiomedicalEngineering Online vol 6 no 1 p 23 2007

[22] P Boersma and D Weenink ldquoPRAAT doing phonetics bycomputer (version 51 05) (computer program)rdquo May 2009

[23] T Hastie R Tibshirani and J H Friedman ldquo)e elements ofstatistical learning data mining inference and predictionrdquoSpringer Series in Statistics Springer New York NY USA2nd edition 2009

[24] V N Vapnik ldquo)e nature of statistical learning theoryrdquoStatistics for Engineering and Information Science SpringerNew York NY USA 2nd edition 2000

[25] I T Jolliffe and J Cadima ldquoPrincipal component analysis areview and recent developmentsrdquo Philosophical Transactionsof the Royal Society A Mathematical Physical and EngineeringSciences vol 374 no 2065 Article ID 20150202 2016

[26] A Gaballah V Parsa M Andreetta and S Adams ldquoAs-sessment of amplified parkinsonian speech quality using deeplearningrdquo in Proceedings of the 2018 IEEE Canadian

Conference on Electrical amp Computer Engineering (CCECE)pp 1ndash4 IEEE Quebec Canada May 2018

[27] Y Hu and P C Loizou ldquoEvaluation of objective qualitymeasures for speech enhancementrdquo IEEE Transactions onAudio Speech and Language Processing vol 16 no 1pp 229ndash238 2008

[28] J H Steiger ldquoTests for comparing elements of a correlationmatrixrdquo Psychological Bulletin vol 87 no 2 pp 245ndash2511980

)e Scientific World Journal 11

Page 8: ImprovedEstimationofParkinsonianVowelQualitythrough ... · (ModA),served as themodulation-based features [9]. e envelope of the waveform was extracted and filtered to a numberoffilters[16,17].eratiosoftheenergyinthelow

of the envelope and the ratio between energies in thelow band and the energies in the high bands of theenvelope indicate the quality of the waveform )ecorrelation between HNR and the subjective scoreswas 055 for the training database and 074 for thetest database while the correlation values betweenthe subjective measurements and the CPP scoreswere 029 and 053 for the training and the testdatasets respectivelymdashall of which were statisticallysignificant RPDE was the highest among the single-feature objective metrics to have values of correlationwith the subjective scores that reached statisticallysignificant 080 and 075 values for the training andthe test datasets respectively It is noted that there isstill a difference between the correlation values of thetraining and the test datasets of RPDE which meansthat this metric has deficiency in predicting thequality for new (ie unseen) sustained vowel sam-ples Figure 3 shows the scatter plot for the RPDEand HNR measures against the subjective scores

(2) Objective Metrics with Multiple Features )e full setmetrics are the metrics that contained multiplefeatures and the whole set of features are trainedwithout any feature reduction Using machinelearning algorithms on the GFCC features did notyield a reliable metric to estimate the Parkinsonianvoice quality while applying SVR on the LCQAfeatures resulted in a LCQA-SVR metric that has a066 correlation value with the subjective scores

(3) Reduced Multiple Feature Objective Metrics Apply-ing PCA and the Monte Carlo feature selection andreduction method to the GFCC and LCQA featuresled to a reduction of the number of dimensions of theGFCCmetric from 60 to 3 features only It also led toreducing the number of LCQA features from 40 to16 It is noted that the metrics resulting from thefeature reduction method led to higher performancethan the PCA method )is enhanced the perfor-mance of most of the metrics Applying LR to theLCQA metric led to obtaining an objective metricthat has a statistically significant 075 correlationvalue with the subjective scores

(4) A Composite Objective Voice Quality Estimator Acomposite metric was derived by augmenting theHNR and CPP features with LCQA features andapplying SVR and LR to estimate the vowel qualityscores )e combined metric which included 42features resulted in predicted quality scores that hada 077 correlation value with the subjective qualityscores )is is noteworthy in that it is higher than allthe other multiple feature metrics

Afterwards the PCA method was applied to the featuresbefore training the model to estimate the vowel quality )enumber of dimensions was reduced to 23 features whichexplained 95 of the data variance Finally applying thefeature reduction method had greater improvement of theperformance more than using PCA Applying LR to the

reduced combined feature set resulted in a model that es-timated the quality of the vowels with a statistically sig-nificant 080 correlation value with the subjective scores Totest the statistical significance between the correlation valuesof the obtained scores from the combined metric comparedto the second-highest objective metric which is the reducedLCQA metric Steigerrsquos Z-test [28] was applied to measurethe statistical difference It was found that there is a statisticalenhancement when using the combined reduced metricinstead of the reduced LCQA metric Figure 4 shows thescatter plot of the subjective voice quality scores on the x-axis against the predicted quality scores by the compositemetric on the y-axis for the training test and full data

It is noted that the composite metric had the highestcorrelation for training and test datasets followed by theRPDE feature In order to further confirm this finding boththese metrics were trained repeatedly with different trainingdatasets (different samples selected randomly from thewhole dataset each time) and then applied to the corre-sponding test dataset )en the average correlation valuesfor the training and the test datasets were calculated andthey were found to be 075 for the RPDEmethod and 080 forthe combined reducedmethod)e difference between thesetwo correlation coefficients was statistically significant in-dicating the composite measure provided a better overallperformance that was robust to random partitioning of thedatabase

4 Discussion and Conclusion

In this paper the quality of the sustained vowels producedby patients with PD was predicted through objectiveacoustical analyses A previously collected vowel databaseby Cushnie-Sparrow et al [8] was utilized for this purpose)is database consisted of sustained vowel samples from 51PD patients before and after taking the Levodopa medi-cation along with vowel samples from 11 healthy controlsubjects which resulted in the formation of a database of113 vowels A panel of 3 listeners rated the perceivedquality of these vowel recordings [8] which were used intraining the objective models and assessing their accuracyVowel quality prediction features included GFCC LCQAHNR smoothed CPP and RPDE Machine learning al-gorithms SVR and LR were applied to these multidimen-sional features to estimate the quality objectively Some ofthe features mentioned above were blended to form acomposite objective metric that displayed significantlybetter performance than the other metrics Moreover PCAand feature reduction were applied to reduce the number ofinput features to machine learning algorithms to reduce theoverfitting and enhance the performance of the objectivemetrics

Initial investigation focused on individual features andparameters extracted from the vowel samples Although anumber of these individual features exhibited statisticallysignificant correlation values with auditory-perceptual rat-ings only a subset of the measures exhibited correlationcoefficients greater than 05 Of the commonly reported

8 )e Scientific World Journal

vowel acoustic measures HNR performed the best withPearson correlation coefficients of 055 and 074 for thetraining and test partitions respectively )e correlationcoefficients exhibited by HNR and CPP [8] but are lowerthan those reported by Jannetts and Lowit [12] It isworthwhile to note that Jannetts and Lowit [12] reportedSpearmanrsquos rank-ordered correlation between the percep-tual ratings and acoustic measures unlike Pearsonrsquos cor-relation coefficient reported here which can perhaps explainthe discrepancy Furthermore Jannetts and Lowit [12]employed the GRBAS scale and the auditory-perceptualratings were provided by an experienced clinician It isplausible that these methodological differences also con-tributed to the differences

Among the individual measures the RPDE parameterproduced the best performance )e correlation coefficientsof 08 and 075 with training and test partitions were sig-nificantly better than those reported by other individualmeasures RPDE has been previously employed for dis-criminating between normal and PD voices [14 20] and ourresults demonstrate that it is suitable for predicting theperceived quality of PD vowel samples

While GFCC was used in other studies to measure thequality of Parkinsonian speech and had a good performance[15 26] this was not the case for estimating the perceivedquality for Parkinsonian sustained vowels )e best per-formance for GFCC objective metric after feature reductionresulted in a 055 correlation coefficient between the

0 20 40 60 80 1000

20

40

60

80

100

Obj

ectiv

e sco

res

Subjective scores

(a)

0

20

40

60

80

100

Obj

ectiv

e sco

res

0 20 40 60 80 100Subjective scores

(b)

0

20

40

60

80

100

Obj

ectiv

e sco

res

0 20 40 60 80 100Subjective scores

(c)

Figure 4 Subjective scores versus objective scores using the reduced combined linear regression (LR) metric (a) shows the scatter plot forthe training data (b) shows the scatter plot for the test data and (c) depicts the scatter plot for the whole database

)e Scientific World Journal 9

subjective and the objective scores For the nonreduced fullset category applying SVR to the combination of the 40LCQA feature HNR and smoothed CPP was the best ob-jective metric of this category with a correlation value of 077for the test dataset )e difference between the training andthe test dataset was the minimum which meant that theeffect of overfitting is minimum Applying PCA to thefeatures led to the enhancement of most of the objectivemetrics It is noted that using LR and SVR on the PCAreduced combined metric yields to statistically similar re-sults Applying the feature reductionmethod to the objectivefeatures yielded a great enhancement in the performance ofthe objective metrics Applying LR and SVR to the reducedcombined metric yielded statistically similar results How-ever the metric with linear regression had a smaller dif-ference between the training and the test datasets whichmeans it is less prone to overfitting As a result the reducedcomposite metric with linear regression is considered to bethe best objective metric for estimating the quality of theParkinsonian sustained vowels

In summary a subset of acoustic measures includingHNR LCQA and RPDE exhibited a good correlation withauditory-perceptual voice assessments of the overall qualityof sustained vowels produced by a group of Parkinsonrsquospatients )e application of a regression model (LR andSVR) incorporating a subset of these acoustic featuresresulted in a statistically improved prediction of the per-ceived quality of the Parkinsonian vowels )e subjectiveratings used for benchmarking the objective models wereobtained from sustained vowels produced by PD patientsboth on and off Levodopa medication As such the clinicalimplications of the current study include the following (a)the derived model may serve as a surrogate for the subjectiveassessment of the effect of the Levodopa medication on thevoice quality of Parkinsonian subjects Since the evidenceshows that Levodopa improves the voice quality of Par-kinsonian patients only when their premedication voicequality is poor the derived model can potentially play aclinically relevant role in predicting the effectiveness ofLevodopamedication (b)More generally the derivedmodelcan potentially be applied for clinical assessment of theperceived quality of Parkinsonian vowels particularly formonitoring vowel quality over the course of any therapeuticintervention

Before closing a few limitations of our study must beacknowledged)e auditory-perceptual ratings used to trainand benchmark models were garnered from clinical grad-uate students with little experience Follow-up researchfocusing on model performance with auditory-perceptualratings from experienced clinicians is necessary While thederived model is promising for objective instrumental as-sessment of Parkinsonian vowel quality future research iswarranted to test its robustness and generalization capa-bilities and to further improve its performance One of theaspects that need to be addressed is the limited size of thecollected dataset used in this investigation )e size of thedataset needs to be increased to ensure more reliability andgeneralization capability of the proposed metrics Anotherarea for future research is to develop and evaluate more

advanced and complicated machine learning algorithmssuch as deep learning Applying deep learning emphasizesthe need for a larger dataset that needs to be collected topresent a more reliable and precise metric to estimate thesustained vowelsrsquo quality )e effect of gender on studiedacoustic variables is also of future research interest espe-cially on establishing the effectiveness of the derived modelin predicting male versus female PD patient vowel qualityFinally expanding the findings of this study to estimate thequality of continuous speech (as opposed to sustained vowel)will be of broad research interest

Data Availability

)e data used to support the findings of this study are re-stricted by the ethics board atWestern University in order toprotect patient privacy

Conflicts of Interest

)e authors declare that they have no conflicts of interest

References

[1] D Hirtz D J )urman K Gwinn-Hardy M MohamedA R Chaudhuri and R Zalutsky ldquoHow common are theldquocommonrdquo neurologic disordersrdquo Neurology vol 68 no 5pp 326ndash337 2007

[2] M Toft and Z K Wszolek ldquoOverview of the genetics ofParkinsonismrdquo in Parkinsonrsquos Disease pp 117ndash129 CRCPress Boca Raton FL USA 2012

[3] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective speech quality assessment of amplificationdevices for patients with Parkinsonrsquos diseaserdquo IEEE Trans-actions on Neural Systems and Rehabilitation Engineering2019

[4] R Pahwa and K E Lyons Handbook of Parkinsonrsquos DiseaseCRC Press Taylor amp Francis Group Boca Raton FL USA 5thedition 2013

[5] S Sapir L Ramig and C Fox ldquoSpeech therapy in thetreatment of Parkinsonrsquos diseaserdquo in Parkinsonrsquos Diseasepp 974ndash987 CRC Press Boca Raton FL USA 2012

[6] J L Goudreau and E Ahlskog ldquoSymptomatic treatment ofParkinsonrsquos disease levodopardquo in Parkinsonrsquos Diseasepp 847ndash863 CRC Press Boca Raton FL USA 2012

[7] L O Ramig C Fox and S Sapir ldquoSpeech treatment forParkinsonrsquos diseaserdquo Expert Review of Neurotherapeuticsvol 8 no 2 pp 297ndash309 2008

[8] D Cushnie-Sparrow S Adams A AbeyesekeraM Pieterman G Gilmore and M Jog ldquoVoice quality severityand responsiveness to levodopa in Parkinsonrsquos diseaserdquoJournal of Communication Disorders vol 76 pp 1ndash10 2018

[9] T H Falk V Parsa J F Santos et al ldquoObjective quality andintelligibility prediction for users of assistive listening devicesadvantages and limitations of existing toolsrdquo IEEE SignalProcessing Magazine vol 32 no 2 pp 114ndash124

[10] M Farrus J Hernando and P Ejarque ldquoJitter and shimmermeasurements for speaker recognitionrdquo in Proceedings of theEighth Annual Conference of the international Speech Com-munication Association Antwerp Belgium August 2007

[11] Y D Heman-Ackah D D Michael M M Baroody et alldquoCepstral peak prominence a more reliable measure of

10 )e Scientific World Journal

dysphoniardquo Annals of Otology Rhinology amp Laryngologyvol 112 no 4 pp 324ndash333 2003

[12] S Jannetts and A Lowit ldquoCepstral analysis of hypokinetic andataxic voices correlations with perceptual and other acousticmeasuresrdquo Journal of Voice vol 28 no 6 pp 673ndash680 2014

[13] A Tsanas M A Little P E McSharry J Spielman andL O Ramig ldquoNovel speech signal processing algorithms forhigh-accuracy classification of Parkinsonrsquos diseaserdquo IEEETransactions on Biomedical Engineering vol 59 no 5pp 1264ndash1271 2012

[14] S Arora L Baghai-Ravary and A Tsanas ldquoDeveloping a largescale population screening tool for the assessment of Par-kinsonrsquos disease using telephone-quality voicerdquo(e Journal ofthe Acoustical Society of America vol 145 no 5 pp 2871ndash2884 2019

[15] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective assessment of amplified parkinsonian speechqualityrdquo in Proceedings of the 2018 40th Annual InternationalConference of the IEEE Engineering in Medicine and BiologySociety (EMBC) pp 2084ndash2087 IEEE New Paltz NY USAOctober 2018

[16] T H Falk C Zheng and W-Y Chan ldquoA non-intrusivequality and intelligibility measure of reverberant and der-everberated speechrdquo IEEE Transactions on Audio Speech andLanguage Processing vol 18 no 7 pp 1766ndash1774

[17] F Chen O Hazrati and P C Loizou ldquoPredicting the in-telligibility of reverberant speech for cochlear implant lis-teners with a non-intrusive intelligibility measurerdquoBiomedical Signal Processing and Control vol 8 no 3pp 311ndash314 2013

[18] V Grancharov D Zhao J Lindblom and W Kleijn ldquoLow-complexity nonintrusive speech quality assessmentrdquo IEEETransactions on Audio Speech and Language Processingvol 14 no 6 pp 1948ndash1956

[19] H Salehi and V Parsa ldquoOn nonintrusive speech qualityestimation for hearing aidsrdquo in Proceedings of the 2015 IEEEWorkshop on Applications of Signal Processing to Audio andAcoustics (WASPAA) pp 1ndash5 IEEE New Paltz NY USAOctober 2015

[20] M A Little P E McSharry E J Hunter J Spielman andL O Ramig ldquoSuitability of dysphonia measurements fortelemonitoring of Parkinsonrsquos diseaserdquo IEEE Transactions onBiomedical Engineering vol 56 no 4 pp 1015ndash1022 2008

[21] M A Little P E McSharry S J Roberts D A Costello andI M Moroz ldquoExploiting nonlinear recurrence and fractalscaling properties for voice disorder detectionrdquo BiomedicalEngineering Online vol 6 no 1 p 23 2007

[22] P Boersma and D Weenink ldquoPRAAT doing phonetics bycomputer (version 51 05) (computer program)rdquo May 2009

[23] T Hastie R Tibshirani and J H Friedman ldquo)e elements ofstatistical learning data mining inference and predictionrdquoSpringer Series in Statistics Springer New York NY USA2nd edition 2009

[24] V N Vapnik ldquo)e nature of statistical learning theoryrdquoStatistics for Engineering and Information Science SpringerNew York NY USA 2nd edition 2000

[25] I T Jolliffe and J Cadima ldquoPrincipal component analysis areview and recent developmentsrdquo Philosophical Transactionsof the Royal Society A Mathematical Physical and EngineeringSciences vol 374 no 2065 Article ID 20150202 2016

[26] A Gaballah V Parsa M Andreetta and S Adams ldquoAs-sessment of amplified parkinsonian speech quality using deeplearningrdquo in Proceedings of the 2018 IEEE Canadian

Conference on Electrical amp Computer Engineering (CCECE)pp 1ndash4 IEEE Quebec Canada May 2018

[27] Y Hu and P C Loizou ldquoEvaluation of objective qualitymeasures for speech enhancementrdquo IEEE Transactions onAudio Speech and Language Processing vol 16 no 1pp 229ndash238 2008

[28] J H Steiger ldquoTests for comparing elements of a correlationmatrixrdquo Psychological Bulletin vol 87 no 2 pp 245ndash2511980

)e Scientific World Journal 11

Page 9: ImprovedEstimationofParkinsonianVowelQualitythrough ... · (ModA),served as themodulation-based features [9]. e envelope of the waveform was extracted and filtered to a numberoffilters[16,17].eratiosoftheenergyinthelow

vowel acoustic measures HNR performed the best withPearson correlation coefficients of 055 and 074 for thetraining and test partitions respectively )e correlationcoefficients exhibited by HNR and CPP [8] but are lowerthan those reported by Jannetts and Lowit [12] It isworthwhile to note that Jannetts and Lowit [12] reportedSpearmanrsquos rank-ordered correlation between the percep-tual ratings and acoustic measures unlike Pearsonrsquos cor-relation coefficient reported here which can perhaps explainthe discrepancy Furthermore Jannetts and Lowit [12]employed the GRBAS scale and the auditory-perceptualratings were provided by an experienced clinician It isplausible that these methodological differences also con-tributed to the differences

Among the individual measures the RPDE parameterproduced the best performance )e correlation coefficientsof 08 and 075 with training and test partitions were sig-nificantly better than those reported by other individualmeasures RPDE has been previously employed for dis-criminating between normal and PD voices [14 20] and ourresults demonstrate that it is suitable for predicting theperceived quality of PD vowel samples

While GFCC was used in other studies to measure thequality of Parkinsonian speech and had a good performance[15 26] this was not the case for estimating the perceivedquality for Parkinsonian sustained vowels )e best per-formance for GFCC objective metric after feature reductionresulted in a 055 correlation coefficient between the

0 20 40 60 80 1000

20

40

60

80

100

Obj

ectiv

e sco

res

Subjective scores

(a)

0

20

40

60

80

100

Obj

ectiv

e sco

res

0 20 40 60 80 100Subjective scores

(b)

0

20

40

60

80

100

Obj

ectiv

e sco

res

0 20 40 60 80 100Subjective scores

(c)

Figure 4 Subjective scores versus objective scores using the reduced combined linear regression (LR) metric (a) shows the scatter plot forthe training data (b) shows the scatter plot for the test data and (c) depicts the scatter plot for the whole database

)e Scientific World Journal 9

subjective and the objective scores For the nonreduced fullset category applying SVR to the combination of the 40LCQA feature HNR and smoothed CPP was the best ob-jective metric of this category with a correlation value of 077for the test dataset )e difference between the training andthe test dataset was the minimum which meant that theeffect of overfitting is minimum Applying PCA to thefeatures led to the enhancement of most of the objectivemetrics It is noted that using LR and SVR on the PCAreduced combined metric yields to statistically similar re-sults Applying the feature reductionmethod to the objectivefeatures yielded a great enhancement in the performance ofthe objective metrics Applying LR and SVR to the reducedcombined metric yielded statistically similar results How-ever the metric with linear regression had a smaller dif-ference between the training and the test datasets whichmeans it is less prone to overfitting As a result the reducedcomposite metric with linear regression is considered to bethe best objective metric for estimating the quality of theParkinsonian sustained vowels

In summary a subset of acoustic measures includingHNR LCQA and RPDE exhibited a good correlation withauditory-perceptual voice assessments of the overall qualityof sustained vowels produced by a group of Parkinsonrsquospatients )e application of a regression model (LR andSVR) incorporating a subset of these acoustic featuresresulted in a statistically improved prediction of the per-ceived quality of the Parkinsonian vowels )e subjectiveratings used for benchmarking the objective models wereobtained from sustained vowels produced by PD patientsboth on and off Levodopa medication As such the clinicalimplications of the current study include the following (a)the derived model may serve as a surrogate for the subjectiveassessment of the effect of the Levodopa medication on thevoice quality of Parkinsonian subjects Since the evidenceshows that Levodopa improves the voice quality of Par-kinsonian patients only when their premedication voicequality is poor the derived model can potentially play aclinically relevant role in predicting the effectiveness ofLevodopamedication (b)More generally the derivedmodelcan potentially be applied for clinical assessment of theperceived quality of Parkinsonian vowels particularly formonitoring vowel quality over the course of any therapeuticintervention

Before closing a few limitations of our study must beacknowledged)e auditory-perceptual ratings used to trainand benchmark models were garnered from clinical grad-uate students with little experience Follow-up researchfocusing on model performance with auditory-perceptualratings from experienced clinicians is necessary While thederived model is promising for objective instrumental as-sessment of Parkinsonian vowel quality future research iswarranted to test its robustness and generalization capa-bilities and to further improve its performance One of theaspects that need to be addressed is the limited size of thecollected dataset used in this investigation )e size of thedataset needs to be increased to ensure more reliability andgeneralization capability of the proposed metrics Anotherarea for future research is to develop and evaluate more

advanced and complicated machine learning algorithmssuch as deep learning Applying deep learning emphasizesthe need for a larger dataset that needs to be collected topresent a more reliable and precise metric to estimate thesustained vowelsrsquo quality )e effect of gender on studiedacoustic variables is also of future research interest espe-cially on establishing the effectiveness of the derived modelin predicting male versus female PD patient vowel qualityFinally expanding the findings of this study to estimate thequality of continuous speech (as opposed to sustained vowel)will be of broad research interest

Data Availability

)e data used to support the findings of this study are re-stricted by the ethics board atWestern University in order toprotect patient privacy

Conflicts of Interest

)e authors declare that they have no conflicts of interest

References

[1] D Hirtz D J )urman K Gwinn-Hardy M MohamedA R Chaudhuri and R Zalutsky ldquoHow common are theldquocommonrdquo neurologic disordersrdquo Neurology vol 68 no 5pp 326ndash337 2007

[2] M Toft and Z K Wszolek ldquoOverview of the genetics ofParkinsonismrdquo in Parkinsonrsquos Disease pp 117ndash129 CRCPress Boca Raton FL USA 2012

[3] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective speech quality assessment of amplificationdevices for patients with Parkinsonrsquos diseaserdquo IEEE Trans-actions on Neural Systems and Rehabilitation Engineering2019

[4] R Pahwa and K E Lyons Handbook of Parkinsonrsquos DiseaseCRC Press Taylor amp Francis Group Boca Raton FL USA 5thedition 2013

[5] S Sapir L Ramig and C Fox ldquoSpeech therapy in thetreatment of Parkinsonrsquos diseaserdquo in Parkinsonrsquos Diseasepp 974ndash987 CRC Press Boca Raton FL USA 2012

[6] J L Goudreau and E Ahlskog ldquoSymptomatic treatment ofParkinsonrsquos disease levodopardquo in Parkinsonrsquos Diseasepp 847ndash863 CRC Press Boca Raton FL USA 2012

[7] L O Ramig C Fox and S Sapir ldquoSpeech treatment forParkinsonrsquos diseaserdquo Expert Review of Neurotherapeuticsvol 8 no 2 pp 297ndash309 2008

[8] D Cushnie-Sparrow S Adams A AbeyesekeraM Pieterman G Gilmore and M Jog ldquoVoice quality severityand responsiveness to levodopa in Parkinsonrsquos diseaserdquoJournal of Communication Disorders vol 76 pp 1ndash10 2018

[9] T H Falk V Parsa J F Santos et al ldquoObjective quality andintelligibility prediction for users of assistive listening devicesadvantages and limitations of existing toolsrdquo IEEE SignalProcessing Magazine vol 32 no 2 pp 114ndash124

[10] M Farrus J Hernando and P Ejarque ldquoJitter and shimmermeasurements for speaker recognitionrdquo in Proceedings of theEighth Annual Conference of the international Speech Com-munication Association Antwerp Belgium August 2007

[11] Y D Heman-Ackah D D Michael M M Baroody et alldquoCepstral peak prominence a more reliable measure of

10 )e Scientific World Journal

dysphoniardquo Annals of Otology Rhinology amp Laryngologyvol 112 no 4 pp 324ndash333 2003

[12] S Jannetts and A Lowit ldquoCepstral analysis of hypokinetic andataxic voices correlations with perceptual and other acousticmeasuresrdquo Journal of Voice vol 28 no 6 pp 673ndash680 2014

[13] A Tsanas M A Little P E McSharry J Spielman andL O Ramig ldquoNovel speech signal processing algorithms forhigh-accuracy classification of Parkinsonrsquos diseaserdquo IEEETransactions on Biomedical Engineering vol 59 no 5pp 1264ndash1271 2012

[14] S Arora L Baghai-Ravary and A Tsanas ldquoDeveloping a largescale population screening tool for the assessment of Par-kinsonrsquos disease using telephone-quality voicerdquo(e Journal ofthe Acoustical Society of America vol 145 no 5 pp 2871ndash2884 2019

[15] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective assessment of amplified parkinsonian speechqualityrdquo in Proceedings of the 2018 40th Annual InternationalConference of the IEEE Engineering in Medicine and BiologySociety (EMBC) pp 2084ndash2087 IEEE New Paltz NY USAOctober 2018

[16] T H Falk C Zheng and W-Y Chan ldquoA non-intrusivequality and intelligibility measure of reverberant and der-everberated speechrdquo IEEE Transactions on Audio Speech andLanguage Processing vol 18 no 7 pp 1766ndash1774

[17] F Chen O Hazrati and P C Loizou ldquoPredicting the in-telligibility of reverberant speech for cochlear implant lis-teners with a non-intrusive intelligibility measurerdquoBiomedical Signal Processing and Control vol 8 no 3pp 311ndash314 2013

[18] V Grancharov D Zhao J Lindblom and W Kleijn ldquoLow-complexity nonintrusive speech quality assessmentrdquo IEEETransactions on Audio Speech and Language Processingvol 14 no 6 pp 1948ndash1956

[19] H Salehi and V Parsa ldquoOn nonintrusive speech qualityestimation for hearing aidsrdquo in Proceedings of the 2015 IEEEWorkshop on Applications of Signal Processing to Audio andAcoustics (WASPAA) pp 1ndash5 IEEE New Paltz NY USAOctober 2015

[20] M A Little P E McSharry E J Hunter J Spielman andL O Ramig ldquoSuitability of dysphonia measurements fortelemonitoring of Parkinsonrsquos diseaserdquo IEEE Transactions onBiomedical Engineering vol 56 no 4 pp 1015ndash1022 2008

[21] M A Little P E McSharry S J Roberts D A Costello andI M Moroz ldquoExploiting nonlinear recurrence and fractalscaling properties for voice disorder detectionrdquo BiomedicalEngineering Online vol 6 no 1 p 23 2007

[22] P Boersma and D Weenink ldquoPRAAT doing phonetics bycomputer (version 51 05) (computer program)rdquo May 2009

[23] T Hastie R Tibshirani and J H Friedman ldquo)e elements ofstatistical learning data mining inference and predictionrdquoSpringer Series in Statistics Springer New York NY USA2nd edition 2009

[24] V N Vapnik ldquo)e nature of statistical learning theoryrdquoStatistics for Engineering and Information Science SpringerNew York NY USA 2nd edition 2000

[25] I T Jolliffe and J Cadima ldquoPrincipal component analysis areview and recent developmentsrdquo Philosophical Transactionsof the Royal Society A Mathematical Physical and EngineeringSciences vol 374 no 2065 Article ID 20150202 2016

[26] A Gaballah V Parsa M Andreetta and S Adams ldquoAs-sessment of amplified parkinsonian speech quality using deeplearningrdquo in Proceedings of the 2018 IEEE Canadian

Conference on Electrical amp Computer Engineering (CCECE)pp 1ndash4 IEEE Quebec Canada May 2018

[27] Y Hu and P C Loizou ldquoEvaluation of objective qualitymeasures for speech enhancementrdquo IEEE Transactions onAudio Speech and Language Processing vol 16 no 1pp 229ndash238 2008

[28] J H Steiger ldquoTests for comparing elements of a correlationmatrixrdquo Psychological Bulletin vol 87 no 2 pp 245ndash2511980

)e Scientific World Journal 11

Page 10: ImprovedEstimationofParkinsonianVowelQualitythrough ... · (ModA),served as themodulation-based features [9]. e envelope of the waveform was extracted and filtered to a numberoffilters[16,17].eratiosoftheenergyinthelow

subjective and the objective scores For the nonreduced fullset category applying SVR to the combination of the 40LCQA feature HNR and smoothed CPP was the best ob-jective metric of this category with a correlation value of 077for the test dataset )e difference between the training andthe test dataset was the minimum which meant that theeffect of overfitting is minimum Applying PCA to thefeatures led to the enhancement of most of the objectivemetrics It is noted that using LR and SVR on the PCAreduced combined metric yields to statistically similar re-sults Applying the feature reductionmethod to the objectivefeatures yielded a great enhancement in the performance ofthe objective metrics Applying LR and SVR to the reducedcombined metric yielded statistically similar results How-ever the metric with linear regression had a smaller dif-ference between the training and the test datasets whichmeans it is less prone to overfitting As a result the reducedcomposite metric with linear regression is considered to bethe best objective metric for estimating the quality of theParkinsonian sustained vowels

In summary a subset of acoustic measures includingHNR LCQA and RPDE exhibited a good correlation withauditory-perceptual voice assessments of the overall qualityof sustained vowels produced by a group of Parkinsonrsquospatients )e application of a regression model (LR andSVR) incorporating a subset of these acoustic featuresresulted in a statistically improved prediction of the per-ceived quality of the Parkinsonian vowels )e subjectiveratings used for benchmarking the objective models wereobtained from sustained vowels produced by PD patientsboth on and off Levodopa medication As such the clinicalimplications of the current study include the following (a)the derived model may serve as a surrogate for the subjectiveassessment of the effect of the Levodopa medication on thevoice quality of Parkinsonian subjects Since the evidenceshows that Levodopa improves the voice quality of Par-kinsonian patients only when their premedication voicequality is poor the derived model can potentially play aclinically relevant role in predicting the effectiveness ofLevodopamedication (b)More generally the derivedmodelcan potentially be applied for clinical assessment of theperceived quality of Parkinsonian vowels particularly formonitoring vowel quality over the course of any therapeuticintervention

Before closing a few limitations of our study must beacknowledged)e auditory-perceptual ratings used to trainand benchmark models were garnered from clinical grad-uate students with little experience Follow-up researchfocusing on model performance with auditory-perceptualratings from experienced clinicians is necessary While thederived model is promising for objective instrumental as-sessment of Parkinsonian vowel quality future research iswarranted to test its robustness and generalization capa-bilities and to further improve its performance One of theaspects that need to be addressed is the limited size of thecollected dataset used in this investigation )e size of thedataset needs to be increased to ensure more reliability andgeneralization capability of the proposed metrics Anotherarea for future research is to develop and evaluate more

advanced and complicated machine learning algorithmssuch as deep learning Applying deep learning emphasizesthe need for a larger dataset that needs to be collected topresent a more reliable and precise metric to estimate thesustained vowelsrsquo quality )e effect of gender on studiedacoustic variables is also of future research interest espe-cially on establishing the effectiveness of the derived modelin predicting male versus female PD patient vowel qualityFinally expanding the findings of this study to estimate thequality of continuous speech (as opposed to sustained vowel)will be of broad research interest

Data Availability

)e data used to support the findings of this study are re-stricted by the ethics board atWestern University in order toprotect patient privacy

Conflicts of Interest

)e authors declare that they have no conflicts of interest

References

[1] D Hirtz D J )urman K Gwinn-Hardy M MohamedA R Chaudhuri and R Zalutsky ldquoHow common are theldquocommonrdquo neurologic disordersrdquo Neurology vol 68 no 5pp 326ndash337 2007

[2] M Toft and Z K Wszolek ldquoOverview of the genetics ofParkinsonismrdquo in Parkinsonrsquos Disease pp 117ndash129 CRCPress Boca Raton FL USA 2012

[3] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective speech quality assessment of amplificationdevices for patients with Parkinsonrsquos diseaserdquo IEEE Trans-actions on Neural Systems and Rehabilitation Engineering2019

[4] R Pahwa and K E Lyons Handbook of Parkinsonrsquos DiseaseCRC Press Taylor amp Francis Group Boca Raton FL USA 5thedition 2013

[5] S Sapir L Ramig and C Fox ldquoSpeech therapy in thetreatment of Parkinsonrsquos diseaserdquo in Parkinsonrsquos Diseasepp 974ndash987 CRC Press Boca Raton FL USA 2012

[6] J L Goudreau and E Ahlskog ldquoSymptomatic treatment ofParkinsonrsquos disease levodopardquo in Parkinsonrsquos Diseasepp 847ndash863 CRC Press Boca Raton FL USA 2012

[7] L O Ramig C Fox and S Sapir ldquoSpeech treatment forParkinsonrsquos diseaserdquo Expert Review of Neurotherapeuticsvol 8 no 2 pp 297ndash309 2008

[8] D Cushnie-Sparrow S Adams A AbeyesekeraM Pieterman G Gilmore and M Jog ldquoVoice quality severityand responsiveness to levodopa in Parkinsonrsquos diseaserdquoJournal of Communication Disorders vol 76 pp 1ndash10 2018

[9] T H Falk V Parsa J F Santos et al ldquoObjective quality andintelligibility prediction for users of assistive listening devicesadvantages and limitations of existing toolsrdquo IEEE SignalProcessing Magazine vol 32 no 2 pp 114ndash124

[10] M Farrus J Hernando and P Ejarque ldquoJitter and shimmermeasurements for speaker recognitionrdquo in Proceedings of theEighth Annual Conference of the international Speech Com-munication Association Antwerp Belgium August 2007

[11] Y D Heman-Ackah D D Michael M M Baroody et alldquoCepstral peak prominence a more reliable measure of

10 )e Scientific World Journal

dysphoniardquo Annals of Otology Rhinology amp Laryngologyvol 112 no 4 pp 324ndash333 2003

[12] S Jannetts and A Lowit ldquoCepstral analysis of hypokinetic andataxic voices correlations with perceptual and other acousticmeasuresrdquo Journal of Voice vol 28 no 6 pp 673ndash680 2014

[13] A Tsanas M A Little P E McSharry J Spielman andL O Ramig ldquoNovel speech signal processing algorithms forhigh-accuracy classification of Parkinsonrsquos diseaserdquo IEEETransactions on Biomedical Engineering vol 59 no 5pp 1264ndash1271 2012

[14] S Arora L Baghai-Ravary and A Tsanas ldquoDeveloping a largescale population screening tool for the assessment of Par-kinsonrsquos disease using telephone-quality voicerdquo(e Journal ofthe Acoustical Society of America vol 145 no 5 pp 2871ndash2884 2019

[15] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective assessment of amplified parkinsonian speechqualityrdquo in Proceedings of the 2018 40th Annual InternationalConference of the IEEE Engineering in Medicine and BiologySociety (EMBC) pp 2084ndash2087 IEEE New Paltz NY USAOctober 2018

[16] T H Falk C Zheng and W-Y Chan ldquoA non-intrusivequality and intelligibility measure of reverberant and der-everberated speechrdquo IEEE Transactions on Audio Speech andLanguage Processing vol 18 no 7 pp 1766ndash1774

[17] F Chen O Hazrati and P C Loizou ldquoPredicting the in-telligibility of reverberant speech for cochlear implant lis-teners with a non-intrusive intelligibility measurerdquoBiomedical Signal Processing and Control vol 8 no 3pp 311ndash314 2013

[18] V Grancharov D Zhao J Lindblom and W Kleijn ldquoLow-complexity nonintrusive speech quality assessmentrdquo IEEETransactions on Audio Speech and Language Processingvol 14 no 6 pp 1948ndash1956

[19] H Salehi and V Parsa ldquoOn nonintrusive speech qualityestimation for hearing aidsrdquo in Proceedings of the 2015 IEEEWorkshop on Applications of Signal Processing to Audio andAcoustics (WASPAA) pp 1ndash5 IEEE New Paltz NY USAOctober 2015

[20] M A Little P E McSharry E J Hunter J Spielman andL O Ramig ldquoSuitability of dysphonia measurements fortelemonitoring of Parkinsonrsquos diseaserdquo IEEE Transactions onBiomedical Engineering vol 56 no 4 pp 1015ndash1022 2008

[21] M A Little P E McSharry S J Roberts D A Costello andI M Moroz ldquoExploiting nonlinear recurrence and fractalscaling properties for voice disorder detectionrdquo BiomedicalEngineering Online vol 6 no 1 p 23 2007

[22] P Boersma and D Weenink ldquoPRAAT doing phonetics bycomputer (version 51 05) (computer program)rdquo May 2009

[23] T Hastie R Tibshirani and J H Friedman ldquo)e elements ofstatistical learning data mining inference and predictionrdquoSpringer Series in Statistics Springer New York NY USA2nd edition 2009

[24] V N Vapnik ldquo)e nature of statistical learning theoryrdquoStatistics for Engineering and Information Science SpringerNew York NY USA 2nd edition 2000

[25] I T Jolliffe and J Cadima ldquoPrincipal component analysis areview and recent developmentsrdquo Philosophical Transactionsof the Royal Society A Mathematical Physical and EngineeringSciences vol 374 no 2065 Article ID 20150202 2016

[26] A Gaballah V Parsa M Andreetta and S Adams ldquoAs-sessment of amplified parkinsonian speech quality using deeplearningrdquo in Proceedings of the 2018 IEEE Canadian

Conference on Electrical amp Computer Engineering (CCECE)pp 1ndash4 IEEE Quebec Canada May 2018

[27] Y Hu and P C Loizou ldquoEvaluation of objective qualitymeasures for speech enhancementrdquo IEEE Transactions onAudio Speech and Language Processing vol 16 no 1pp 229ndash238 2008

[28] J H Steiger ldquoTests for comparing elements of a correlationmatrixrdquo Psychological Bulletin vol 87 no 2 pp 245ndash2511980

)e Scientific World Journal 11

Page 11: ImprovedEstimationofParkinsonianVowelQualitythrough ... · (ModA),served as themodulation-based features [9]. e envelope of the waveform was extracted and filtered to a numberoffilters[16,17].eratiosoftheenergyinthelow

dysphoniardquo Annals of Otology Rhinology amp Laryngologyvol 112 no 4 pp 324ndash333 2003

[12] S Jannetts and A Lowit ldquoCepstral analysis of hypokinetic andataxic voices correlations with perceptual and other acousticmeasuresrdquo Journal of Voice vol 28 no 6 pp 673ndash680 2014

[13] A Tsanas M A Little P E McSharry J Spielman andL O Ramig ldquoNovel speech signal processing algorithms forhigh-accuracy classification of Parkinsonrsquos diseaserdquo IEEETransactions on Biomedical Engineering vol 59 no 5pp 1264ndash1271 2012

[14] S Arora L Baghai-Ravary and A Tsanas ldquoDeveloping a largescale population screening tool for the assessment of Par-kinsonrsquos disease using telephone-quality voicerdquo(e Journal ofthe Acoustical Society of America vol 145 no 5 pp 2871ndash2884 2019

[15] A Gaballah V Parsa M Andreetta and S Adams ldquoObjectiveand subjective assessment of amplified parkinsonian speechqualityrdquo in Proceedings of the 2018 40th Annual InternationalConference of the IEEE Engineering in Medicine and BiologySociety (EMBC) pp 2084ndash2087 IEEE New Paltz NY USAOctober 2018

[16] T H Falk C Zheng and W-Y Chan ldquoA non-intrusivequality and intelligibility measure of reverberant and der-everberated speechrdquo IEEE Transactions on Audio Speech andLanguage Processing vol 18 no 7 pp 1766ndash1774

[17] F Chen O Hazrati and P C Loizou ldquoPredicting the in-telligibility of reverberant speech for cochlear implant lis-teners with a non-intrusive intelligibility measurerdquoBiomedical Signal Processing and Control vol 8 no 3pp 311ndash314 2013

[18] V Grancharov D Zhao J Lindblom and W Kleijn ldquoLow-complexity nonintrusive speech quality assessmentrdquo IEEETransactions on Audio Speech and Language Processingvol 14 no 6 pp 1948ndash1956

[19] H Salehi and V Parsa ldquoOn nonintrusive speech qualityestimation for hearing aidsrdquo in Proceedings of the 2015 IEEEWorkshop on Applications of Signal Processing to Audio andAcoustics (WASPAA) pp 1ndash5 IEEE New Paltz NY USAOctober 2015

[20] M A Little P E McSharry E J Hunter J Spielman andL O Ramig ldquoSuitability of dysphonia measurements fortelemonitoring of Parkinsonrsquos diseaserdquo IEEE Transactions onBiomedical Engineering vol 56 no 4 pp 1015ndash1022 2008

[21] M A Little P E McSharry S J Roberts D A Costello andI M Moroz ldquoExploiting nonlinear recurrence and fractalscaling properties for voice disorder detectionrdquo BiomedicalEngineering Online vol 6 no 1 p 23 2007

[22] P Boersma and D Weenink ldquoPRAAT doing phonetics bycomputer (version 51 05) (computer program)rdquo May 2009

[23] T Hastie R Tibshirani and J H Friedman ldquo)e elements ofstatistical learning data mining inference and predictionrdquoSpringer Series in Statistics Springer New York NY USA2nd edition 2009

[24] V N Vapnik ldquo)e nature of statistical learning theoryrdquoStatistics for Engineering and Information Science SpringerNew York NY USA 2nd edition 2000

[25] I T Jolliffe and J Cadima ldquoPrincipal component analysis areview and recent developmentsrdquo Philosophical Transactionsof the Royal Society A Mathematical Physical and EngineeringSciences vol 374 no 2065 Article ID 20150202 2016

[26] A Gaballah V Parsa M Andreetta and S Adams ldquoAs-sessment of amplified parkinsonian speech quality using deeplearningrdquo in Proceedings of the 2018 IEEE Canadian

Conference on Electrical amp Computer Engineering (CCECE)pp 1ndash4 IEEE Quebec Canada May 2018

[27] Y Hu and P C Loizou ldquoEvaluation of objective qualitymeasures for speech enhancementrdquo IEEE Transactions onAudio Speech and Language Processing vol 16 no 1pp 229ndash238 2008

[28] J H Steiger ldquoTests for comparing elements of a correlationmatrixrdquo Psychological Bulletin vol 87 no 2 pp 245ndash2511980

)e Scientific World Journal 11