8/6/2019 Report VUV for Shifted Mar2011
1/28
UTD-REP-01 Page 1
Technical Report UTD-REP-01
Evaluation of Voiced/Unvoiced Detection Algorithms
for Frequency-Shifted Speech
Jaewook Lee and Philipos Loizou
March 2011
I. Introduction
Voiced/unvoiced classification is important for speech coding, recognition, and enhancement, and many methods have been developed to make it robust. In this
report, four feature extraction methods, the autocorrelation coefficient (AC), pre-emphasized energy
ratio (ER), zero crossing rate (ZCR), and low-to-full subband energy ratio (SR), are used for
voiced/unvoiced speech classification [3]. Otsu's method is used to select a threshold level from the
histogram of each feature. For the final decision, short-time energy (STE) with a fixed
threshold level is used for silence detection [5,6]. A semiautomatic tool for voiced/unvoiced
detection was developed to obtain reliable reference labels for testing.
Ten IEEE corpus sentences were selected, and their frequencies were shifted over the ranges 600 to 1500 Hz
and -600 to -1500 Hz, respectively, for the tests [7].
II. Algorithms for Voiced/Unvoiced Detection
A. Equations for the Four Voiced/Unvoiced Detection Features

Autocorrelation Coefficient (AC), for a frame x(m), m = 1, ..., M:

$$\mathrm{AC} = \frac{\sum_{m=1}^{M} x(m)\,x(m+1)}{\sum_{m=1}^{M} x(m+1)^{2}} \qquad (1)$$

Pre-Emphasized Energy Ratio (ER):

$$\mathrm{ER} = \frac{\sum_{m=1}^{M} \lvert x(m+1) - x(m)\rvert}{\sum_{m=1}^{M} \lvert x(m+1)\rvert} \qquad (2)$$

Zero Crossing Rate (ZCR):

$$\mathrm{ZCR} = \sum_{m=1}^{M} I\big(x(m)\,x(m+1) < 0\big) \qquad (3)$$

where $I(A)$ is the indicator function, which is 1 if the argument A is true and 0 otherwise.

Low-to-Full Subband Energy Ratio (SR):

$$\mathrm{SR} = \frac{\sum_{m=1}^{M} x_{L}(m)^{2}}{\sum_{m=1}^{M} x(m)^{2}} \qquad (4)$$

where $x_{L}(m)$ is the speech signal low-pass filtered at 3 kHz.
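As a concrete illustration, the frame-based features above can be sketched in Python. The function name and layout here are mine, not from the report; the sketch mirrors the MATLAB listings in the appendix (non-overlapping windows with a one-sample lookahead) and omits SR, which additionally needs a 3 kHz low-pass filter:

```python
import numpy as np

def frame_features(x, win):
    """Per-frame AC (Eq. 1), ER (Eq. 2), ZCR (Eq. 3), and STE (Eq. 6).

    Frames are non-overlapping windows of `win` samples; each feature
    compares the frame x(m) with its one-sample lookahead x(m+1).
    """
    x = np.asarray(x, dtype=float)
    n = (len(x) - 1) // win            # leave room for the lookahead sample
    ac = np.empty(n); er = np.empty(n); zcr = np.empty(n); ste = np.empty(n)
    for i in range(n):
        cur = x[i*win:(i+1)*win]       # x(m)
        nxt = x[i*win+1:(i+1)*win+1]   # x(m+1)
        ac[i] = np.sum(cur*nxt) / np.sum(nxt**2)
        er[i] = np.sum(np.abs(nxt - cur)) / np.sum(np.abs(nxt))
        zcr[i] = np.sum(cur*nxt < 0)   # number of sign changes
        ste[i] = np.sum(cur**2)
    return ac, er, zcr, ste
```

A voiced-like (periodic, low-frequency) frame has AC near 1, low ER, and few zero crossings; a noise-like (unvoiced) frame has low AC, high ER, and many zero crossings, which is what makes these features usable for classification.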
B. Equations for the Automatic Threshold Level Selection Algorithm

Otsu's method (OTSU):

The optimum global threshold level $k^{\ast}$ is the value of $k$ for which the between-class variance $\sigma_{B}^{2}(k)$ is maximum:

$$k^{\ast} = \arg\max_{1 \le k \le N} \sigma_{B}^{2}(k) \qquad (5\text{-}1)$$

where the between-class variance, for $k = 1, 2, \ldots, N$, is

$$\sigma_{B}^{2}(k) = \frac{\big(m_{G} P_{1}(k) - m(k)\big)^{2}}{P_{1}(k)\big(1 - P_{1}(k)\big)} \qquad (5\text{-}2)$$

where the global intensity mean is

$$m_{G} = \sum_{i=1}^{N} i\, p(i) \qquad (5\text{-}3)$$

the cumulative means $m(k)$, for $k = 1, 2, \ldots, N$, are

$$m(k) = \sum_{i=1}^{k} i\, p(i) \qquad (5\text{-}4)$$

the cumulative sums $P_{1}(k)$, for $k = 1, 2, \ldots, N$, are

$$P_{1}(k) = \sum_{i=1}^{k} p(i) \qquad (5\text{-}5)$$

and $p(i)$, $i = 1, 2, \ldots, N$, is the normalized histogram of the input signal.
Histograms of the Normalized Features and Threshold Levels Selected Using Otsu's Method:

Figure 1. Histograms of (a) normalized AC, (b) normalized ER, (c) normalized ZCR, and (d) normalized SR, with the threshold levels selected by Otsu's method (0.69, 0.46, 0.47, and 0.60, respectively). Axes: feature value (0 to 1) versus count.
Voiced/Unvoiced Detection Using the Four Methods with Automatically Selected Threshold Levels:

Figure 2. (a) Waveform of the sample sentence. (b) Normalized AC with its threshold level (0.68). (c) Normalized ER with its threshold level (0.46). (d) Normalized ZCR with its threshold level (0.47). (e) Normalized SR with its threshold level (0.60). (f) Voiced/unvoiced detection from AC and its threshold level. Time axis in seconds.
C. Decision Making
Short-Time Energy (STE):

$$\mathrm{STE} = \sum_{m=1}^{M} x(m)^{2} \qquad (6)$$
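The decision rule of this section — a thresholded feature gated by the STE silence detector — can be sketched as follows, assuming AC as the feature. The function name and the AND-combination are inferred from the report's figures and appendix listings, not taken verbatim from them:

```python
import numpy as np

def vuv_decision(ac, ste, thres_ac, thres_sil=0.08):
    """Final voiced/unvoiced decision: a frame is voiced (1) only
    if its normalized AC exceeds the Otsu-selected threshold AND
    its STE is above the fixed silence level (0.08 in the report).
    """
    ac = np.asarray(ac, dtype=float)
    ac = ac / np.max(ac)                   # normalize to [0, 1]
    voiced = ac > thres_ac                 # feature decision
    silent = np.asarray(ste) < thres_sil   # silence detection
    return (voiced & ~silent).astype(int)  # silence forces unvoiced
```

For example, `vuv_decision([0.9, 0.2, 0.8], [1.0, 1.0, 0.01], 0.5)` labels the third frame unvoiced despite its high AC, because its energy falls below the silence threshold.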
STE with Fixed Threshold Level for Silence Detection (0.08 for unshifted and upshifted speech):
Figure 3. (a) Waveform of sample sentence. (b) Short-time Energy with fixed threshold level (0.08). (c)
Silence detection using STE with its threshold level.
Final Decision for Voiced/Unvoiced Speech Detection:
Figure 4. (a) Waveform of sample sentence. (b) Voiced/unvoiced detection using AC. (c) Silence detection
using STE. (d) Final decision for voiced/unvoiced detection of sample sentence.
III. Materials and Experimental Methods
A. 10 IEEE Sample Sentences and Reference Voiced/Unvoiced Detection
Table 1. 10 IEEE Sample Sentences for Testing.
No.  Sentence  Sex  Length (s)  Fs (kHz)
1 The birch canoe slid on the smooth planks. M 2.8 25
2 He knew the skill of the great young actress. M 3.5 25
3 Her purse was full of useless trash. M 2.2 25
4 Read verse out loud for pleasure. M 2.1 25
5 Wipe the grease off his dirty face. M 2.2 25
6 He wrote down a long list of items. F 2.9 25
7 The drip of the rain made a pleasant sound. F 2.7 25
8 Smoke poured out of every crack. F 2.5 25
9 Hats are worn to tea and not to dinner. F 2.9 25
10 The clothes dried on a thin wooden rack. F 2.9 25
10 IEEE Sample Sentences and Their Reference Voiced/Unvoiced Detection:
Figure 5. Waveforms of the 10 IEEE sample sentences and their reference voiced/unvoiced detection, labeled manually using spectrograms.
B. Upshifted and Downshifted Speech
Spectrograms of Unshifted and Shifted Speech:
Figure 6. Spectrograms of the sample sentence: (a) unshifted speech, (b) speech upshifted by 800 Hz, (c) speech upshifted by 1200 Hz, (d) speech downshifted by 800 Hz, and (e) speech downshifted by 1200 Hz.
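The report does not state how the frequency shifting was performed; one standard way to shift an entire spectrum by a fixed offset, assumed here purely for illustration, is single-sideband modulation of the analytic signal:

```python
import numpy as np

def shift_frequency(x, fs, f0):
    """Shift the whole spectrum of x by f0 Hz (positive = upshift).

    Multiplies the analytic signal by a complex exponential and
    takes the real part, so every component at f moves to f + f0.
    """
    n = len(x)
    # analytic signal via the FFT (same construction scipy.signal.hilbert uses)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n//2] = 1.0
        h[1:n//2] = 2.0
    else:
        h[1:(n+1)//2] = 2.0
    analytic = np.fft.ifft(X * h)
    t = np.arange(n) / fs
    return np.real(analytic * np.exp(2j*np.pi*f0*t))
```

Unlike pitch scaling, this rigid translation destroys the harmonic structure of voiced speech (harmonics are no longer integer multiples of F0), which is why downshifting in particular degrades periodicity-based features such as AC.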
C. Tool for Reference Voiced/Unvoiced Detection
Initial voiced/unvoiced detection using ER:
Figure 7. Voiced/unvoiced detection using ER, shown with the spectrogram.
The voiced/unvoiced labels are then corrected manually against the spectrogram by clicking on incorrectly detected points:
Figure 8. Voiced/unvoiced detection after manual correction using the spectrogram.
D. Performance Measurement
Table 2. Definition of Symbols in the Error Calculation
HIT0: an unvoiced segment is correctly detected as unvoiced (unvoiced -> unvoiced).
FALSE0: an unvoiced segment is incorrectly detected as voiced (unvoiced -> voiced).
HIT1: a voiced segment is correctly detected as voiced (voiced -> voiced).
FALSE1: a voiced segment is incorrectly detected as unvoiced (voiced -> unvoiced).

Hit Rate:

$$\mathrm{Hit\ Rate} = \frac{\mathrm{HIT0}}{\mathrm{HIT0}+\mathrm{FALSE0}} \times 100\% \qquad (7\text{-}1)$$

False Alarm Rate:

$$\mathrm{False\ Alarm\ Rate} = \frac{\mathrm{FALSE1}}{\mathrm{HIT1}+\mathrm{FALSE1}} \times 100\% \qquad (7\text{-}2)$$

Error Rate:

$$\mathrm{Error\ Rate} = \frac{\mathrm{FALSE0}+\mathrm{FALSE1}}{N} \times 100\% \qquad (7\text{-}3)$$

where N is the total number of frames.
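Eqs. (7-1)-(7-3) translate directly into code; the counting convention below follows the error-counting loop in the appendix (HIT0/FALSE0 over unvoiced reference frames, HIT1/FALSE1 over voiced ones), with a hypothetical function name:

```python
def vuv_rates(ref, hyp):
    """Hit, false-alarm, and error rates (%) per Eqs. (7-1)-(7-3).

    ref and hyp are per-frame labels: 1 = voiced, 0 = unvoiced.
    """
    hit0   = sum(1 for r, h in zip(ref, hyp) if r == 0 and h == 0)
    false0 = sum(1 for r, h in zip(ref, hyp) if r == 0 and h == 1)
    hit1   = sum(1 for r, h in zip(ref, hyp) if r == 1 and h == 1)
    false1 = sum(1 for r, h in zip(ref, hyp) if r == 1 and h == 0)
    hit_rate   = 100 * hit0 / (hit0 + false0)        # (7-1)
    false_rate = 100 * false1 / (hit1 + false1)      # (7-2)
    error_rate = 100 * (false0 + false1) / len(ref)  # (7-3)
    return hit_rate, false_rate, error_rate
```

Note that the error rate pools both kinds of mistakes over all frames, so it can be low even when one class is detected poorly; the hit and false-alarm rates expose that asymmetry.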
IV. Experimental Results
A. Voiced/Unvoiced Detection for Unshifted Speech
Voiced/Unvoiced Detection for an Unshifted Male Sentence Using the Four Methods:
Figure 9. (a) Reference voiced/unvoiced detection for a male sentence ("Her purse was full of useless trash.").
Voiced/unvoiced detection using (b) AC, (c) ER, (d) ZCR, and (e) SR.
Voiced/Unvoiced Detection for an Unshifted Female Sentence Using the Four Methods:
Figure 10. (a) Reference voiced/unvoiced detection for a female sentence ("Hats are worn to tea and not to
dinner."). Voiced/unvoiced detection using (b) AC, (c) ER, (d) ZCR, and (e) SR.
Comparison of Detection Results with the Reference Detection:
Table 3. Hit, False Alarm, and Error Rates of Voiced/Unvoiced Detection for Unshifted Speech

Method   Hit Rate (%)   False Alarm Rate (%)   Error Rate (%)
AC       92.8222        1.7833                 4.2474
ER       93.1485        1.7833                 4.0984
ZCR      93.3116        1.7833                 4.0238
SR       92.8222        1.7833                 4.2474
Error Rate of Each Method:
Figure 11. Hit, false alarm and error rate for unshifted speech.
B. Voiced/Unvoiced Detection for Frequency Upshifted Sentences
Voiced/Unvoiced Detection for Frequency Upshifted Sentences Using AC:
Figure 12. Voiced/unvoiced detection for upshifted speech using AC, for frequency shifts from 600 to 1500 Hz.
Voiced/Unvoiced Detection for Frequency Upshifted Sentences Using ER:
Figure 13. Voiced/unvoiced detection for upshifted speech using ER in the frequency range from 600 to 1500
Hz.
Voiced/Unvoiced Detection for Frequency Upshifted Sentences Using ZCR:
Figure 14. Voiced/unvoiced detection for upshifted speech using ZCR in the frequency range from 600 to
1500 Hz.
Voiced/Unvoiced Detection for Frequency Upshifted Sentences Using SR:
Figure 15. Voiced/unvoiced detection for upshifted speech using SR in the frequency range from 600 to 1500
Hz.
Comparison of Detection Results with the Reference Detection:
Table 4. Hit, False Alarm, and Error Rates for Upshifted Speech (%).

Shift (Hz)        AC                       ER                       ZCR                      SR
             Hit    False  Error      Hit    False  Error      Hit    False  Error      Hit    False  Error
600          93.31  1.92   4.09       93.47  1.92   4.02       93.96  1.92   3.80       93.47  1.92   4.02
700          93.14  1.92   4.17       93.31  1.92   4.09       93.47  1.92   4.02       93.31  1.92   4.09
800          93.14  1.92   4.17       93.31  1.92   4.09       93.47  1.92   4.02       93.31  1.92   4.09
900          93.31  1.92   4.09       93.47  1.92   4.02       93.96  1.92   3.80       93.47  1.92   4.02
1000         93.14  1.92   4.17       93.31  1.92   4.09       93.63  1.92   3.94       93.31  1.92   4.09
1100         93.14  1.92   4.17       93.31  1.92   4.09       93.80  1.92   3.87       93.31  1.92   4.09
1200         93.47  1.92   4.02       93.47  1.92   4.02       93.96  1.92   3.80       93.47  1.92   4.02
1300         93.31  1.92   4.09       93.31  1.92   4.09       93.80  1.92   3.87       93.31  1.92   4.09
1400         93.47  1.92   4.02       93.47  1.92   4.02       93.63  1.92   3.94       93.47  1.92   4.02
1500         93.31  1.92   4.09       93.31  1.92   4.09       93.63  1.92   3.94       93.31  1.92   4.09
Error Rate for Each Upshifted Frequency Level:
Figure 16. Hit, false alarm and error rate for upshifted speech.
C. Voiced/Unvoiced Detection for Frequency Downshifted Sentences
Voiced/Unvoiced Detection for Frequency Downshifted Sentences Using AC:
Figure 17. Voiced/unvoiced detection for downshifted speech using AC, for frequency shifts from -600 to -1500 Hz.
Voiced/Unvoiced Detection for Frequency Downshifted Sentences Using ER:
Figure 18. Voiced/unvoiced detection for downshifted speech using ER in the frequency range from -600 to
-1500 Hz.
Voiced/Unvoiced Detection for Frequency Downshifted Sentences Using ZCR:
Figure 19. Voiced/unvoiced detection for downshifted speech using ZCR in the frequency range from -600 to
-1500 Hz.
Voiced/Unvoiced Detection for Frequency Downshifted Sentences Using SR:
Figure 20. Voiced/unvoiced detection for downshifted speech using SR in the frequency range from -600 to
-1500 Hz.
Comparison of Detection Results with the Reference Detection:
Table 5. Hit, False Alarm, and Error Rates for Downshifted Speech (%).

Shift (Hz)        AC                       ER                       ZCR                      SR
             Hit    False  Error      Hit    False  Error      Hit    False  Error      Hit    False  Error
-600         73.40  3.97   14.30      76.34  3.97   12.96      76.50  4.52   13.18      75.20  4.25   13.63
-700         77.16  5.62   13.48      80.26  5.76   12.14      79.44  5.76   12.51      78.79  5.62   12.74
-800         79.77  6.03   12.51      83.36  6.58   11.17      83.52  6.85   11.25      82.38  6.85   11.77
-900         79.77  7.13   13.11      83.03  7.27   11.69      83.03  7.95   12.07      81.72  7.13   12.22
-1000        78.62  8.36   14.30      82.05  9.05   13.11      82.21  8.64   12.81      80.42  8.77   13.71
-1100        78.95  10.19  15.12      82.87  10.69  13.63      82.70  10.97  13.85      80.58  10.56  14.60
-1200        78.79  10.21  15.27      83.52  11.11  13.56      81.72  10.56  14.08      81.23  10.83  14.45
-1300        74.38  11.24  17.80      78.95  12.34  16.31      78.46  12.20  16.46      76.83  11.93  17.06
-1400        79.77  12.75  16.16      84.01  14.67  15.27      83.68  14.95  15.57      82.05  13.58  15.57
-1500        79.77  13.71  16.69      83.36  15.22  15.87      83.84  16.46  16.31      81.89  14.54  16.16
Error Rate on Each Downshifted Frequency Level:
Figure 21. Hit, false alarm and error rate for downshifted speech.
V. Conclusions and Planned Activities
Four feature extraction algorithms, the autocorrelation coefficient (AC), pre-emphasized
energy ratio (ER), zero crossing rate (ZCR), and low-to-full subband energy ratio (SR), were used
for voiced/unvoiced classification, with their threshold levels selected automatically using
Otsu's method. Short-time energy with a fixed threshold level was used for silence detection
to make the final voiced/unvoiced decision. Ten IEEE corpus sentences were used for testing,
and their reference voiced/unvoiced labels were obtained manually using spectrograms. For
unshifted and upshifted speech, all four methods have error rates under 4.3% over the whole
frequency range from 600 Hz to 1500 Hz. For downshifted speech, all four methods have error
rates of 11% to 18% over the range from -600 Hz to -1500 Hz. Three activities to improve
voiced/unvoiced detection performance are planned:
Multiple Threshold Levels:
Two or more threshold levels will be detected: the lower level for the initial voiced/unvoiced
decision, and the upper level to confirm whether a segment is voiced or unvoiced. Each level will be
obtained by applying Otsu's method repeatedly [5,6].
Reliable Detection of Weak Voiced Speech:
An unvoiced segment detected near a voiced utterance is often actually weak voiced speech
lying between the voiced and unvoiced portions. STE will be used to detect voiced speech,
and AC, ER, ZCR, and SR will be used to detect unvoiced speech; the two will then be combined
to detect weak voiced speech [4].
New Approach to Decision Making:
Voiced, unvoiced, and silence segments will be labeled manually, and statistical
information such as the mean and variance of each class will be obtained from its histogram.
The final decision will then be made using this statistical information. This is a Bayesian
approach to voiced/unvoiced detection [4].
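The proposed Bayesian decision could look like the following sketch, where each class is modeled by a single Gaussian over one feature; the class set, the single-feature setup, and all names are illustrative assumptions, not the report's design:

```python
import math

def gaussian_vuv(feature, stats):
    """Assign a frame's feature value to the class with the highest
    Gaussian log-likelihood, using per-class (mean, variance) pairs
    estimated from manually labeled training frames.
    """
    def loglik(x, mu, var):
        # log of the Gaussian density N(x; mu, var)
        return -0.5 * math.log(2 * math.pi * var) - (x - mu)**2 / (2 * var)
    return max(stats, key=lambda c: loglik(feature, *stats[c]))
```

With class priors added, this becomes a full maximum-a-posteriori classifier; with equal priors, as here, it reduces to maximum likelihood.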
References
[1] J. G. Proakis, Digital Signal Processing, 4th ed., Pearson.
[2] P. C. Loizou, Speech Enhancement: Theory and Practice, CRC Press.
[3] A. M. Kondoz, Digital Speech: Coding for Low Bit Rate Communication Systems, Wiley.
[4] L. R. Rabiner and R. W. Schafer, Theory and Applications of Digital Speech Processing, Prentice Hall.
[5] R. C. Gonzalez and R. E. Woods, Digital Image Processing, Pearson.
[6] N. Otsu, "A Threshold Selection Method from Gray-Level Histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, 1979.
[7] IEEE Subcommittee, "IEEE Recommended Practice for Speech Quality Measurements," IEEE Trans. Audio and Electroacoustics, vol. AU-17, no. 3, pp. 225-246, 1969.
Matlab Code
Matlab Function for Autocorrelation Coefficient (AC) with Short-time Energy (STE):
function [ac,ste,n]=ac(data,win_size)
% Frame-based autocorrelation coefficient (Eq. 1) and short-time
% energy (Eq. 6) over non-overlapping windows of win_size samples.
n=floor((length(data)-1)/win_size);  % leave room for the one-sample lookahead
data_fit=data(1:n*win_size);
data_2=data(1:n*win_size+1);
for i=1:n
    data_win=data_fit(1+win_size*(i-1):win_size*i);    % x(m)
    data_post=data_2(2+win_size*(i-1):1+win_size*i);   % x(m+1)
    ac(i)=sum(data_post.*data_win)/sum(data_post.^2);  % Eq. (1)
    ste(i)=sum(data_win.^2);                           % Eq. (6)
end
end
Matlab Function for Pre-Emphasized Energy Ratio (ER):
function [er,ste,n]=er(data,win_size)
% Frame-based pre-emphasized energy ratio (Eq. 2) and short-time energy.
n=floor((length(data)-1)/win_size);  % leave room for the one-sample lookahead
data_fit=data(1:n*win_size);
data_2=data(1:n*win_size+1);
for i=1:n
    data_win=data_fit(1+win_size*(i-1):win_size*i);        % x(m)
    data_post=data_2(2+win_size*(i-1):1+win_size*i);       % x(m+1)
    er(i)=sum(abs(data_post-data_win))/sum(abs(data_post));% Eq. (2)
    ste(i)=sum(data_win.^2);
end
end
Matlab Function for Zero Crossing Rate (ZCR):
function [zcr,ste,n]=zcr(data,win_size)
% Frame-based zero crossing rate (Eq. 3) and short-time energy.
n=floor((length(data)-1)/win_size);
data_fit=data(1:n*win_size);
data_2=data(1:n*win_size+1);
for i=1:n
    data_win=data_fit(1+win_size*(i-1):win_size*i);
    data_post=data_2(2+win_size*(i-1):1+win_size*i);
    % tail of the original listing was lost in extraction;
    % reconstructed from Eq. (3): count sign changes
    [row,column]=find((data_win.*data_post)<0);
    zcr(i)=length(column);
    ste(i)=sum(data_win.^2);
end
end
Semiautomatic Tool for Voiced/Unvoiced Detection using Spectrogram:
filename='C:\Users\Owner\Desktop\P4\10sentences\sp03';
cd('C:\Users\Owner\Desktop\P4\10sentences');
[num,txt]=xlsread('10sentences.xlsx'); sentence=txt{3};
[data,fs]=wavread(filename); win=0.02*fs;   % 20-ms frames
[er,ste,n]=er(data,win); er=er/max(er);     % normalized ER
thres_sil=0.08;                             % fixed silence threshold
thres_er=graythresh(er);                    % Otsu threshold
vuv_er=ones(1,n);
for i=1:n
    % comparison operators were lost in extraction; reconstructed so that
    % silent frames and frames with high ER are labeled unvoiced (0):
    if ste(i)<thres_sil || er(i)>thres_er, vuv_er(i)=0; end
end
for i=1:n, vuv(1+win*(i-1):win*i)=vuv_er(i); end
%%
subplot(2,1,1); area(vuv,'edgecolor','c','facecolor','c'); hold on;
subplot(2,1,1); plot(data(1:win*n)+0.4); ylim([0,0.9]); hold off;
title(sentence,'fontsize',12);
subplot(2,1,2); specgram(data(1:win*n));
for j=1:100
    [x,y]=ginput(1);
    for i=0:n
        % tail of the original listing was lost in extraction; presumably
        % the clicked frame's label is toggled before the plot is redrawn:
        if (x>=i*win+1 && x<(i+1)*win)
            vuv(1+i*win:(i+1)*win)=1-vuv(1+i*win);
        end
    end
end
'C:\Users\Owner\Desktop\P4\10sentences\1500'};  % (earlier entries of the pathnames list were lost in extraction)
cd('C:\Users\Owner\Desktop\P4\10sentences');
file=dir('*.wav'); file_ref=dir('*.mat');
filenames={file.name}'; filenames_ref={file_ref.name}';
%%
for k=1:10                      % loop over shift levels
    cd(pathnames{k})
    for i=1:10                  % loop over the 10 sentences
        [data,fs]=wavread(filenames{i});
        win=0.02*fs;
        [ac,ste,n(i)]=ac(data,win);
        [er]=er(data,win);
        [zcr]=zcr(data,win);
        [sr]=sr(data,win,fs);
        ac=ac/max(ac); er=er/max(er); zcr=zcr/max(zcr); sr=sr/max(sr);
        thres_sil=0.08;
        thres_ac=graythresh(ac);
        thres_er=graythresh(er);
        thres_zcr=graythresh(zcr);
        thres_sr=graythresh(sr);
        for j=1:n(i)
            % comparison logic was lost in extraction; reconstructed so that
            % silent frames, and frames on the unvoiced side of each
            % feature's threshold, are labeled 0 (unvoiced):
            vuv_ac(i,j,k)=~(ste(j)<thres_sil || ac(j)<thres_ac);
            vuv_er(i,j,k)=~(ste(j)<thres_sil || er(j)>thres_er);
            vuv_zcr(i,j,k)=~(ste(j)<thres_sil || zcr(j)>thres_zcr);
            vuv_sr(i,j,k)=~(ste(j)<thres_sil || sr(j)<thres_sr);
        end
    end
end
% Error-rate computation (Eqs. 7-1 to 7-3) for each method (l) and
% shift level (k), pooled over all 10 sentences.
for l=1:4
    if l==1, vuv_1=vuv_ac; end
    if l==2, vuv_1=vuv_er; end
    if l==3, vuv_1=vuv_zcr; end
    if l==4, vuv_1=vuv_sr; end
    for k=1:10
        hit0=0; false0=0; hit1=0; false1=0;
        for i=1:10
            for j=1:n(i)
                if (vuv(i,j)==0 && vuv_1(i,j,k)==0), hit0=hit0+1; end
                if (vuv(i,j)==0 && vuv_1(i,j,k)==1), false0=false0+1; end
                if (vuv(i,j)==1 && vuv_1(i,j,k)==1), hit1=hit1+1; end
                if (vuv(i,j)==1 && vuv_1(i,j,k)==0), false1=false1+1; end
            end
        end
        hit(l,k)=hit0/(hit0+false0)*100;             % Eq. (7-1)
        false_alarm(l,k)=false1/(hit1+false1)*100;   % Eq. (7-2); renamed: 'false' shadows a builtin
        error_rate(l,k)=(false0+false1)/sum(n)*100;  % Eq. (7-3); renamed: 'error' is a builtin
    end
end