An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

YAO YAO, UC BERKELEY, [email protected], http://linguistics.berkeley.edu/~yaoyao, JULY 25, 2008


Page 1: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

YAO YAO UC BERKELEY

[email protected]

http://linguistics.berkeley.edu/~yaoyao

JULY 25, 2008

An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Page 2: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Overview

- Background
- Data
- Methodology
  - Algorithm
  - Tuning the model
  - Testing
- Results
- General Discussion

Page 3: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Background

Purpose of the study
- To find the point of burst in a word-initial voiceless stop (i.e. [p], [t], [k])

Existing approach
- Detecting the point of maximal energy change (cf. Niyogi and Ramesh, 1998; Liu, 1996)

[Figure labels: close, release, vowel onset]

Page 4: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Background

Our approach
- Compare the spectrogram of the target token at each time point against that of fricatives and silence
- Assess how “fricative-like” and “silence-like” the spectrogram is at each time point
- Find the point where “fricative-ness” suddenly rises and “silence-ness” suddenly drops: this is taken as the point of burst

Page 5: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Background

Our approach (cont’d)

What do we need?
- Spectral features of a given time frame
- Spectral templates of fricatives and silence
  - Specific to speaker and the recording environment
- Measure and compare fricative-ness and silence-ness
- An algorithm to find the most likely point for release

Advantage
- Easy to implement
- No worries about change in the environment and individual differences

Page 6: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Data

Buckeye corpus (Pitt, M. et al. 2005)
- 40 speakers
  - All residents of Columbus, Ohio
  - Balanced in gender and age
  - One-hour interview
  - Transcribed at word and phone level
  - 19 used in the current study

Target tokens
- Transcribed word-initial voiceless stops (i.e. [p], [t], [k])

Page 7: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: spectral measures

Spectral vector
- 20ms Hamming window
- Mel scale
- 1 × 60 array

Spectral template
- Speaker-specific, phone-specific
- Ignore tokens shorter than the average duration of that phone for that speaker
- For the remaining tokens
  - Calculate a spectral vector for the middle 20ms window
  - Average over the spectral vectors
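The two steps above (a 60-band mel spectral vector from a 20ms Hamming window, then a template averaged over mid-token vectors) could be sketched roughly as follows. This is an illustrative sketch, not the author's code: the 16 kHz sampling rate, the FFT size, the triangular filter shape, the log (dB) scaling, and all function names are assumptions. The per-band standard deviation is returned alongside the mean only because a later scoring step might need it.

```python
import numpy as np

SAMPLE_RATE = 16000   # assumed sampling rate in Hz (not stated on the slide)
N_BANDS = 60          # 1 x 60 spectral vector, as on the slide
WIN_SEC = 0.020       # 20 ms Hamming window, as on the slide


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_fft, sr=SAMPLE_RATE, n_bands=N_BANDS):
    """Triangular filters spaced evenly on the mel scale (one common choice)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_bands + 2))
    bank = np.zeros((n_bands, freqs.size))
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[b] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def spectral_vector(samples, center_sec, sr=SAMPLE_RATE, n_fft=1024):
    """60-band log mel spectrum of the 20 ms window centered at center_sec.

    Assumes the window lies fully inside the signal.
    """
    half = int(round(WIN_SEC * sr / 2))
    mid = int(round(center_sec * sr))
    frame = samples[mid - half: mid + half]
    frame = frame * np.hamming(frame.size)
    power = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2
    return 10.0 * np.log10(mel_filterbank(n_fft, sr) @ power + 1e-10)


def spectral_template(samples, tokens, sr=SAMPLE_RATE):
    """Mean and sd of mid-token vectors, skipping tokens shorter than average.

    tokens is a list of (start_sec, end_sec) pairs for one phone of one speaker.
    """
    durations = np.array([end - start for start, end in tokens])
    keep = [t for t, d in zip(tokens, durations) if d >= durations.mean()]
    vectors = np.array([spectral_vector(samples, (s + e) / 2.0, sr) for s, e in keep])
    return vectors.mean(axis=0), vectors.std(axis=0)
```

The filterbank is rebuilt on every call to keep the sketch short; a real implementation would probably cache it.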

Page 8: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: spectral template

[Figure: spectral templates for [a], [f], and silence of speaker F01]

Page 9: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: similarity scores

Similarity between spectral vectors x and u

$$D_{x,u} = \frac{1}{60}\sum_{j=1}^{60}\frac{|x_j - u_j|}{\mathrm{sd}(u_j)}$$

$$S_{x,u} = e^{-0.005\,D_{x,u}}$$

- Compare the given acoustic data against any spectral template of that speaker
- Step size = 5ms
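A minimal sketch of the similarity score follows. It assumes the distance is the per-band absolute difference between the vector and the template, divided by a per-band standard deviation and averaged over the 60 bands; that is one reading of the slide's formula, not a confirmed one. The function names and the `sd_u` argument are hypothetical. Only the constant 0.005 and the 5ms step come from the slide.

```python
import numpy as np

DECAY = 0.005  # exponent constant from the slide


def distance(x, u, sd_u):
    """Mean absolute difference between x and template u, scaled per band by sd_u.

    The per-band scaling and the averaging are an assumed reading of the slide.
    """
    x, u, sd_u = (np.asarray(a, dtype=float) for a in (x, u, sd_u))
    return np.mean(np.abs(x - u) / sd_u, axis=-1)


def similarity(x, u, sd_u):
    """Similarity in (0, 1]; equals 1 when x matches the template exactly."""
    return np.exp(-DECAY * distance(x, u, sd_u))


def score_track(frames, u, sd_u):
    """Similarity for each row of frames (one spectral vector per 5 ms step)."""
    return similarity(np.asarray(frames, dtype=float), u, sd_u)
```

Calling `score_track` once with the fricative template and once with the silence template would give the two score curves that the release-point search works on.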

Page 10: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Similarity scores

Formulae:

$$D_{x,t} = \frac{1}{60}\sum_{j=1}^{60}\frac{|x_j - t_j|}{\mathrm{sd}(t_j)}$$

$$S_{x,t} = e^{-0.005\,D_{x,t}}$$

Step size = 5ms

[Figure legend: [s] score, <sil> score]

Page 11: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: finding the release point

Basic idea
- Near the release point
  - Fricative similarity score rises
  - Silence similarity score drops

                 Closure   Burst
Fricative-ness   Low       High
Silence-ness     High      Low

[Figure labels: close, release, vowel onset]

Q1: Which fricative to use?

Q2: Which period of rise or drop to pick?

Page 12: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: finding the release point

[Figure: similarity scores for [h], [s], [sh], and <sil>]

- Slope is a better predictor than absolute score value
- The end point of a period with maximal slope is taken as the release point
- Which fricative? The [sh] score is more consistent than those of the other fricatives

Page 13: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: finding the release point

[Figure: [h], [s], [sh], and <sil> similarity scores for initial [t] in “doing” and initial [k] in “countries”]

Page 14: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: finding the release point

Original algorithm
- Find the end point of a period of fastest increase in <sh> score
- Find the end point of a period of fastest decrease in <sil> score
- Return the middle point of the two end points as the point of release
- If either or both end points cannot be found within the duration of the stop, return NULL.
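The original algorithm could be sketched as below, under a simplifying assumption: a "period" of fastest increase or decrease is taken to be a single 5ms step between consecutive scores, which the slides do not confirm. The function names are hypothetical, and the score lists are assumed to cover only the labeled stop span.

```python
import numpy as np

STEP_SEC = 0.005  # 5 ms step between consecutive scores, as on the slides


def steepest_step_end(scores, rising):
    """Index of the frame that ends the steepest rise (or fall), else None."""
    diffs = np.diff(np.asarray(scores, dtype=float))
    if diffs.size == 0:
        return None
    if not rising:
        diffs = -diffs
    best = int(np.argmax(diffs))
    if diffs[best] <= 0:  # the score never moves in the wanted direction
        return None
    return best + 1       # end point of the steepest step


def release_point(sh_scores, sil_scores, stop_start_sec):
    """Midpoint of the two end points in seconds, or None if either is missing."""
    rise_end = steepest_step_end(sh_scores, rising=True)
    fall_end = steepest_step_end(sil_scores, rising=False)
    if rise_end is None or fall_end is None:
        return None
    midpoint = (rise_end + fall_end) / 2.0
    return stop_start_sec + midpoint * STEP_SEC
```

For example, with scores that jump between the third and fourth frames of a stop starting at 1.0 s, the sketch returns a time 15ms into the stop; with flat scores it returns `None`.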

Page 15: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: finding the release point

Select two speakers’ data to tune the model
- Hand-tag the release point for all tokens in the test set.
- If the stop doesn’t appear to have a release point on the spectrogram, mark it as a problematic case, and take the end point of the stop as the release point for calculating error.

Speaker   Age     Gender   Speaking rate         # of tokens   # of test tokens
F07       Old     Female   Slow (4.022 syll/s)   231           231
M08       Young   Male     Fast (6.434 syll/s)   618           261

Page 16: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: problematic cases

[Figure: [sh] and <sil> scores for problematic cases: no burst, no closure, weak and double release(??)]

Page 17: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: finding the release point

- Calculate the difference between the hand-tagged release point and the estimated one (i.e. the error) for each case.
- The RMS (root mean square) of the error is used to measure the performance of the algorithm.
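The error measure can be illustrated with a short sketch. The function name is hypothetical, and skipping tokens for which no estimate was returned is an assumption about how such cases are handled.

```python
import math


def rms_error(hand_tagged, estimated):
    """Root mean square of (hand-tagged minus estimated) over scored tokens.

    Tokens whose estimate is None are skipped (an assumption).
    """
    pairs = [(h, e) for h, e in zip(hand_tagged, estimated) if e is not None]
    if not pairs:
        raise ValueError("no tokens with an estimated release point")
    return math.sqrt(sum((h - e) ** 2 for h, e in pairs) / len(pairs))
```

With release points in milliseconds, errors of 3ms and 4ms over two tokens give an RMS of about 3.54ms.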

Page 18: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: error analysis

[Figure: error (real release minus estimate) for F07 (n=231 tokens) and M08 (n=261 tokens)]

Add 5ms to the estimation
- F07: RMS = 7.22ms; with +5ms, 4.85ms
- M08: RMS = 13.11ms; with +5ms, 14ms

Page 19: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: tuning the algorithm

1st Rejection Rule
- A target token will be rejected if the changes in scores are not drastic enough.
- E.g. insignificant rise: reject!

[Figure: [sh] and <sil> scores]

Page 20: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: tuning the algorithm

Applying the 1st Rejection Rule
- Rejecting 4 cases in F07
  - RMS(+5ms) = 4.19ms
- Rejecting 28 cases in M08
  - Covering most of the problematic cases
  - RMS(+5ms) = 9.27ms

[Figure: error analysis in M08 after the 1st rejection rule; RMS(+5ms) = 14ms before, 9.27ms after]

Page 21: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: tuning the algorithm

Still a problem…
- Multiple releases
  - Each might correspond to a rise/drop of the scores

[Figure: [sh] and <sil> scores for initial [k] in “cause” of M08]

Page 22: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: tuning the algorithm

2nd Rejection Rule
- A target token will be dropped if the points found in the <sh> and <sil> scores are too far apart (>20ms).
- Partly solves the multiple release problem
- The ideal way would be to identify all candidate release points and return the first one.

Page 23: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: tuning the algorithm

Applying the 2nd Rejection Rule
- Rejecting 3 cases in F07
  - RMS(+5ms) = 3.22ms
- Rejecting 20 cases in M08
  - Only 2 problematic cases remain
  - RMS(+5ms) = 3.44ms

[Figure: error analysis in M08 after the 2nd rejection rule; RMS(+5ms) = 9.26ms before, 3.44ms after]

Compare: Optimal error is 2.5ms given the 5ms step size…

Page 24: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: tuning the algorithm

F07
                      # of cases   RMS (ms)   RMS(+5ms) (ms)
Original              231          7.22       4.85
After 1st rejection   227          6.81       4.19
After 2nd rejection   224          6.02       3.22
Rejection rate: 3.03%

M08
                      # of cases   RMS (ms)   RMS(+5ms) (ms)
Original              261          13.11      14
After 1st rejection   233          9.27       9.26
After 2nd rejection   213          5.64       3.44
Rejection rate: 15.05%

Page 25: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: testing the algorithm

- Select a random sample of 50 tokens from all speakers
- Hand-tag the release point
- Use the current algorithm together with the two rejection rules to find the estimated release
- Compare the hand-tagged point and the estimated one
  - 4 rejected by the 1st rule (3 were legitimate)
  - 3 rejected by the 2nd rule (2 were legitimate)
  - 43 accepted cases, RMS(error) < 5ms

Page 26: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Methodology: summary

- Calculate <silence> score and <sh> score
- Calculate the slope in <silence> score and <sh> score
- In a labeled voiceless stop span, (i) find the time point of largest positive slope in <sh> score, and store it in p1; (ii) find the time point of smallest negative slope in <silence> score, and store it in p2
- p1 = null or p2 = null? Y: reject the case. N: continue
- |p1 – p2| >= 0.02 s? Y: reject the case. N: continue
- slope(p1) < 0.02 and slope(p2) > 0.04? Y: reject the case. N: continue
- return (p1 + p2)/2 + 0.005
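The decision procedure summarized on this slide might look like the following sketch. Several points are assumptions rather than facts from the slides: slopes are measured as score change per 5ms step, p1 and p2 are taken at the end of the steepest step, the slope(p2) threshold is treated as a magnitude of the drop, and the score lists cover only the labeled stop span. The function and constant names are hypothetical.

```python
import numpy as np

STEP_SEC = 0.005     # 5 ms step, as on the slides
MAX_GAP_SEC = 0.02   # |p1 - p2| threshold from the flowchart
MIN_RISE = 0.02      # slope(p1) threshold from the flowchart
MIN_DROP = 0.04      # slope(p2) threshold, read here as a magnitude (assumption)
OFFSET_SEC = 0.005   # +5 ms correction from the flowchart


def detect_release(sh_scores, sil_scores, stop_start_sec):
    """Return the estimated release time in seconds, or None if rejected."""
    sh_slope = np.diff(np.asarray(sh_scores, dtype=float))
    sil_slope = np.diff(np.asarray(sil_scores, dtype=float))
    if sh_slope.size == 0 or sil_slope.size == 0:
        return None
    i1 = int(np.argmax(sh_slope))    # largest positive slope in <sh>
    i2 = int(np.argmin(sil_slope))   # most negative slope in <silence>
    if sh_slope[i1] <= 0 or sil_slope[i2] >= 0:
        return None                  # p1 or p2 is null
    if abs(i1 - i2) >= round(MAX_GAP_SEC / STEP_SEC):
        return None                  # the two points are too far apart
    if sh_slope[i1] < MIN_RISE and -sil_slope[i2] < MIN_DROP:
        return None                  # changes in scores not drastic enough
    p1 = stop_start_sec + (i1 + 1) * STEP_SEC
    p2 = stop_start_sec + (i2 + 1) * STEP_SEC
    return (p1 + p2) / 2.0 + OFFSET_SEC
```

The gap check compares step counts rather than seconds to avoid floating-point edge cases at exactly 20ms.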

Page 27: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Results: grand means

Rejection rates (2 rules combined)
- Vary from 3.03% to 30.5% (mean = 13.3%, sd = 8.6%) across speakers.

VOT and closure duration

               [p]    [t]    [k]
Closure (ms)   69.5   48.9   54.9
VOT (ms)       48     51.2   57.9

Page 28: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Results: VOT by speaker

Page 29: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

General Discussion

Echoing previous findings
- Byrd (1993): closure duration and VOT in read speech
- Shattuck-Hufnagel & Veilleux (2007): 13% of missing landmarks in spontaneous speech

Grand means of the current study in parentheses:

               [p]         [t]         [k]
Closure (ms)   69 (69.5)   53 (48.9)   60 (54.9)
VOT (ms)       44 (48)     49 (51.2)   52 (57.9)

Page 30: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

General Discussion

Future work
- Fine-tune the 2nd rejection rule
- Generalize the exemplar-based method to other automatic phonetic processing problems?

Page 31: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Acknowledgement

- Anonymous speakers
- Buckeye corpus developers
- Prof. Keith Johnson
- Members of the phonology lab at UC Berkeley

Thank you! Any comments are welcome.

Page 32: An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

References

Byrd, D. (1993) 54,000 American stops. UCLA Working Papers in Phonetics. No 83, pp: 97-116.

Johnson, K. (2006) Acoustic attribute scoring: A preliminary report.

Liu, S. (1996) Landmark detection for distinctive feature-based speech recognition. J. Acoust. Soc. Amer. Vol 100, pp: 3417-3430.

Niyogi, P., Ramesh, P. (1998) Incorporating voice onset time to improve letter recognition accuracies. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '98. Vol 1, pp: 13-16.

Pitt, M. et al. (2005) The Buckeye Corpus of conversational speech: labeling conventions and a test of transcriber reliability. Speech Communication. Vol 45, pp: 90-95.

Shattuck-Hufnagel, S., Veilleux, N.M. (2007) Robustness of acoustic landmarks in spontaneously-spoken American English. Proceedings of International Congress of Phonetic Science 2007, Saarbrucken, August 2007.

Zue, V.W. (1976) Acoustic Characteristics of stop consonants: A controlled study. Sc. D. thesis. MIT, Cambridge, MA.