Characterisation of individuals’ formant dynamics using polynomial equations Kirsty McDougall...

Characterisation of individuals’ formant dynamics using polynomial equations

Kirsty McDougall

Department of Linguistics

University of Cambridge

[email protected]

IAFPA 2006

Speaker characteristics and static features of speech

• Most previous research has focussed on static features: instantaneous and average measures

• Straightforward to measure

• Natural progression from other research areas – delineation of different languages and language varieties

• Reflect certain anatomical dimensions of a speaker, e.g. formant frequencies ~ length and configuration of the vocal tract (VT)

• Instantaneous and average measures demonstrate speaker differences, but are unable to distinguish all members of a population

Therefore: look to dynamic (time-varying) features

Dynamic features of speech

• More information than static features

• Reflect the movement of a person’s speech organs as well as their dimensions: people move in individual ways for skilled motor activities such as walking, running, … and speech

• Speech can be viewed as the achievement of a series of linguistic ‘targets’

• Speakers are likely to exhibit similar properties at ‘targets’ (e.g. segment midpoints), but move between these in individual ways

Therefore: examine formant frequency dynamics

Formant dynamics

[Figure: /aɪ/ in ‘bike’ uttered by two male speakers of Australian English; x-axis: Time (s), y-axis: Frequency (Hz); 10% intervals marked]

Research Questions

• How do speakers’ formant dynamics reflect individual differences in the production of the sequence /aɪk/?

• How can this dynamic information be captured to characterise individual speakers?

Data set

Target words, all containing /aɪk/:

bike /baɪk/

hike /haɪk/

like /laɪk/

mike /maɪk/

spike /spaɪk/

e.g. I don’t want the scooter, I want the bike now. Later won’t do, I want the bike now.

5 repetitions × 5 words (bike, hike, like, mike, spike) × 2 stress levels (nuclear, non-nuclear) × 2 speaking rates (normal, fast) = 100 tokens per subject
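The token count in the data set can be checked by enumerating the design. The following is a minimal sketch; the names `WORDS`, `STRESS`, `RATES` and `tokens_for_subject` are illustrative and not taken from the slides.

```python
from itertools import product

WORDS = ["bike", "hike", "like", "mike", "spike"]
STRESS = ["nuclear", "non-nuclear"]
RATES = ["normal", "fast"]
REPETITIONS = range(1, 6)  # 5 repetitions


def tokens_for_subject():
    """Return one (word, stress, rate, repetition) tuple per recorded token."""
    return list(product(WORDS, STRESS, RATES, REPETITIONS))


tokens = tokens_for_subject()

# Count the tokens that fall into each rate-stress condition.
per_condition = {}
for word, stress, rate, repetition in tokens:
    key = f"{rate}-{stress}"
    per_condition[key] = per_condition.get(key, 0) + 1

print(len(tokens))     # 100 tokens per subject
print(per_condition)   # 25 tokens in each of the 4 conditions
```

Each rate-stress condition therefore holds 5 words × 5 repetitions = 25 tokens per speaker, which is the number of tokens per speaker in each discriminant analysis plot.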

Subjects

• 5 adult male native speakers of Australian English (A, B, C, D, E)

• Aged 22–28

• Brisbane/Gold Coast, Queensland

[Figure: Speaker A, “bike” (normal-nuclear); F1, F2 and F3 tracks with measurement points labelled 1, 2, 10, 20, …, 90%]

[Figure: formant contours for Speakers A–E, normal-nuclear condition, in three panels; x-axis: +10% step of /aɪ/ (0–100), y-axis: Frequency (Hz). F1 panel: 300–800 Hz; F2 panel: 800–2000 Hz; F3 panel: 2000–2800 Hz]
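The contours are plotted at +10% steps of /aɪ/, which puts tokens of different durations on a common time axis. The sketch below shows one way such time-normalised values could be obtained from a formant track. Linear interpolation between tracked frames is an assumption here, since the slides do not say how the values at each step were measured, and the track values are hypothetical.

```python
def sample_at_percent_steps(times, freqs, start, end, step=10):
    """Linearly interpolate a formant track at 0%, step%, ..., 100% of [start, end]."""

    def interpolate(t):
        # Clamp to the tracked interval to guard against rounding at the edges.
        t = min(max(t, times[0]), times[-1])
        for i in range(len(times) - 1):
            t0, t1 = times[i], times[i + 1]
            if t0 <= t <= t1:
                return freqs[i] + (freqs[i + 1] - freqs[i]) * (t - t0) / (t1 - t0)
        raise ValueError("track needs at least two increasing time points")

    points = []
    for pct in range(0, 101, step):
        t = start + (end - start) * pct / 100
        points.append((pct, interpolate(t)))
    return points


# Hypothetical F1 track: frame times in ms and frequencies in Hz.
track_times = [0, 50, 100, 150, 200]
track_f1 = [500.0, 600.0, 700.0, 650.0, 600.0]

points = sample_at_percent_steps(track_times, track_f1, start=0, end=200)
for pct, hz in points:
    print(f"{pct:3d}%  {hz:.1f} Hz")
```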

Discriminant Analysis

Multivariate technique used to determine whether a set of predictors (formant frequency measurements) can be combined to predict group (speaker) membership

(ref. Tabachnick and Fidell 1996)


[Figure: discriminant function scatterplot, fast-nuclear condition; x-axis: Function 1, y-axis: Function 2; one colour per speaker (A–E)]

Each datapoint represents 1 token

Each speaker’s tokens are represented with a different colour, e.g. Speaker E’s 25 tokens of /aɪk/

DA constructs discriminant functions which maximise differences between speakers (each function is a linear combination of the formant frequency predictors)

Assess how well the predictors distinguish speakers by the extent of clustering of tokens, plus the classification percentage (95% for this fast-nuclear plot)

[Figure: discriminant function scatterplots (Function 1 against Function 2) showing tokens and group centroids for Speakers A–E; four panels: normal-nuclear, fast-nuclear, normal-non-nuclear, fast-non-nuclear]

Classification percentages shown with the panels: 95%, 88%, 95%, 89%
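The classification step of a discriminant analysis can be sketched as follows. This is not the procedure used to produce the plots. It is a simplified illustration that assumes equal prior probabilities, two predictors and a pooled within-speaker covariance matrix, in which case a token is assigned to the speaker whose centroid is nearest in Mahalanobis distance. The data are synthetic, and the rate is computed by reclassifying the training tokens, which is also an assumption.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length tuples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_covariance(groups):
    """Pooled within-group covariance matrix for 2 predictors."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    n_tokens = 0
    for rows in groups.values():
        centre = mean_vector(rows)
        for row in rows:
            d = [x - m for x, m in zip(row, centre)]
            for i in range(2):
                for j in range(2):
                    total[i][j] += d[i] * d[j]
        n_tokens += len(rows)
    dof = n_tokens - len(groups)
    return [[value / dof for value in row] for row in total]


def invert_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def classify(token, centroids, inv_cov):
    """Assign a token to the speaker with the nearest centroid (Mahalanobis)."""
    def squared_distance(centre):
        d = [x - m for x, m in zip(token, centre)]
        return sum(d[i] * inv_cov[i][j] * d[j] for i in range(2) for j in range(2))
    return min(centroids, key=lambda speaker: squared_distance(centroids[speaker]))


# Synthetic (predictor 1, predictor 2) values in Hz for three invented speakers.
groups = {
    "A": [(600.0, 1500.0), (620.0, 1480.0), (610.0, 1530.0), (590.0, 1510.0)],
    "B": [(700.0, 1300.0), (690.0, 1340.0), (720.0, 1310.0), (710.0, 1290.0)],
    "C": [(650.0, 1700.0), (640.0, 1720.0), (660.0, 1690.0), (670.0, 1730.0)],
}
centroids = {speaker: mean_vector(rows) for speaker, rows in groups.items()}
inv_cov = invert_2x2(pooled_covariance(groups))

correct = sum(
    classify(token, centroids, inv_cov) == speaker
    for speaker, rows in groups.items()
    for token in rows
)
total = sum(len(rows) for rows in groups.values())
rate = 100.0 * correct / total
print(f"{rate:.0f}% of tokens classified correctly")
```

The sketch does not construct the discriminant functions themselves (the axes of the scatterplots). It only illustrates how a classification percentage can be derived from group centroids.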

Discussion

• DA scatterplots and classification rates promising

• However, not very efficient – method essentially based on a series of instantaneous measurements, probably containing dependent information

• Recall: individuals’ F1 contours of /aɪk/…

[Figure: F1 contours of /aɪk/ for Speakers A–E, normal-nuclear condition; x-axis: +10% step of /aɪ/ (0–100), y-axis: Frequency (Hz), 300–800]

A new approach…

• Differences in location in frequency range

• Differences in curvature – location of turning points, convex/concave, steep/shallow

• Need to capture most defining aspects of the contours efficiently

Therefore: use linear regression to parameterise the curves with polynomial equations

Linear regression

• Technique for determining the equation of a line or curve which approximates the relationship between a set of (x, y) points

[Figure: scatterplot of (x, y) points with a fitted straight line; both axes 0–20]

$y = a_0 + a_1 x$

$a_0$: y-intercept

$a_1$: gradient

• Can also be used for curvilinear relationships

[Figure: scatterplot of (x, y) points with a fitted curve; both axes 0–20]

quadratic: $y = a_0 + a_1 x + a_2 x^2$

$a_0$: y-intercept

$a_1$, $a_2$: determine shape and direction of curve
Polynomial Equations

Cubic: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3$

Quartic: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4$

Quintic: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5$

[Figure: example cubic, quartic and quintic curves, y against x]

/aɪk/ data

• Fit F1, F2, F3 contours with polynomial equations

• Test the reliability of the polynomial coefficients in distinguishing speakers

Quadratic: $y = a_0 + a_1 t + a_2 t^2$

Cubic: $y = a_0 + a_1 t + a_2 t^2 + a_3 t^3$

where y is frequency (Hz) and t is normalised time

“bike”, Speaker A (normal-nuclear token 1)

[Figure: F1 contour; actual data points with quadratic and cubic fits; x-axis: Normalised time (0–9), y-axis: Frequency (Hz), 0–800]

Quadratic fit: $y = 420.68 + 79.26t - 5.92t^2$, R = 0.879

Cubic fit: $y = 478.85 - 46.07t + 35.62t^2 - 3.46t^3$, R = 0.978

[Figure: F2 contour; actual data points with quadratic and cubic fits; x-axis: Normalised time (0–9), y-axis: Frequency (Hz), 600–2000]

Quadratic fit: $y = 876.01 - 53.24t + 22.46t^2$, R = 0.985

Cubic fit: $y = 825.49 + 55.64t - 13.63t^2 + 3.01t^3$, R = 0.991
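A polynomial fit of this kind can be sketched with ordinary least squares via the normal equations. The code below is an illustration only: the function names are invented, the contour values are hypothetical, and R is assumed to be the correlation between measured and fitted values, which the slides do not state. As a consistency check, the routine is first run on points generated from the quadratic F1 equation quoted above, from which it should recover the same coefficients.

```python
import math


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aug[r][n] - known) / aug[r][r]
    return coeffs


def fit_polynomial(ts, ys, degree):
    """Least-squares coefficients [a0, a1, ..., a_degree] via the normal equations."""
    size = degree + 1
    matrix = [[sum(t ** (i + j) for t in ts) for j in range(size)] for i in range(size)]
    rhs = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(size)]
    return solve(matrix, rhs)


def evaluate(coeffs, t):
    return sum(a * t ** k for k, a in enumerate(coeffs))


def correlation(actual, fitted):
    """Pearson correlation between measured and fitted values."""
    n = len(actual)
    mean_a, mean_f = sum(actual) / n, sum(fitted) / n
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(actual, fitted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_f = sum((f - mean_f) ** 2 for f in fitted)
    return cov / math.sqrt(var_a * var_f)


ts = list(range(10))  # normalised time points 0..9

# Check: points generated from the quoted quadratic fit are recovered exactly.
curve_points = [420.68 + 79.26 * t - 5.92 * t ** 2 for t in ts]
recovered = fit_polynomial(ts, curve_points, degree=2)
print([round(a, 2) for a in recovered])

# Hypothetical F1 contour in Hz (not data from the study).
contour = [480.0, 520.0, 590.0, 650.0, 690.0, 700.0, 690.0, 660.0, 610.0, 540.0]
quadratic = fit_polynomial(ts, contour, degree=2)
cubic = fit_polynomial(ts, contour, degree=3)
r_quadratic = correlation(contour, [evaluate(quadratic, t) for t in ts])
r_cubic = correlation(contour, [evaluate(cubic, t) for t in ts])
print(f"quadratic R = {r_quadratic:.3f}, cubic R = {r_cubic:.3f}")
```

Because the quadratic is a special case of the cubic, the cubic fit can never have a lower R on the same points. The open question is whether the extra coefficient carries speaker-specific information.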

DA on polynomial coefficients

• Quadratic: 3 formants × 3 coefficients = 9 predictors

• Cubic: 3 formants × 4 coefficients = 12 predictors

• Cubic + duration of /aɪ/: 12 + 1 = 13 predictors
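The predictor counts follow from concatenating the coefficients of the three formant fits for a token. In the sketch below, the F1 and F2 coefficients are the fitted values quoted for Speaker A's “bike” token. The F3 coefficients and the duration are placeholders, since they are not given on the slides, and `predictor_vector` is an illustrative name.

```python
def predictor_vector(coeffs_by_formant, duration=None):
    """Concatenate F1, F2, F3 coefficients (and optionally duration) into one list."""
    vector = []
    for formant in ("F1", "F2", "F3"):
        vector.extend(coeffs_by_formant[formant])
    if duration is not None:
        vector.append(duration)
    return vector


quadratic_fits = {
    "F1": [420.68, 79.26, -5.92],
    "F2": [876.01, -53.24, 22.46],
    "F3": [2500.0, 0.0, 0.0],       # placeholder, not from the slides
}
cubic_fits = {
    "F1": [478.85, -46.07, 35.62, -3.46],
    "F2": [825.49, 55.64, -13.63, 3.01],
    "F3": [2500.0, 0.0, 0.0, 0.0],  # placeholder, not from the slides
}
placeholder_duration = 0.15          # seconds, hypothetical

quadratic_predictors = predictor_vector(quadratic_fits)
cubic_predictors = predictor_vector(cubic_fits)
cubic_dur_predictors = predictor_vector(cubic_fits, duration=placeholder_duration)

print(len(quadratic_predictors), len(cubic_predictors), len(cubic_dur_predictors))
```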

Comparison of Classification Rates

[Figure: bar chart of % correct classification (0–100) in the normal-nuclear, fast-nuclear, normal-non-nuclear and fast-non-nuclear conditions for four predictor sets. No. of predictors: quadratic (9), cubic (12), cubic + dur (13), direct meas'ts (20)]

Percentages labelled on the chart: 96%, 92%, 89%, 90%


Summary of findings

• Comparing polynomial-based tests & direct measurement-based tests: the reduction in classification accuracy is small, in return for a much smaller number of predictors

• Future: aim to develop this approach to enable inclusion of additional information, i.e. parametrise other dynamic aspects of speech to capture a dense amount of speaker-specific information with a small number of predictors

Conclusion

• Differences in formant dynamics reflect differences in articulatory strategies (& VT dimensions) among speakers

e.g. speaker-specificity of /aɪk/ formant dynamics

- differences in shape and frequency for F1, F2 and F3

- preserved across changes in speaking rate and stress

• Trialled a new technique for characterising individuals’ formant contours using polynomial equations on /aɪk/ data

• Able to capture almost the same amount of speaker-specific information with far fewer predictors

Polynomial approach using formant dynamics should make an important contribution to speaker characterisation techniques in future
