Characterisation of individuals’ formant dynamics using
polynomial equations
Kirsty McDougall
Department of Linguistics
University of Cambridge
IAFPA 2006
Speaker characteristics and static features of speech
• Most previous research has focussed on static features: instantaneous, average
• Straightforward to measure
• Natural progression from other research areas – delineation of different languages and language varieties
• Reflect certain anatomical dimensions of a speaker, e.g. formant frequencies ~ length and configuration of the vocal tract (VT)
• Instantaneous and average measures demonstrate speaker differences, but are unable to distinguish all members of a population
→ look to dynamic (time-varying) features
Dynamic features of speech
• Carry more information than static features
• Reflect the movement of a person’s speech organs as well as their dimensions: people move in individual ways for skilled motor activities (walking, running, … and speech)
• Speech can be viewed as the achievement of a series of linguistic ‘targets’
• Speakers are likely to exhibit similar properties at ‘targets’ (e.g. segment midpoints), but move between these in individual ways
→ examine formant frequency dynamics
Formant dynamics
[Figure: /aɪ/ in ‘bike’ uttered by two male speakers of Australian English. Axes: Time (s) against Frequency (Hz); 10% intervals marked.]
Research Questions
• How do speakers’ formant dynamics reflect individual differences in the production of the sequence /aɪk/?
• How can this dynamic information be captured to characterise individual speakers?
Data set
Target words, all containing /aɪk/:
bike /baɪk/
hike /haɪk/
like /laɪk/
mike /maɪk/
spike /spaɪk/
e.g. “I don’t want the scooter, I want the bike now. Later won’t do, I want the bike now.”
5 repetitions × 5 words (bike, hike, like, mike, spike) × 2 stress levels (nuclear, non-nuclear) × 2 speaking rates (normal, fast) = 100 tokens per subject
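The token count follows directly from crossing the four factors. A minimal sketch that enumerates the design for one subject; the variable names are illustrative and not from the slides:

```python
from itertools import product

words = ["bike", "hike", "like", "mike", "spike"]
stress_levels = ["nuclear", "non-nuclear"]
rates = ["normal", "fast"]
repetitions = range(1, 6)

# One tuple per recorded token for a single subject.
tokens = list(product(words, stress_levels, rates, repetitions))

# Tokens falling in one rate/stress condition, e.g. fast-nuclear.
fast_nuclear = [tok for tok in tokens if tok[1] == "nuclear" and tok[2] == "fast"]

print(len(tokens))        # 100
print(len(fast_nuclear))  # 25
```

Each rate/stress condition therefore holds 5 words × 5 repetitions = 25 tokens per speaker.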
Subjects
• 5 adult male native speakers of Australian English (A, B, C, D, E)
• aged 22-28
• Brisbane/Gold Coast, Queensland
[Figure: Speaker A, “bike” (normal-nuclear): markers 1 and 2 with steps at 10, 20, … 90% between them; F1, F2 and F3 labelled.]
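Measuring at +10% steps amounts to dividing the interval between the two markers into ten equal parts. A rough sketch, assuming the markers are given as times in seconds; `step_times` and the example boundary values are hypothetical, not taken from the slides:

```python
def step_times(onset, offset, step_percent=10):
    """Return the times at every +step_percent of the interval, endpoints included."""
    duration = offset - onset
    n_steps = 100 // step_percent
    return [onset + duration * k / n_steps for k in range(n_steps + 1)]

# Hypothetical positions (seconds) of markers 1 and 2.
times = step_times(0.10, 0.30)

# Interior points only: the 10%, 20%, ... 90% steps.
interior = times[1:-1]
print(len(times), len(interior))
```

Formant frequencies would then be read off at each of these times.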
[Figure: F1, normal-nuclear. Frequency (Hz), 300–800, against +10% step of /aɪ/, 0–100; one contour per speaker A–E.]
[Figure: F2, normal-nuclear. Frequency (Hz), 800–2000, against +10% step of /aɪ/, 0–100; one contour per speaker A–E.]
[Figure: F3, normal-nuclear. Frequency (Hz), 2000–2800, against +10% step of /aɪ/, 0–100; one contour per speaker A–E.]
Discriminant Analysis
Multivariate technique used to determine whether a set of predictors (formant frequency measurements) can be combined to predict group (speaker) membership
(ref. Tabachnick and Fidell 1996)
Discriminant Analysis
[Figure: discriminant scatterplot, fast-nuclear. Function 1 against Function 2; speakers A–E.]
• Each datapoint represents 1 token
• Each speaker’s tokens are represented with a different colour, e.g. Speaker E’s 25 tokens of /aɪk/
• DA constructs discriminant functions which maximise differences between speakers (each function is a linear combination of the formant frequency predictors)
• Assess how well the predictors distinguish speakers by the extent of clustering of tokens + classification percentage: 95%
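To make the idea of discriminant functions concrete, here is a minimal NumPy sketch of Fisher's linear discriminant analysis with nearest-centroid classification. It is not the software used for the slides. The data are synthetic stand-ins (three invented predictors per token), and `fit_lda` and `classify` are hypothetical helper names.

```python
import numpy as np

def fit_lda(X, labels):
    """Return (overall mean, W); columns of W are the discriminant functions."""
    classes = np.unique(labels)
    mean_all = X.mean(axis=0)
    p = X.shape[1]
    S_w = np.zeros((p, p))  # within-speaker scatter
    S_b = np.zeros((p, p))  # between-speaker scatter
    for c in classes:
        Xc = X[labels == c]
        mean_c = Xc.mean(axis=0)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)
        d = (mean_c - mean_all)[:, None]
        S_b += len(Xc) * (d @ d.T)
    # Whiten by S_w, then find the directions with the largest between-speaker spread.
    L_inv = np.linalg.inv(np.linalg.cholesky(S_w))
    eigvals, V = np.linalg.eigh(L_inv @ S_b @ L_inv.T)
    k = min(p, len(classes) - 1)
    order = np.argsort(eigvals)[::-1][:k]
    return mean_all, L_inv.T @ V[:, order]

def classify(Z, labels):
    """Assign each token to the speaker whose group centroid is nearest."""
    classes = np.unique(labels)
    centroids = np.array([Z[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

# Synthetic stand-in data: 5 speakers x 25 tokens, 3 invented predictors (Hz).
rng = np.random.default_rng(0)
speakers = np.repeat(np.array(["A", "B", "C", "D", "E"]), 25)
speaker_means = np.array([[450, 1100, 2300], [550, 1500, 2500], [650, 1300, 2700],
                          [500, 1700, 2200], [700, 900, 2600]], dtype=float)
X = np.repeat(speaker_means, 25, axis=0) + rng.normal(0, 5, size=(125, 3))

mean_all, W = fit_lda(X, speakers)
Z = (X - mean_all) @ W          # Z[:, 0] and Z[:, 1] are Function 1 and Function 2
rate = 100 * np.mean(classify(Z, speakers) == speakers)
print(f"{rate:.0f}% of tokens classified correctly")
```

This sketch classifies the same tokens it was fitted on; the slides do not say how their classification percentages were validated.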
Discriminant Analysis
[Figure: four discriminant scatterplots, Function 1 against Function 2, with speaker group centroids for speakers A–E. Panels: normal-nuclear, fast-nuclear, normal-non-nuclear, fast-non-nuclear.]
Classification rates shown for the four panels: 95%, 88%, 95%, 89%
Discussion
• DA scatterplots and classification rates promising
• However, not very efficient: the method is essentially based on a series of instantaneous measurements, which are probably not independent of one another
• Recall: individuals’ F1 contours of /aɪk/…
[Figure: F1, normal-nuclear. Frequency (Hz), 300–800, against +10% step of /aɪ/, 0–100; one contour per speaker A–E.]
A new approach…
• Differences in location in frequency range
• Differences in curvature – location of turning points, convex/concave, steep/shallow
• Need to capture most defining aspects of the contours efficiently
→ use linear regression to parameterise the curves with polynomial equations
Linear regression
• Technique for determining the equation of a line or curve which approximates the relationship between a set of (x, y) points
[Figure: scatterplot of (x, y) points, both axes 0–20, with a fitted straight line.]
$y = a_0 + a_1 x$, where $a_0$ is the y-intercept and $a_1$ is the gradient
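For the straight-line case the least-squares estimates have a simple closed form. A minimal sketch, with `fit_line` as a hypothetical helper and invented example points (the points plotted on the slide are not recoverable):

```python
def fit_line(xs, ys):
    """Least-squares estimates (a0, a1) for y = a0 + a1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    a1 = s_xy / s_xx             # gradient
    a0 = mean_y - a1 * mean_x    # y-intercept
    return a0, a1

# Invented points scattered around a rising line.
xs = [2, 5, 8, 12, 16, 19]
ys = [3.1, 5.2, 6.4, 9.3, 10.8, 13.0]
a0, a1 = fit_line(xs, ys)
print(f"y = {a0:.2f} + {a1:.2f}x")
```

The fitted line always passes through the point of means (mean of x, mean of y), which is how the intercept is obtained once the gradient is known.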
Linear regression
• Can also be used for curvilinear relationships
[Figure: scatterplot of (x, y) points, both axes 0–20, with a fitted curve.]
quadratic: $y = a_0 + a_1 x + a_2 x^2$, where $a_0$ is the y-intercept and $a_1$, $a_2$ determine the shape and direction of the curve
Polynomial Equations
Cubic: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3$
Quartic: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4$
Quintic: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5$
[Figure: example curves of y against x for the cubic, quartic and quintic.]
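A polynomial fit still counts as linear regression because the model is linear in the coefficients, even though it is curved in x. As a standard worked statement (the notation $X$, $\hat{a}$, $n$ and $d$ is introduced here and does not appear on the slides), fitting a polynomial of degree $d$ to $n$ points means solving:

```latex
\[
\min_{a_0,\dots,a_d}\ \sum_{i=1}^{n}\bigl(y_i - a_0 - a_1 x_i - a_2 x_i^2 - \dots - a_d x_i^d\bigr)^2
\]
\[
X = \begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^d \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^d
\end{pmatrix},
\qquad
\hat{a} = (X^{\top}X)^{-1}X^{\top}y
\]
```

The solution $\hat{a}$ is unique when the data contain at least $d + 1$ distinct x values.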
/aɪk/ data
• fit F1, F2, F3 contours with polynomial equations
• test the reliability of the polynomial coefficients in distinguishing speakers
Quadratic: $y = a_0 + a_1 t + a_2 t^2$
Cubic: $y = a_0 + a_1 t + a_2 t^2 + a_3 t^3$
“bike”, Speaker A (normal-nuclear token 1)
[Figure: F1 contour. Frequency y (Hz), 0–800, against normalised time t, 0–9; actual data points with quadratic and cubic fits.]
Quadratic fit: $y = 420.68 + 79.26t - 5.92t^2$, R = 0.879
Cubic fit: $y = 478.85 - 46.07t + 35.62t^2 - 3.46t^3$, R = 0.978
[Figure: F2 contour. Frequency y (Hz), 600–2000, against normalised time t, 0–9; actual data points with quadratic and cubic fits.]
Quadratic fit: $y = 876.01 - 53.24t + 22.46t^2$, R = 0.985
Cubic fit: $y = 825.49 + 55.64t - 13.63t^2 + 3.01t^3$, R = 0.991
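A fit of this kind can be sketched with NumPy. The measured F1 values are not in the transcript, so the sketch uses the slide's cubic fit for Speaker A's F1 contour as a stand-in contour. It evaluates that cubic at nine equally spaced normalised-time points t = 0…8, which is an assumption about the sampling, and refits quadratic and cubic polynomials to the result.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Cubic fit quoted on the slide for F1, coefficients in ascending order a0..a3.
slide_cubic = [478.85, -46.07, 35.62, -3.46]
slide_quadratic = [420.68, 79.26, -5.92]

# ASSUMPTION: nine equally spaced normalised-time points, t = 0..8.
t = np.arange(9)
f1 = P.polyval(t, slide_cubic)   # stand-in contour, not the measured data

quad = P.polyfit(t, f1, 2)       # returns a0, a1, a2
cubic = P.polyfit(t, f1, 3)      # returns a0, a1, a2, a3

print("quadratic:", np.round(quad, 2))
print("cubic:    ", np.round(cubic, 2))
```

Under this assumption the refitted quadratic lands within about 0.1 of the slide's quadratic coefficients. That is expected if t = 0…8 matches the original sampling, because the least-squares quadratic fitted to a set of points equals the one fitted to their cubic-fit values at the same t.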
DA on polynomial coefficients
• Quadratic: 3 formants × 3 coefficients = 9 predictors
• Cubic: 3 formants × 4 coefficients = 12 predictors
• Cubic + duration of /aɪ/: 12 + 1 = 13 predictors
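The predictor counts come from concatenating the coefficients of the three formant fits for each token. A rough sketch, assuming each contour is sampled at equally spaced normalised times; `token_predictors`, the contour values and the duration are all invented for illustration:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def token_predictors(contours, degree, duration=None):
    """Concatenate the polynomial coefficients of each formant contour."""
    coefs = []
    for values in contours:              # one contour per formant: F1, F2, F3
        t = np.arange(len(values))       # normalised-time index (assumed)
        coefs.append(P.polyfit(t, values, degree))
    vector = np.concatenate(coefs)
    if duration is not None:
        vector = np.append(vector, duration)
    return vector

# Invented contours (Hz) for one token; not measured data.
f1 = [480, 465, 500, 545, 590, 615, 600, 540, 430]
f2 = [830, 870, 900, 960, 1050, 1180, 1350, 1560, 1800]
f3 = [2400, 2390, 2380, 2385, 2400, 2430, 2470, 2510, 2550]
contours = [f1, f2, f3]

quadratic = token_predictors(contours, degree=2)
cubic = token_predictors(contours, degree=3)
cubic_dur = token_predictors(contours, degree=3, duration=0.18)  # invented duration (s)

print(len(quadratic), len(cubic), len(cubic_dur))  # 9 12 13
```

One such vector per token would form a row of the predictor matrix passed to the discriminant analysis.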
Comparison of Classification Rates
[Figure: bar chart of % correct classification (0–100) in four conditions: normal-nuclear, fast-nuclear, normal-non-nuclear, fast-non-nuclear. Four predictor sets, with no. of predictors in brackets: quadratic (9), cubic (12), cubic + dur (13), direct meas'ts (20).]
Rates labelled on the chart: 96%, 92%, 89%, 90%
Summary of findings
• Comparing polynomial-based tests with direct measurement-based tests: the reduction in classification accuracy is small, in return for a much smaller no. of predictors
• Future: aim to develop this approach to enable inclusion of additional information, i.e. parametrise other dynamic aspects of speech to capture a dense amount of speaker-specific info with a small no. of predictors
Conclusion
• Differences in formant dynamics reflect differences in articulatory strategies (& VT dimensions) among speakers
e.g. speaker-specificity of /aɪk/ formant dynamics
- differences in shape and frequency for F1, F2 and F3
- preserved across changes in speaking rate and stress
• Trialled new technique for characterising individuals’ formant contours using polynomial equations on /aɪk/ data
• Able to capture almost the same amount of speaker-specific information with far fewer predictors
→ Polynomial approach using formant dynamics should make an important contribution to speaker characterisation techniques in future