Inducing and using phonetic similarity - martijnwieling.nl fileInducing and using phonetic...
Transcript of Inducing and using phonetic similarity - martijnwieling.nl fileInducing and using phonetic...
Inducing and using phonetic similarity
Martijn Wieling and John Nerbonne
University of Tübingen / University of Groningen
Language Diversity Conference, July 20, 2013
1 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
OverviewI Inducing phonetic similarity
I The need for phonetic similarity (i.e. sensitive sound segment distances)I Obtaining sensitive sound segment distancesI Evaluating the quality of sensitive sound segment distances
I Using phonetic similarity: English accentsI The Speech Accent ArchiveI Linking computational and perceptual pronunciation distancesI A visualization of English accents
I Time permitting: Pronunciation similarity via articulography
2 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Collaborators
3 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
The need for sensitive segment distancesI In our research on language variation, we employ pronunciation
distances (on the basis of alignments)I We would like to improve alignment quality and the distances
I There is no widely accepted procedure to determine phonetic similarity(Laver, 1994)
I Here we use the distribution of pronunciation variation to determinesimilarity
I In line with language as “un systême oû tout se tient” (focus on relationsbetween items, not items themselves; Meillet, 1903)
4 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Our starting point: the Levenshtein distanceRestriction: vowels are not aligned with consonants
I The Levenshtein distance measures the minimum nr. of insertions,deletions and substitutions to transform one string into another
mO@lk@ delete O 1m@lk@ subst. @/E 1mElk@ delete @ 1mElk insert @ 1mEl@k
4
m O @ l k @m E l @ k
1 1 1 1
I Note the implicit identification of sound correspondences
5 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Our starting point: the Levenshtein distanceRestriction: vowels are not aligned with consonants
I The Levenshtein distance measures the minimum nr. of insertions,deletions and substitutions to transform one string into another
mO@lk@ delete O 1m@lk@ subst. @/E 1mElk@ delete @ 1mElk insert @ 1mEl@k
4
m O @ l k @m E l @ k
1 1 1 1
I Note the implicit identification of sound correspondences
5 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Counting sound segment correspondencesI Counting the frequency of sound segments (in the Levenshtein alignments)
p b ... U u Total5 × 105 2 × 105 ... 90,000 9 × 105 108
I Counting the frequency of the aligned sound segments (in the Levenshtein alignments)
p b ... U up 2 × 105 10,650 ... 0 0b 88,000 ... 0 0...
......
...U 65,400 5,500u 4 × 105
Total: 5 × 107
I Probability of observing [p]: 5 × 105 / 108 = 0.005 (0.5%)I Probability of observing [b]: 2 × 105 / 108 = 0.002 (0.2%)I Probability of observing [p]:[b]: 10,650 / 5 × 107 = 0.0002 (0.02%)
6 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Association strength between segment pairsI Pointwise Mutual Information (PMI): assesses degree of statistical
dependence between aligned segments (x and y )
PMI(x , y) = log2
(p(x , y)
p(x)p(y)
)I p(x , y): relative occurrence of the aligned segments x and y in the whole
datasetI p(x) and p(y): relative occurrence of x and y in the whole dataset
I The greater the PMI value, the more segments tend to cooccur incorrespondences
7 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Association strength between segment pairsI Probability of observing [p]:[b]: 10,650 / 5 × 107 = 0.0002I Probability of observing [p]: 5 × 105 / 108 = 0.005I Probability of observing [b]: 2 × 105 / 108 = 0.002
PMI(x , y) = log2
(p(x , y)
p(x)p(y)
)⇒
PMI([p], [b]) = log2
(0.0002
0.005× 0.002
)
PMI([p], [b]) ≈ 4.3
8 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Using PMI values with the Levenshtein algorithmI Idea: use association strength to weight edit operationsI PMI is large for strong associations, so invert it (0 - PMI)
I Strongly associated segments will have a low distanceI PMI range varies, so normalize it between 0 and 1I Use PMI-induced weights as costs in Levenshtein algorithm
I Cost of substituting identical sound segments is always set to 0
9 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
The PMI-based Levenshtein algorithmI We use the standard Levenshtein algorithm to calculate the initial PMI
weights and convert these to costs (i.e. sound distances)
I These sensitive sound distances are then used as edit operation costs inthe Levenshtein algorithm to obtain new alignments, new counts, andnew PMI sound distances
I This process is repeated until alignments and PMI sound distancesstabilize
I Besides improved alignments (Wieling et al., 2009), this procedureautomatically yields sensitive sound segment distances
m O @ l k @m E l @ k
0.20 0.15 0.12 0.12
10 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
The PMI-based Levenshtein algorithmI We use the standard Levenshtein algorithm to calculate the initial PMI
weights and convert these to costs (i.e. sound distances)
I These sensitive sound distances are then used as edit operation costs inthe Levenshtein algorithm to obtain new alignments, new counts, andnew PMI sound distances
I This process is repeated until alignments and PMI sound distancesstabilize
I Besides improved alignments (Wieling et al., 2009), this procedureautomatically yields sensitive sound segment distances
m O @ l k @m E l @ k
0.20 0.15 0.12 0.12
10 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Pronunciation dataI Six independent dialect data sets (IPA pronunciations)
I Dutch: 562 words in 613 locations (Wieling et al., 2007)I German: 201 words in 186 locations (Nerbonne and Siedle, 2005)I U.S. English: 153 words in 483 locations (Kretzschmar, 1994)I Bantu (Gabon): 160 words in 53 locations (Alewijnse et al., 2007)I Bulgarian: 152 words in 197 locations (Prokic et al., 2009)I Tuscan: 444 words in 213 locations (Montemagni et al., 2012)
I For all datasets sound segment distances are obtained using thePMI-based Levenshtein algorithm
I We use a slightly adapted version: ignoring identical sound segmentsubstitutions in the counts
11 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Acoustic dataI For the evaluation, we obtained acoustic vowel measurements (F1 and
F2) reported in the scientific literatureI Pols et al. (1973; NL), van Nierop et al. (1973; NL), Sendlmeier and
Seebode (2006; GER), Hillenbrand et al. (1995; US), Nurse and Phillipson(2003, p. 22; BAN), Lehiste and Popov (1970; BUL), Calamai (2003; TUS)
I To determine acoustic vowel distance, we calculate the Euclideandistance of the formant frequencies
I Our perception of frequency is non-linear and calculating the Euclideandistance on the basis of Hertz values would not give enough weight to thefirst formant
I We therefore first scale the Hertz frequencies to Bark
12 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Comparison procedure between acoustic and PMIdistances
I We assess the relation between the generated and acoustic distancesusing the Pearson correlation
I We visualize the relative position of the sound segments by applyingmultidimensional scaling (MDS) to the distance matrices
I Missing distances are not allowed in the (classical) MDS procedure, so insome cases not all sound segments are visualized
13 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Acoustic vs. PMI vowel distances
Pearson’s r Explained variance (r2)Dutch 0.672 45.2%
Dutch w/o Frisian 0.686 47.1%German 0.633 40.1%
German w/o @ 0.785 61.6%US English 0.608 37.0%
Bantu 0.642 41.2%Bulgarian 0.677 45.8%
Tuscan 0.758 57.5%
14 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
MDS visualization of Dutch vowelsPMI visualization captures 76% of the variation
(a) IPA
u
o
ɑ
a
ɛ
eɪ
i y
ʏøɔ
(b) Acoustics
ɒ
e
ɛ
ə
a
ɪ
ɑ
ɔ
i
o
æ
ʌ
u
œʊɵ
y
ø
(c) PMI distances
15 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
MDS visualization of German vowelsPMI visualization captures 70% of the variation
(a) IPA
a
e
ɛ
ɪ
ioʊ
u
ʏy
œ
ø
ə ɔ
(b) Acoustics
ɔə
o
ɒɑ
ɤ
u
ɐ
ʊ
ʌ
a
æ
ɪ
ɛe
Ͽ
i
ʏɯy
(c) PMI distances
16 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
MDS visualization of U.S. English vowelsPMI visualization captures 65% of the variation
(a) IPA
iɪ
e
ɛæ
ɑ
oʊu
ʌ ɔ
(b) Acoustics
əɪ
ɛ
u ʊ
ɑ
ɒ
ɜɔ
ʌ
ɐ
o
ɞ
(c) PMI distances
17 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
MDS visualization of Bantu vowelsPMI visualization captures 90% of the variation
(a) IPA
i
e
əɛ
a
ɔ
ou
(b) Acoustics
e
i
ɛ ɔ
u
a
o
ə
(c) PMI distances
18 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
MDS visualization of Bulgarian vowelsPMI visualization captures 86% of the variation
(a) IPA
i
e ə
a
o
u
(b) Acoustics
ɑ
ei
ə
ɛ
ɤ
o uʊ
a
(c) PMI distances
19 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
MDS visualization of Tuscan vowelsPMI visualization captures 97% of the variation
(a) IPA
a
ɛ
ei
o
u
ɔ
(b) Acoustics
a
i
o
e
u
(c) PMI distances
20 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
What about consonants?I Induced distances correlate strongly with acoustic vowel distances
I Causation is probably the reverse: acoustics explains distributionsSweeney’s insight: “I gotta use words when I talk to you...”
I But for other segments (consonants) acoustic/phonetic distances are notwell accepted, and this procedure provides a measure of distance
21 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
MDS visualization of Dutch consonantsPMI visualization captures 50% of the variation
22 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
MDS visualization of Dutch consonantsPlace (3 groups) dominates over manner (2 groups) and voicing (no groups)
23 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
RecapI The PMI approach yields sensible sound distances in a flexible,
data-driven manner
I For more details and references, see: Martijn Wieling, Eliza Margarethaand John Nerbonne (2012). Inducing a measure of phonetic similarityfrom pronunciation variation. Journal of Phonetics, 40(2), 307-314
I In the following, this method is used to obtain pronunciation distances onthe basis of English accented speech
24 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
The Speech Accent ArchiveAvailable online at http://accent.gmu.edu
25 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Audio example
Listen to an example
26 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Validating the PMI-based Levenshtein pronunciationdistances
I There is only a single study investigating the relation betweenLevenshtein distances and perceptual distances
I Gooskens and Heeringa (LVC, 2004)I Focusing on Norwegian dialectsI The reported correlation strength was r ≈ 0.7
I We conducted a new study based on the Speech Accent Archive,investigating the relation between perceptual and Levenshteinpronunciation distances
27 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Outline of the perception experimentI We provided six sets of 50 audio samples (286 distinct speakers; 272
non-native) and asked native American English speakers to rate theirnative-likeness on a scale from 1 (very foreign sounding) to 7 (nativeAmerican English)
I More than 1100 (!) people filled in the questionnaire
28 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Results of the perception experiment(Wieling, Bloem, Mignella, Timmermeister and Nerbonne, in prep.)
I We calculated the average native-likeness score for every speaker
I We used the PMI-based Levenshtein algorithm to obtain the automaticpronunciation distances for each of these speakers and the average U.S.speaker (based on 115 samples of native U.S. English speakers born inthe U.S.)
I The distance between two speakers is calculated by averaging the wordpronunciation distances between the two speakers
I The correlation between the two measures was r ≈ 0.8
I We may conclude that the Levenshtein distance is a valid measure todetermine perceptually sound pronunciation distances
29 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Visualizing English pronunciationsI We used 989 phonetically transcribed samples from the SAAI We grouped the transcriptions (i.e. speakers) per countryI For non-English speaking countries, we excluded speakers who moved to
an English-speaking country before age 13I We only included countries with at least 5 speakers
I Pronunciation distances between countries were calculated using thePMI-based Levenshtein algorithm and are visualized using MDS
30 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
MDS visualization of accent distances88% of the variation in the original pronunciation distances is captured
31 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
ConclusionsI The PMI approach allows us to induce sensible sound distances
I The approach is readily applicable to any (dialect) dataset with similarpronunciations
I The PMI-based Levenshtein algorithm yields pronunciation distanceswhich correspond well with perceptual pronunciation distances
I Next step: determining pronunciation distances on the basis of themovements of the articulators during speech
I Work in progress...I If interested, ask about it!
32 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Thank you for your attention!
33 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Pronunciation similarity via articulography: intro
34 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Connected to the articulography device
35 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Pilot experiment
“The men hit the squirrel that walked on this verybroad street, but they just failed to kill him.”
American speaker German speaker
36 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Tongue movement
37 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Tongue tip movement
palatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalate
low
high
front backPosition front−back
Hei
ght
Start pronunciation ....... End pronunciation
Position tongue tip in mouth
American speaker: "that"
palatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalate
low
high
front backPosition front−back
Hei
ght
Start pronunciation ....... End pronunciation
Position tongue tip in mouth
German speaker: "that"
Listen to the pronunciations
38 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Statistical analysis
low
high
Startpronun.
Endpronun.
Time
Hei
ght
Speaker
American
German
Height tongue tip: "that"
fron
tba
ck
Startpronun.
Endpronun.
Time
Pos
ition
fron
t−ba
ck
Speaker
American
German
Position tongue tip front−back: "that"
39 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Tongue tip movement
palatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalate
low
high
front backPosition front−back
Hei
ght
Start pronunciation ....... End pronunciation
Position tongue tip in mouth
American speaker: "squirrel"
palatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalatepalate
low
high
front backPosition front−back
Hei
ght
Start pronunciation ....... End pronunciation
Position tongue tip in mouth
German speaker: "squirrel"
Listen to the pronunciations
40 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Statistical analysis
low
high
Startpronun.
Endpronun.
Time
Hei
ght
Speaker
American
German
Height tongue tip: "squirrel"
fron
tba
ck
Startpronun.
Endpronun.
Time
Pos
ition
fron
t−ba
ck
Speaker
American
German
Position tongue tip front−back: "squirrel"
41 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Pilot conclusionsI Pronunciation and tongue movement may reveal different patternsI Generalized additive modeling allows us to extract the general movement
patterns while taking into account the individual variationI It also allows the incorporation of additional informative variables (e.g., the
age at which a non-native speaker learned English)
42 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen
Thank you for your attention!
43 | Martijn Wieling and John Nerbonne Inducing and using phonetic similarity University of Tübingen