Language modeling for speaker recognition
Dan Gillick
January 20, 2004
Outline
• Author identification
• Trying to beat Doddington’s “idiolect” modeling strategy (speaker recognition)
• My next project
Author ID (undergrad. thesis)
Problem:
– train models for each of k authors
– given some test text written by 1 of those authors, identify the correct author

Variations:
– different kinds of models
– different test sample sizes
– different values of k
Character n-gram models
What?
– 27 tokens: a–z, <space>
– some text generated from such a trigram model:

“you orthad gool of anythilly uncand or prafecaustiont and to hing that put ably”
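A model like this takes only a few lines to sketch. The Python below is a hypothetical reconstruction (the slides do not show the original code): it counts character trigrams over the 27-token alphabet and samples pseudo-text from them.

```python
import random
from collections import defaultdict

def normalize(text):
    """Map text onto the 27-token alphabet: a-z plus single spaces."""
    out = []
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            out.append(ch)
        elif ch.isspace() and out and out[-1] != ' ':
            out.append(' ')
    return ''.join(out)

def train_trigrams(text):
    """Count character trigrams, indexed by the two preceding characters."""
    counts = defaultdict(lambda: defaultdict(int))
    chars = normalize(text)
    for a, b, c in zip(chars, chars[1:], chars[2:]):
        counts[(a, b)][c] += 1
    return counts

def generate(counts, n=60, seed=('t', 'h')):
    """Sample n characters from the trigram model, starting from a seed bigram."""
    a, b = seed
    out = [a, b]
    for _ in range(n):
        following = counts.get((a, b))
        if not following:
            break
        chars, weights = zip(*following.items())
        c = random.choices(chars, weights=weights)[0]
        out.append(c)
        a, b = b, c
    return ''.join(out)
```

Sampling from a model trained on one author's novels produces pseudo-text like the quote above: locally plausible letter sequences that mostly fail to form real words.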
Character n-gram models
Why?
– very simple
– data sparseness is less troublesome than with word n-grams
– reported to be state-of-the-art, or at least close to it (Khmelev, D. and Tweedie, F. J., “Using Markov Chains for the Identification of Writers,” Literary and Linguistic Computing, 16(4): 299–307, 2001)
Character n-grams: Setup
• task: pick correct author from 10 possible authors
• training data: 3 novels for each author
• test data: text from a held-out novel
• jack-knifing: 4 novels for each of 20 authors
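The identification step itself reduces to scoring the test text under each author's model and picking the best one. Here is a minimal self-contained sketch; the smoothing scheme (add-alpha) and its constant are my own assumptions, not taken from the thesis:

```python
import math
from collections import Counter

ALPHABET = 'abcdefghijklmnopqrstuvwxyz '  # the 27 tokens

def clean(text):
    """Keep only a-z, collapsing everything else to single spaces."""
    s = ''.join(c if c in ALPHABET else ' ' for c in text.lower())
    return ' '.join(s.split())

def train(text):
    """Trigram and bigram counts for one author's training text."""
    chars = clean(text)
    trigrams = Counter(zip(chars, chars[1:], chars[2:]))
    bigrams = Counter(zip(chars, chars[1:]))
    return trigrams, bigrams

def logprob(model, text, alpha=0.1):
    """Add-alpha smoothed trigram log-likelihood of the test text."""
    trigrams, bigrams = model
    chars = clean(text)
    lp = 0.0
    for a, b, c in zip(chars, chars[1:], chars[2:]):
        lp += math.log((trigrams[(a, b, c)] + alpha)
                       / (bigrams[(a, b)] + alpha * len(ALPHABET)))
    return lp

def identify(models, text):
    """Return the author whose model gives the test text the highest score."""
    return max(models, key=lambda author: logprob(models[author], text))
```

With 10 candidate authors trained on 3 novels each, `identify` is called once per held-out test sample.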
Character n-grams: Results
• task: picking 1 author from 10 possible authors
• training data size: 3 novels
Character n-gram models
Why does it work?
– captures some word choice information
– picks up word endings (-ing, -tion, -ly, etc.)
– not hurt much by data sparseness issues
Key-list models
Incentive:
– ought to be able to beat character n-grams
– develop a new modeling method more focused on what differentiates between authors (characters and words are both useful for topic recognition, but that doesn’t mean they are best for author recognition)
Key-list models
Idea:
– convert the text stream into a stream of only authorship-relevant symbols (I called these lists of symbols key-lists)
– each symbol is a regular expression to allow for broad definitions (/*tion/ captures any “nounification”)
– text not accounted for by the key-list is represented by <short>, <med>, or <long> markers
– build n-gram models from these new streams
Key-list models
Sample key-list:

Regular expression                         Description
(\w)(,)(\s)                                comma
(\w)(\.)(\s)                               period
(\b)(of|for|to|around|after| … )(\b)       common prepositions
(\b)((was|were) \w*ed)(\b)                 passive voice
(\b)(is|was|will|are|were|am)(\b)          “is” conjugations
(\b)(\w*ing)(\b)                           ends in -ing
(\b)(\w*ly)(\b)                            adverb
(\b)(and|but|or|not|if|then|else)(\b)      logical
(\b)(as)(\b)                               as
(\b)(would|should|could)(\b)               modal verbs

Sample trigram: <comma> <short> <period>
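The conversion step can be sketched with Python's `re` module. The patterns below are simplified versions of the ones in the table above (inner groups made non-capturing, the preposition list truncated), and the <short>/<med>/<long> length thresholds are my own guesses; the slides do not specify them:

```python
import re

# Simplified (pattern, symbol) pairs based on the key-list table.
KEY_LIST = [
    (r',\s', '<comma>'),
    (r'\.(?:\s|$)', '<period>'),
    (r'\b(?:of|for|to|around|after)\b', '<prep>'),
    (r'\b(?:is|was|will|are|were|am)\b', '<is>'),
    (r'\b\w*ing\b', '<ing>'),
    (r'\b\w*ly\b', '<adverb>'),
    (r'\b(?:and|but|or|not|if|then|else)\b', '<logical>'),
    (r'\b(?:would|should|could)\b', '<modal>'),
]

def gap_marker(length):
    """Mark unmatched stretches of text by coarse length (thresholds assumed)."""
    if length == 0:
        return []
    if length <= 5:
        return ['<short>']
    if length <= 15:
        return ['<med>']
    return ['<long>']

def to_key_list(text):
    """Replace the text stream with a stream of authorship-relevant symbols."""
    combined = '|'.join('(?P<k%d>%s)' % (i, p) for i, (p, _) in enumerate(KEY_LIST))
    stream, pos = [], 0
    for m in re.finditer(combined, text):
        stream.extend(gap_marker(m.start() - pos))
        stream.append(KEY_LIST[int(m.lastgroup[1:])][1])
        pos = m.end()
    stream.extend(gap_marker(len(text) - pos))
    return stream
```

For example, `to_key_list('He was walking slowly, and then he stopped.')` yields a symbol stream like `['<short>', '<is>', '<short>', '<ing>', '<short>', '<adverb>', '<comma>', '<logical>', '<short>', '<logical>', '<med>', '<period>']`, which is then modeled with ordinary n-gram counts.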
Key-list models: Results
• task: picking 1 author from 10 possible authors
• training data size: 3 novels
Key-list models: Results
Some other interesting results:
– key-lists with just punctuation (as well as <short>, <med>, <long>) performed almost as well as the best key-lists
– all key-lists were outperformed by the best n-letter model when the test data size was under 10,000 characters, but all key-list models eventually surpassed the n-letter models
Key-list models
Things I didn’t do:
– vary the amount of training data
– spend a long time trying different key-lists
– combine key-list results with each other or with the character results
– a lot of other stuff
The thesis is available on the web: http://www.dgillick.com/resource/thesis.pdf
Outline
• Author identification
• Trying to beat Doddington’s “idiolect” modeling strategy (speaker recognition)
• My next project
G. Doddington’s LM strategy
• create LMs with a limited vocabulary of the most commonly occurring 2000 bigrams
• to smooth out zeroes, boost each bigram prob. by 0.001
• score by calculating:
logprob(test|target) – logprob(test|bkg)
• logprobs are joint probabilities: logprob(AB) = logprob(A) + logprob(B|A)
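In code, the scoring recipe on this slide might look like the sketch below. This is my own reconstruction of the general idea (the 0.001 boost and the target-minus-background log-likelihood ratio come from the slide; all other details are assumptions):

```python
import math
from collections import Counter

BOOST = 0.001  # additive boost from the slide, smoothing away zero probabilities

def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))

def bigram_lm(tokens, vocab):
    """Relative-frequency bigram model restricted to the allowed vocabulary."""
    counts = Counter(b for b in bigrams(tokens) if b in vocab)
    total = sum(counts.values()) or 1
    return {b: counts[b] / total + BOOST for b in vocab}

def score(test_tokens, target_lm, bkg_lm, vocab):
    """logprob(test | target) - logprob(test | bkg) over in-vocabulary bigrams."""
    s = 0.0
    for b in bigrams(test_tokens):
        if b in vocab:
            s += math.log(target_lm[b]) - math.log(bkg_lm[b])
    return s
```

A positive score favors the target-speaker hypothesis; sweeping a threshold over the scores of all trials gives the EER operating point.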
G. Doddington’s LM: Setup
Switchboard 1 data:
– collected in the early ’90s from all over the US
– 2,400 (~5 min.) conversations among 543 speakers
– corpus divided into 6 splits and tested by jack-knifing through the splits
– manual transcripts provided by Mississippi State

Task:
– 8 conversation sides used as training data to build models for each target speaker
– 1 conversation side used as test data
– background model built from 3 splits of held-out data
– jack-knifing allowed for almost 10,000 trials
G. Doddington’s LM: Results
Notes:
– these results are my own attempt to replicate the original experiments
– SRI reported EER = 8.65% for this same experiment
Adapted bigram models
Incentive:
– adapting target models from a much larger background model should yield better estimates of the probabilities in the language models

Specifically:
– use the same 2000-bigram vocabulary
– target probabilities are a mixture of training probabilities and background probabilities
– mixture weight is 2:1 target data : background data
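Under one reading of the 2:1 mixture (weight 2/3 on the target's own relative frequencies, 1/3 on the background model; this interpretation is mine, and the slide may intend a count-based weighting instead), adaptation is a one-line interpolation:

```python
def adapt(target_counts, bkg_probs, target_weight=2 / 3):
    """Interpolate target relative frequencies with background probabilities.

    target_counts: bigram -> count from the target speaker's training sides
    bkg_probs:     bigram -> probability under the background model
    """
    total = sum(target_counts.values()) or 1
    return {b: target_weight * (target_counts.get(b, 0) / total)
               + (1 - target_weight) * p
            for b, p in bkg_probs.items()}
```

Bigrams the target speaker never uttered still receive a third of their background probability, a gentler floor than a flat additive boost.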
Adapted bigram models: Results
Notes:
– nearly identical performance
– combination of the 2 systems yields almost no improvement
– why isn’t the adapted version better?
Can anything improve on 8.68?
Trigrams?
– use the same count threshold to make a list of the top 700 trigrams (“a lot of” and “I don’t know” were among the most common)

Character models?
– worked well for authorship…
– included all character combinations (no limited vocabulary)
– tried bigram and trigram models
Scores and combinations

Individual systems:
– GD bigrams: EER = 8.68%
– adapt. word bigrams: EER = 8.89%
– adapt. word trigrams: EER = 11.88%
– adapt. char. bigrams: EER = 13.73%
– adapt. char. trigrams: EER = 17.92%

Combinations:
– adapted words: EER = 8.46%
– adapted characters: EER = 13.24%
– adapted words + adapted characters: EER = 7.89%
Final Comparison
What about less training data?
1 conversation-side training:
– character models might provide more of an advantage with less data? not so:
  • GD EER = 22.5%
  • adapted character EER = 30%
  • adapted word EER = 20%
– maybe these character models pick up on the topic of that 1 conversation
– haven’t tried any other training data size
Outline
• Author identification
• Trying to beat GD’s result
• My next project
Key-lists for speaker recognition
• key-list n-grams picked up on phrasing (comma and period were valuable tokens)
  – automatic transcripts don’t have punctuation, but they do have pause and duration information
• use regular expressions and duration information to capture idiosyncratic speaker phrasing
• capture other speech information in key-lists? (energy, f0, etc.)
Acknowledgements
Thanks to:
Anand and Luciana at SRI for trying to help me replicate their results
Barbara for providing advice
Barry and Kofi for helping with computers and stuff
George