Connecting Acoustics to Linguistics in Chinese Intonation

Greg Kochanski (Oxford Phonetics)

Chilin Shih (University of Illinois)

Tan Lee (CUHK)

with Hongyan Jing (IBM)

Jiahong Yuan (Cornell)

Questions

• Can we usefully incorporate biomechanics into a phonetics model?

• Can we objectively assign an importance to a syllable?

• Can we write a unified description of F0 for both tone and accent languages?

Goal

Build a mathematical model that takes a sequence of discrete symbols as input and produces a quantitative prediction for F0.

The Challenge

Existing work

Rising?

Basic assumptions used in modeling

• People plan their utterances several syllables in advance.

• People produce speech optimized to communicate with minimal effort.

• A realistic model for the muscles that control f0

Realistic model of muscle control for F0

• We’d like a model of prosody that can apply beyond F0.

People talk nearly as fast as possible.

Speech could be optimal

• Most of what we say is made from bits and pieces we’ve said before.

• There are only 4 (Mandarin) or 6 (Cantonese) tones to combine.

• A speaker has the chance to practice and optimize all the common 3- and 4-tone sequences.

Optimize what?

• People want to minimize effort and/or talk faster.
  – Chairs, cars

• People want to minimize the chance that they will be misunderstood.
  – Risk = P(misinterpreted) * cost(misinterpreted)

Minimize: Effort + cost * Error
  – We allow each syllable to have a different weight, so Error is a sum over syllables or words.
  – Perhaps cost matches importance.
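As a minimal sketch of the quantity being minimized, the total can be written as effort plus a cost-weighted sum of per-syllable errors. The function names and all numbers below are illustrative assumptions, not values from the talk.

```python
def risk(p_misinterpreted, cost_misinterpreted):
    """Expected loss from one possible misunderstanding (as on the slide)."""
    return p_misinterpreted * cost_misinterpreted


def objective(effort, costs, errors):
    """Effort + sum over syllables of cost * error.

    `costs` and `errors` hold one value per syllable (or word). Every
    number passed to this function below is made up for illustration.
    """
    if len(costs) != len(errors):
        raise ValueError("need one cost per error term")
    return effort + sum(c * e for c, e in zip(costs, errors))


# Two hypothetical ways of producing the same three syllables.
careful = {"effort": 5.0, "errors": [0.1, 0.1, 0.1]}
relaxed = {"effort": 1.0, "errors": [1.0, 1.0, 1.0]}

# First all syllables are cheap to get wrong, then the middle one is costly.
for costs in ([0.5, 0.5, 0.5], [0.5, 20.0, 0.5]):
    totals = {
        name: objective(v["effort"], costs, v["errors"])
        for name, v in (("careful", careful), ("relaxed", relaxed))
    }
    print(costs, "->", min(totals, key=totals.get))
```

With these made-up numbers, raising the cost on the middle syllable flips the cheaper choice from the relaxed production to the careful one, which is the sense in which cost could track importance.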

Effort and Error

[Equation: effort G as an integral over time (dt) of squared terms in the pitch p; exact terms not recoverable]

How does Effort depend on the form of the pitch curve?

Error = mean-squared deviation between the F0 and the templates.
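The two terms can be sketched for a sampled pitch curve as follows. The Error function follows the definition on the slide; the Effort function is only an assumed stand-in (squared first and second differences, penalizing fast and abrupt pitch movement), and the sample curves are invented.

```python
def effort(pitch):
    """Assumed effort: squared first and second differences of the curve.

    This form is an illustrative stand-in; the exact terms and weights
    used in the talk are not reproduced here. Differences are taken per
    frame, not per second.
    """
    first = [b - a for a, b in zip(pitch, pitch[1:])]
    second = [b - a for a, b in zip(first, first[1:])]
    return sum(d * d for d in first) + sum(d * d for d in second)


def error(pitch, template):
    """Mean-squared deviation between the pitch curve and its template."""
    if len(pitch) != len(template):
        raise ValueError("pitch and template must have the same length")
    return sum((p - y) ** 2 for p, y in zip(pitch, template)) / len(pitch)


flat = [150.0] * 6
jumpy = [150.0, 180.0, 150.0, 180.0, 150.0, 180.0]
print(effort(flat), effort(jumpy))  # a flat curve costs no effort
print(error(flat, jumpy))           # but it misses a jumpy template
```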

Model behavior

• For cost >> 1, Error dominates, and pitch matches the target.

• For cost << 1, Effort dominates, both speaker and listener accept large deviations, and pitch smoothly interpolates.

• For cost ~ 1, the pitch is a compromise between the two.

Cost plays the role of a prosodic strength.
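The three regimes can be reproduced with a small numerical sketch. It assumes an effort term made of squared first and second differences (a stand-in, not the form from the talk), so that Effort + cost * Error is quadratic and its minimum solves a linear system. The targets and strengths are invented.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def optimal_pitch(targets, strengths):
    """Minimize sum(diff1^2) + sum(diff2^2) + sum(s_t * (p_t - y_t)^2).

    Setting the gradient to zero gives (D1'D1 + D2'D2 + S) p = S y.
    """
    n = len(targets)
    a = [[0.0] * n for _ in range(n)]
    # Each stencil is one difference operator row: (index, coefficient).
    stencils = [[(t, -1.0), (t + 1, 1.0)] for t in range(n - 1)]
    stencils += [[(t - 1, 1.0), (t, -2.0), (t + 1, 1.0)]
                 for t in range(1, n - 1)]
    for stencil in stencils:
        for i, ci in stencil:
            for j, cj in stencil:
                a[i][j] += ci * cj
    for t in range(n):
        a[t][t] += strengths[t]
    b = [s * y for s, y in zip(strengths, targets)]
    return solve_linear(a, b)


def rms(xs, ys):
    return (sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


# Hypothetical two-syllable target: a high plateau, then a low one.
targets = [200.0] * 10 + [120.0] * 10
strong = optimal_pitch(targets, [100.0] * 20)  # cost >> 1
weak = optimal_pitch(targets, [0.01] * 20)     # cost << 1
print("RMS deviation, strong:", rms(strong, targets))
print("RMS deviation, weak:  ", rms(weak, targets))
```

With a large strength the curve hugs the step in the targets; with a small one it glides between the two levels and the deviation from the targets grows.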

Another Challenge

[Figure: tone shapes for tones 1 to 4; x-axis: time (10 ms intervals), y-axis: F0 (Hz)]

The rest of the model.

• A model is a sequence of targets (used to compute the Error terms).

• Each target has a strength (i.e. the cost of misinterpretation).

• One target per tone.

• Targets are stretched to fit syllable duration.

• Only one phonological rule: 3 3 → 2 3
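The target-building steps above can be sketched as follows. The tone shapes are hypothetical stylizations on a 0 to 1 scale, chosen only to look like a level, a rising, a low and a falling tone. The sandhi function is a simplified reading of the rule, in which any tone 3 directly before another tone 3 becomes tone 2; how longer runs of tone 3 should be handled is not specified in the talk.

```python
# Hypothetical relative-pitch templates (0 = low, 1 = high), one per tone.
TONE_SHAPES = {
    1: [1.0, 1.0],       # high level
    2: [0.3, 0.9],       # rising
    3: [0.3, 0.0, 0.4],  # low
    4: [1.0, 0.0],       # falling
}


def apply_third_tone_sandhi(tones):
    """Rewrite 3 3 as 2 3 (simplified: every 3 before a 3 becomes 2)."""
    out = list(tones)
    for i in range(len(tones) - 1):
        if tones[i] == 3 and tones[i + 1] == 3:
            out[i] = 2
    return out


def stretch(template, n_frames):
    """Linearly resample a template (2 or more points) to n_frames."""
    if n_frames == 1:
        return [template[0]]
    last = len(template) - 1
    out = []
    for k in range(n_frames):
        pos = k * last / (n_frames - 1)
        i = min(int(pos), last - 1)
        frac = pos - i
        out.append(template[i] * (1.0 - frac) + template[i + 1] * frac)
    return out


def build_targets(tones, durations):
    """One stretched target per tone, concatenated into one sequence."""
    targets = []
    for tone, n_frames in zip(apply_third_tone_sandhi(tones), durations):
        targets.extend(stretch(TONE_SHAPES[tone], n_frames))
    return targets


print(apply_third_tone_sandhi([3, 3]))
print(build_targets([3, 3], [4, 6]))
```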

Model fits for Mandarin Chinese

[Figure labels: tone class (input), strength (result)]

Inside a word, strength is distributed by the metrical pattern.

What’s the procedure?

Compute the pitch curve as a function of phonological inputs and prosodic strength.

[Diagram: sequence of tones (phonology) and prosodic strengths → predicted F0; predicted F0 is compared with data by a nonlinear least-squares fitting algorithm]
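The fitting loop can be illustrated with a deliberately tiny example: one strength parameter, adjusted until the predicted curve matches the data in the least-squares sense. Both the one-frame compromise model and the golden-section search used here are assumptions made for this sketch; the talk names a nonlinear least-squares algorithm but does not specify it, and the data are synthetic.

```python
import math


def predict(strength, target, context):
    """Toy one-frame compromise between a tone target and its context.

    Each value minimizes (p - c)**2 + strength * (p - y)**2. This is a
    stand-in for the full model, not the model from the talk.
    """
    return [(c + strength * y) / (1.0 + strength)
            for y, c in zip(target, context)]


def sum_squared_residuals(strength, target, context, observed):
    predicted = predict(strength, target, context)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def fit_strength(target, context, observed, lo=-6.0, hi=6.0, iters=80):
    """Golden-section search over log(strength) for the best fit."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0

    def loss(u):
        return sum_squared_residuals(math.exp(u), target, context, observed)

    a, b = lo, hi
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if loss(c) < loss(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return math.exp((a + b) / 2.0)


# Synthetic "data" generated with a known strength, then recovered.
target = [200.0, 190.0, 180.0, 170.0]
context = [150.0, 150.0, 150.0, 150.0]
observed = predict(3.0, target, context)
print(fit_strength(target, context, observed))
```

Searching over the logarithm of the strength keeps the parameter positive and treats weak and strong syllables symmetrically; a real fit would adjust many strengths at once with a general-purpose solver.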

Model fits to Mandarin Chinese

0.61 free parameters per syllable, 13 Hz RMS error.

Strengths are stable under small changes in the model.

The two models have words defined by different labelers.

This model allows extra freedom: different tones are allowed to define their targets differently.

This model allows less freedom: all tones have the same type of target.

Model parameters

Mandarin

Cantonese

Phrasing is marked in speech.

Cantonese data courtesy of Prof. Tan Lee

Metrical patterns inside words

Mandarin

“Normal” segmentation of characters into words.

Random segmentation of characters into words.

Lexical acquisition

Other nice properties

• Strengths are correlated with duration (duration is a proxy for prominence):
  – r = 0.40 (sentence final)
  – r = 0.27 (non-final)
  – >95% confidence

• Strength is correlated with mutual information of neighboring syllables:
  – r = -0.175
  – >95% confidence
  – Sloppy when generating unsurprising syllables, and precise for surprising syllables.
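The r values above are correlation coefficients. As a minimal sketch, a Pearson correlation between fitted strengths and syllable durations could be computed as below; the talk does not say which estimator was used, and the sample numbers are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-syllable strengths and durations (ms).
strengths = [0.5, 1.2, 0.8, 2.0, 1.5]
durations_ms = [150.0, 210.0, 160.0, 260.0, 180.0]
print(pearson_r(strengths, durations_ms))
```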

Local Conclusion

• Intonation can be represented as:
  – a small set of discrete symbols, in sequence, with
  – a per-person or per-style shape for each symbol;
  – modulated by a variable prosodic strength.

• One symbol per syllable seems enough.

• The strength parameter seems real:
  – Similar across languages
  – Matches language structure

Q: But does it work for English?

A: Yes, under circumstances where the intonational phonology is simple enough to be obvious.

Reminder: Limitations of f0 and complexity of prosody.

To show the range of information that can be carried by prosody, observe an elegant experiment by Stan Freberg (1950):

The text has virtually no lexical information, but it still tells a story. Even so, it is very hard to label individual words.

English

• Sentences in the form “123-456-7890?”

• Speaker is trying to confirm a single digit.

• Models have just 1.1 parameters per sentence.

The model for English

• There are identical boundary tones on every utterance.

• All target shapes are identical, except the focus.

%X B B B | B A B | B B B B Y%

%X B B B | A B B | B B B B Y%

%X B A B | B B B | B B B B Y%

• Rather simple phonology.

• Accent prominence depends on position in phrase and in utterance.
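The symbol strings above follow a regular pattern that can be generated mechanically. The sketch below assumes one reading of the notation: "A" marks the focused digit, "B" every other digit, "|" a boundary between digit groups, and "%X" and "Y%" the utterance boundary tones. The function name and the 0-based indexing are choices made for this example.

```python
def focus_sequence(focus_digit, groups=(3, 3, 4)):
    """Symbol string for a phone-number question with one focused digit.

    `focus_digit` is a 0-based index into the digits; `groups` gives the
    number of digits in each group (3-3-4 for "123-456-7890").
    """
    total = sum(groups)
    if not 0 <= focus_digit < total:
        raise ValueError("focus_digit must index one of the digits")
    symbols = ["A" if i == focus_digit else "B" for i in range(total)]
    chunks, start = [], 0
    for size in groups:
        chunks.append(" ".join(symbols[start:start + size]))
        start += size
    return "%X " + " | ".join(chunks) + " Y%"


for digit in (4, 3, 1):
    print(focus_sequence(digit))
```

The three calls reproduce the three example strings on the slide, with the focus on the fifth, fourth and second digit respectively.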

Model fits well over a range of speeds.

Suppressed phrasing

Low speed

High speed

Merger of accent with boundary tone

Model reproduces nontrivial features of the data and fits well over a range of speeds.

Suppressed phrasing

Low speed

High speed

Merger of accent with boundary tone

Conclusion

• Physiologically based models can capture important aspects of speech.

• A very compact representation of behavior.

• It can be applied broadly:
  – Two dialects of Chinese
  – Some aspects of English

• It raises questions about where the phonetics/phonology boundary actually sits.

• Introduces an objective acoustic measure of prosodic prominence.

• Suggests that the speaker may help the listener segment the speech stream.