Prosody in Generation
JH 04/19/23 1
Natural Language Generation (NLG)
• Typical NLG system does:
  – Text planning: transforms a communicative goal into a sequence or structure of elementary goals
  – Sentence planning: chooses linguistic resources to achieve those goals
  – Realization: produces surface output
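The three stages above can be sketched as a toy pipeline. This is a minimal illustration, not any particular system: the goal representation, dialogue acts, and templates are all invented.

```python
# Toy three-stage NLG pipeline: text planning -> sentence planning -> realization.
# All structures here (goal dict, act names, templates) are invented for illustration.

def text_planner(goal):
    """Text planning: transform a communicative goal into elementary goals."""
    return [("inform", goal["destination"]), ("confirm", goal["destination"])]

def sentence_planner(elementary_goals):
    """Sentence planning: choose linguistic resources (here, simple templates)."""
    templates = {"inform": "You want to go to {}", "confirm": "is that right?"}
    clauses = []
    for act, arg in elementary_goals:
        t = templates[act]
        clauses.append(t.format(arg) if "{}" in t else t)
    return clauses

def realizer(clauses):
    """Realization: produce the surface output string."""
    return ", ".join(clauses)

print(realizer(sentence_planner(text_planner({"destination": "Boston"}))))
# You want to go to Boston, is that right?
```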
Research Directions in NLG
• Past focus
  – Hand-crafted rules inspired by small corpora
  – Very little evaluation
  – Monologue text generation
• New directions
  – Large-scale corpus-based learning of system components
  – Evaluation important, but how to do it still unclear
  – Spoken monologue and dialogue
AT&T Labs Research
How to produce speech instead of text?
Overview
• Spoken NLG in Dialogue Systems
• Text-to-Speech (TTS) vs. Concept-to-Speech (CTS)
• Current Approaches to CTS
  – Hand-built systems
  – Corpus-based systems
• NLG Evaluation
• Open Questions
Importance of NLG in Dialogue Systems
• Conveying information intonationally for conciseness and naturalness
  – System turns in dialogue systems can be shorter:
      S: Did you say you want to go to Boston?
      S: (You want to go to) Boston H-H%
• Not providing mis-information through misleading prosody:
      S: (You want to go to) Boston L-L%
• Silverman et al. '93: mimicking human prosody improves transcription accuracy in a reverse telephone directory task
• Sanderman & Collier '97: subjects were quicker to respond to 'appropriately phrased' ambiguous responses to questions in a monitoring task
      Q: How did I reserve a room? vs. Which facility did the hotel have?
      A: I reserved a room L-H% in the hotel with the fax.
      A: I reserved a room in the hotel L-H% with the fax.
Overview
• Spoken NLG in Dialogue Systems
• Text-to-Speech (TTS) vs. Concept-to-Speech (CTS)
• Current Approaches to CTS
  – Hand-built systems
  – Corpus-based systems
• NLG Evaluation
• Open Questions
Prosodic Generation for TTS
• Default prosodic assignment from simple text analysis
• Hand-built rule-based systems: hard to modify and adapt to new domains
• Corpus-based approaches (Sproat et al. '92)
  – Train prosodic variation on large labeled corpora using machine learning techniques
  – Accent and phrasing decisions
  – Associate prosodic labels with simple features of transcripts
• # of words in phrase
• distance from beginning or end of phrase
• orthography: punctuation, paragraphing
• part of speech, constituent information
• Apply learned rules to new text
• Incremental improvements continue:
  – Adding higher-accuracy parsing (Koehn et al. '00): the Collins '99 parser
  – More sophisticated learning algorithms (Schapire & Singer '00)
  – Better representations: tree-based?
• Rules always impoverished
• How to define a Gold Standard?
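The corpus-based recipe can be sketched in a few lines: extract the simple text features listed above for each word, then apply a prediction rule. Here the "learned" rule is replaced by a hand-written stand-in (accent open-class words), since the actual rules come from training on a labeled corpus.

```python
# Sketch of feature extraction for corpus-based accent prediction.
# The features follow the slide's list; predict_accent is a toy hand-written
# stand-in for rules learned from a ToBI-labeled corpus.

def extract_features(words, pos_tags):
    """Per-word features of the kind used for accent/phrasing prediction."""
    n = len(words)
    return [{"word": w,
             "pos": p,
             "phrase_len": n,              # number of words in the phrase
             "dist_from_start": i,         # distance from beginning of phrase
             "dist_from_end": n - 1 - i}   # distance from end of phrase
            for i, (w, p) in enumerate(zip(words, pos_tags))]

def predict_accent(feat):
    """Toy rule: accent open-class (content) words by Penn POS prefix."""
    return feat["pos"][:2] in {"NN", "VB", "JJ", "RB"}

words = ["the", "cat", "eats", "fish"]
tags = ["DT", "NN", "VBZ", "NN"]
print([predict_accent(f) for f in extract_features(words, tags)])
# [False, True, True, True]
```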
Spoken NLG
• Decisions in Text-to-Speech (TTS) depend on syntax, information status, topic structure, etc.: information explicitly available to NLG
• Concept-to-Speech (CTS) systems should be able to specify "better" prosody: the system knows what it wants to say and can specify how
• But … generating prosody for CTS isn't so easy
Overview
• Spoken NLG in Dialogue Systems
• Text-to-Speech (TTS) vs. Concept-to-Speech (CTS)
• Current Approaches to CTS
  – Hand-built systems
  – Corpus-based systems
• NLG Evaluation
• Open Questions
Relying upon Prior Research
• MIMIC CTS (Nakatani & Chu-Carroll '00)
  – Uses the domain attribute/value distinction to drive phrasing and accent: critical information focused
      Movie: October Sky
      Theatre: Hoboken Theatre
      Town: Hoboken
• Attribute names and values always accented
• Values set off by phrase boundaries
• Information status conveyed by varying accent type (Pierrehumbert & Hirschberg '90)
  – Old (given): L*
  – Inferrable (by MIMIC, e.g. theatre name from town): L*+H
  – Key (to formulating a valid query): L+H*
  – New: H*
• Marking dialogue acts
  – NotifyFailure:
      U: Where is "The Corrupter" playing in Cranford?
      S: "The Corrupter" [L+H*] is not [L+H*] playing in Cranford [L*+H].
  – Other rules for logical connectives, clarification and confirmation subdialogues
• Contrastive accent for semantic parallelism (Rooth '92, Pulman '97) used in GoalGetter and OVIS (Theune '99)
      The cat eats fish. The dog eats meat.
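The accent-type rules above amount to a lookup from information status to ToBI accent type. A minimal sketch, with the mapping taken from the slide but the function and data shapes invented (this is not MIMIC's actual interface):

```python
# Information status -> ToBI pitch accent, per the mapping on the slide
# (Pierrehumbert & Hirschberg '90). The annotate() helper is illustrative.

ACCENT_BY_STATUS = {
    "given": "L*",         # old information
    "inferrable": "L*+H",  # e.g. theatre name inferrable from town
    "key": "L+H*",         # key to formulating a valid query
    "new": "H*",
}

def annotate(tagged_words):
    """tagged_words: list of (word, information_status) pairs."""
    return [f"{word} [{ACCENT_BY_STATUS[status]}]"
            for word, status in tagged_words]

print(annotate([("Hoboken", "given"), ("Hoboken Theatre", "inferrable")]))
# ['Hoboken [L*]', 'Hoboken Theatre [L*+H]']
```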
But … many counterexamples
• Association of prosody with many syntactic, semantic, and pragmatic concepts is still an open question
• Prosody generation rests on (past) observed regularities and assumptions:
  – Information can be 'chunked' usefully by phrasing for easier user understanding (but in many different ways)
  – Information status can be conveyed by accent: contrastive information is accented?
      S: You want to go to L+H* Nijmegen, L+H* not Eindhoven.
  – Given information is deaccented? Speaker/hearer givenness
      U: I want to go to Nijmegen.
      S: You want to go to H* Nijmegen?
  – Intonational contours can convey speech acts and speaker beliefs: continuation rise can maintain the floor?
      S: I am going to get you the train information [L-H%].
  – Backchanneling can be produced appropriately?
      S: Okay. Okay? Okaaay… Mhmm…
  – Wh- and yes-no questions can be signaled appropriately?
      S: Where do you want to go.
      S: What is your passport number?
  – Discourse/topic structure can be signaled by varying pitch range, pausal duration, rate?
Overview
• Spoken NLG in Dialogue Systems
• Text-to-Speech (TTS) vs. Concept-to-Speech (CTS)
• Current Approaches to CTS
  – Hand-built systems
  – Corpus-based systems
• NLG Evaluation
• Open Questions
MAGIC
• Multimedia (MM) system for presenting cardiac patient data
  – Developed at Columbia by McKeown and colleagues, in conjunction with Columbia Presbyterian Medical Center, to automate post-operative status reporting for bypass patients
  – Uses mostly traditional hand-developed NLG components
  – Generates text, then annotates it prosodically
  – Corpus-trained prosodic assignment component
• Corpus: written and oral patient reports
  – 50 min multi-speaker spontaneous + 11 min single-speaker read speech
  – 1.24M-word text corpus of discharge summaries
  – Transcribed, ToBI-labeled
• Generator features labeled/extracted:
  – syntactic function
  – part of speech
  – semantic category
  – semantic 'informativeness' (rarity in corpus)
  – semantic constituent boundary location and length
  – salience
  – given/new
  – focus
  – theme/rheme
  – 'importance'
  – 'unexpectedness'
• Very hard to label features
• Results: new features to specify TTS prosody
  – Of the CTS-specific features, only semantic informativeness (likelihood of occurring in a corpus) useful so far (Pan & McKeown '99)
  – Looking at context and word collocation for accent placement helps predict accent (Pan & Hirschberg '00):
      RED CELL (less predictable) vs. BLOOD cell (more predictable)
  – Most predictable words are accented less frequently (40-46%) and least predictable words more (73-80%)
  – Unigram+bigram model predicts accent status with 77% (+/- .51) accuracy
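The predictability idea can be illustrated with a toy bigram model: a word that is hard to predict from its left context gets accented. The tiny corpus and the 0.6 threshold below are invented for illustration; the actual model interpolated unigram and bigram estimates trained on the medical corpus.

```python
# Toy predictability-based accent prediction: accent words with low
# conditional probability given the previous word. Corpus and threshold
# are invented; only the idea follows Pan & Hirschberg '00.
from collections import Counter

corpus = "blood cell blood cell blood cell red blood red cell".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def predictability(prev, word):
    """Bigram estimate of P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

def accented(prev, word, threshold=0.6):
    """Unpredictable words are accented."""
    return predictability(prev, word) < threshold

print(accented("blood", "cell"))  # False: predictable, so deaccented
print(accented("red", "cell"))    # True: less predictable, so accented
```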
Stochastic, Corpus-based NLG
• Generate from a corpus rather than a hand-built system
  – For an MT task, Langkilde & Knight '98 over-generate from a traditional hand-built grammar
  – Output composed into a lattice
  – Linear (bigram) language model chooses the best path
• But … no guarantee of grammaticality
  – How to evaluate/improve results?
  – How to incorporate prosody into this kind of generation model?
FERGUS (Bangalore & Rambow '00)
• Corpus-based learning to refine syntactic, lexical and prosodic choice
• Domain is DARPA Communicator task (air travel information)
• Uses stochastic tree model + linear LM + XTAG (hand-crafted) grammar
• Trained on WSJ dependency trees tagged with p.o.s., morphological information, syntactic SuperTags (grammatical function, subcat frame, arg realization), WordNet sense tags and prosodic labels (accent and boundary)
• Input: dependency tree of lexemes
  – Any feature can be specified, e.g. syntactic, prosodic

  control
  ├── poachers <L+H*>
  ├── now
  └── trade
      ├── the
      └── underground
• Tree Chooser: selects syntactic/prosodic properties for input nodes based on matches with features of mothers and daughters in the corpus

  control
  ├── poachers <L+H*>
  ├── now
  └── trade
      ├── the
      └── underground
• Unraveler: produces a lattice of all syntactically possible linearizations of the tree, using the XTAG grammar
  [Figure: word lattice over "poachers", "now", "control", "the", "underground", "trade"]
• Linear Precedence Chooser: finds the most likely lattice traversal, using a trigram language model
      Now [H*] poachers [L+H*] [L-] control the underground trade [H*] [L-L%].
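The chooser step reduces to scoring each linearization the lattice licenses with an n-gram language model and keeping the most probable one. A minimal sketch, with invented log-probabilities and a bigram model in place of FERGUS's trigram for brevity:

```python
# Sketch of the Linear Precedence Chooser: score candidate linearizations
# with an n-gram LM and keep the best. All probabilities are invented.

LOGP = {  # toy bigram log-probabilities
    ("<s>", "now"): -0.5, ("now", "poachers"): -1.0,
    ("poachers", "control"): -0.7, ("control", "the"): -0.3,
    ("the", "underground"): -0.9, ("underground", "trade"): -0.6,
    ("trade", "</s>"): -0.4, ("<s>", "poachers"): -1.2,
    ("poachers", "now"): -1.5, ("now", "control"): -1.1,
}

def lm_score(words, floor=-5.0):
    """Sum bigram log-probs over the sentence, flooring unseen bigrams."""
    toks = ["<s>"] + words + ["</s>"]
    return sum(LOGP.get(bg, floor) for bg in zip(toks, toks[1:]))

# two linearizations the lattice might license
candidates = [
    "now poachers control the underground trade".split(),
    "poachers now control the underground trade".split(),
]
best = max(candidates, key=lm_score)
print(" ".join(best))  # now poachers control the underground trade
```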
• Many ways to implement each step
  – How to choose which works 'best'?
  – How to evaluate output?
Overview
• Spoken NLG in Dialogue Systems
• Text-to-Speech (TTS) vs. Concept-to-Speech (CTS)
• Current Approaches to CTS
  – Hand-built systems
  – Corpus-based systems
• NLG Evaluation
• Open Questions
Evaluating NLG
• How to judge success/progress in NLG: an open question
  – Qualitative measures: preference
  – Quantitative measures:
      task performance measures: speed, accuracy
      automatic comparison to a reference corpus (e.g. string edit distance and variants, tree-similarity-based metrics)
  – Not always a single "best" solution
• Critical for stochastic systems to combine qualitative judgments with quantitative measures (Walker et al. '97)
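The simplest of the automatic measures mentioned above is word-level string edit distance against a reference. A standard Levenshtein implementation (the example sentences are invented):

```python
# Word-level string edit (Levenshtein) distance between generated output
# and a reference sentence: minimum insertions, deletions, substitutions.

def edit_distance(hyp, ref):
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

hyp = "poachers now control the trade".split()
ref = "now poachers control the underground trade".split()
print(edit_distance(hyp, ref))  # 3
```

One known weakness, noted on the next slide: string metrics like this correlate poorly with human judgments compared to tree-based metrics.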
Qualitative Validation of Quantitative Metrics
• Subjects judged the understandability and quality of candidates proposed by 4 evaluation metrics to minimize distance from a Gold Standard (Bangalore, Rambow & Whittaker '00)
• Tree-based metrics correlate significantly with understandability and quality judgments; string metrics do not
• New objective metrics learned:
  – Understandability accuracy = (1.31 * simple tree accuracy - .10 * substitutions - .44) / .87
  – Quality accuracy = (1.02 * simple tree accuracy - .08 * substitutions - .35) / .67
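The two learned metrics can be written out directly. The understandability constant is taken as -.44 (the transcript is garbled at that point); that reading makes both metrics evaluate to 1.0 at perfect simple tree accuracy with no substitutions, which is a reasonable sanity check.

```python
# The learned objective metrics, transcribed from the slide. The -.44
# constant is an assumed reading of a garbled character in the source.

def understandability_accuracy(simple_tree_accuracy, substitutions):
    return (1.31 * simple_tree_accuracy - 0.10 * substitutions - 0.44) / 0.87

def quality_accuracy(simple_tree_accuracy, substitutions):
    return (1.02 * simple_tree_accuracy - 0.08 * substitutions - 0.35) / 0.67

# sanity check: a perfect tree with no substitutions scores 1.0 on both
print(round(understandability_accuracy(1.0, 0.0), 6))  # 1.0
print(round(quality_accuracy(1.0, 0.0), 6))            # 1.0
```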
Overview
• Spoken NLG in Dialogue Systems
• Text-to-Speech (TTS) vs. Concept-to-Speech (CTS)
• Current Approaches to CTS
  – Hand-built systems
  – Corpus-based systems
• NLG Evaluation
• Open Questions
More Open Questions for Spoken NLG
• How much to model the human original?
• Planning for appropriate intonational variation is important even in recorded prompts
• Timing and backchanneling
• What kind of output is most comprehensible?
• What kind of output elicits the most easily understood user response? (Gustafson et al. '97, Clark & Brennan '99)
• Implementing variations in dialogue strategy
  – Implicit confirmation
  – Mixed initiative