Intro to VXML, Jim Larson
(c) 2007 Larson Technical Services 1
VoiceXML Overview James A. Larson Intel Corporation
Outline
• Motivation for VoiceXML
• W3C Speech Interface Framework Languages
• Dialog: VoiceXML 2.0
• Speech Synthesis: SSML
• Grammars: SRGS
• Semantic Interpretation: SI
• VoiceXML 2.1
VoiceXML in the Marketplace
• VoiceXML 2.0 is now ratified as a Recommendation (i.e., an official standard) by the W3C
• Hundreds of millions of VoiceXML calls are answered every day
VoiceXML is the standard for building speech-enabled applications
Motivation for Speech Applications
• Users access Web sites from any telephone, anywhere, any time.
• Speaking and listening are the natural usage modes for phones.
Strength of VoiceXML Applications
• Traditional system-directed dialogs for novice users
• Mixed initiative dialogs for experienced users
• Novice users smoothly become experienced users at their own pace
Limitations of VoiceXML Applications
• No special analysis of speech input
  – Not suitable for training speech skills (reading, ESL, singing, etc.)
• VUI conversational bandwidth is lower than GUI conversational bandwidth
  – Using a VUI is like drinking from Lake Superior with a straw
Exercise 1
• Name or describe a speech application you could use at work.
• Name or describe a speech application you or a family member could use at home.
XML
• XML = eXtensible Markup Language
• Elements are surrounded by tags
  <prompt> Welcome to the voice system </prompt>
• Elements may be nested
  <prompt> Welcome to Ajax Travel <break/> we have the cheapest fares </prompt>
• Elements may have attributes
  <choice next="#boat">
  <grammar type="application/srgs+xml" version="1.0" root="by_boat" src="boat.grxml"/>
• Because "<", ">", and "&" have special meanings, use
  "&lt;" in place of "<"
  "&gt;" in place of ">"
  "&amp;" in place of "&"
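A minimal sketch of the escaping rule inside a complete VoiceXML document. The company name and the trivial condition are invented for illustration and do not come from the slides.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <!-- "&amp;" stands for a literal "&" in the text to be spoken -->
      <prompt> Welcome to Smith &amp; Jones Travel </prompt>
      <!-- Inside an attribute value, "&lt;" stands for the "<" operator -->
      <if cond="1 &lt; 2">
        <prompt> One is less than two </prompt>
      </if>
    </block>
  </form>
</vxml>
```

Without the escapes, an XML parser would reject the document before the voice browser ever interprets it.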
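[Figure: system architecture. A Web server holds HTML scripts, VoiceXML scripts, grammars, audio files, and multimedia files, and is backed by a database server (DB). It serves documents to a Web browser and to a voice browser running on a speech server/gateway, which captures voice (ASR), accepts DTMF, replays audio, and performs TTS.]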
W3C Speech Interface Framework
[Figure: framework components: VoiceXML 2.0, Speech Synthesis, Grammar, Semantic Interpretation, Call Control, Other]
Status of W3C Speech Interface Languages
[Figure: chart showing how far each language has advanced along the W3C track (Requirements, Working Draft, Last Call Working Draft, Candidate Recommendation, Proposed Recommendation, Recommendation). Languages shown: VoiceXML 2.0, VoiceXML 2.1, V3, Grammar (SRGS), Synthesis (SSML), Call Control (CCXML), Semantic Interpretation (SISR).]
Example of VoiceXML 2.0 Fragment
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    …
    <field name="account">
      <prompt>
        Which account <break/>
        <emphasis> savings </emphasis> or <emphasis> checking </emphasis>
      </prompt>
      <grammar type="application/srgs+xml" root="account_type" mode="voice">
        <rule id="account_type">
          <one-of>
            <item> savings </item>
            <item> checking </item>
            <item> CD </item>
            <item> certificate of deposit <tag> out = "CD"; </tag> </item>
          </one-of>
        </rule>
      </grammar>
    </field>
    ….
  </form>
  …
</vxml>
Languages used in this fragment:
• Dialog Language (VoiceXML 2.0): <vxml>, <form>, <field>, <prompt>
• Speech Synthesis Markup Language (SSML): <break/>, <emphasis>
• Speech Recognition Grammar Specification (SRGS): <grammar>, <rule>, <one-of>, <item>
• Semantic Interpretation (SI): contents of <tag>
VoiceXML 2.0 features
• Menus, forms, sub-dialogs
  – <menu>, <form>, <subdialog>
• Inputs
  – Speech recognition <grammar>
  – Recording <record>
  – Keypad <grammar mode="dtmf">
• Output
  – Audio files <audio>
  – Text-to-speech <prompt>
• Variables
  – <var>, <script>, <assign>
• Events
  – <nomatch>, <noinput>, <help>, <catch>, <throw>
• Transition and submission
  – <goto>, <submit>
• Telephony
  – Connection control: <transfer>, <disconnect>
  – Telephony information
• Platform
  – Objects
• Performance
  – Fetch
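Menus are listed above but not shown elsewhere in the deck, so here is a minimal sketch of one. The "#boat" target echoes the earlier attribute example; the "#plane" target, the form ids, and all prompt wording are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu id="travel_mode">
    <!-- <enumerate/> reads the choices back to the caller -->
    <prompt> How do you want to travel? <enumerate/> </prompt>
    <choice next="#boat"> by boat </choice>
    <choice next="#plane"> by plane </choice>
    <noinput> Please say one of: <enumerate/> </noinput>
  </menu>

  <form id="boat">
    <block>
      <prompt> You chose to travel by boat. </prompt>
      <disconnect/>
    </block>
  </form>

  <form id="plane">
    <block>
      <prompt> You chose to travel by plane. </prompt>
      <disconnect/>
    </block>
  </form>
</vxml>
```

Each choice supplies both the phrase to recognize and the dialog to go to, so a simple menu like this needs no separate grammar.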
Typical Form Fill-In
<form>
  <block>
    <prompt> Welcome to the electronic payment system. </prompt>
  </block>
  <field name="card_number">
    <prompt> Please enter your credit card number. </prompt>
    <grammar src="http://www.ajax.com/credit_card_number.grxml"/>
  </field>
  <field name="date">
    <prompt> Please enter your expiration date. </prompt>
    <grammar src="http://www.ajax.com/credit_card_date.grxml"/>
  </field>
</form>
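The form above collects two values but does not show what happens to them. One possible continuation, sketched under assumptions: the form id and the submit URL "http://www.ajax.com/pay" are hypothetical and do not appear in the slides.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="payment">
    <block>
      <prompt> Welcome to the electronic payment system. </prompt>
    </block>
    <field name="card_number">
      <prompt> Please enter your credit card number. </prompt>
      <grammar src="http://www.ajax.com/credit_card_number.grxml"/>
    </field>
    <field name="date">
      <prompt> Please enter your expiration date. </prompt>
      <grammar src="http://www.ajax.com/credit_card_date.grxml"/>
    </field>
    <!-- Runs once both fields hold a value -->
    <filled mode="all" namelist="card_number date">
      <!-- Hypothetical server endpoint; sends both field values -->
      <submit next="http://www.ajax.com/pay" method="post"
              namelist="card_number date"/>
    </filled>
  </form>
</vxml>
```

The interpreter visits the fields in document order, so the caller hears the welcome, then each prompt in turn, before the values are submitted.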
Exercise 2: Capture "birth date"
<form>
  <block>
    <prompt> _____________________ </prompt>
  </block>
  <field name="month">
    <prompt> _______________________________ </prompt>
    <grammar src="http://www.ajax.com/month.grxml"/>
  </field>
  <field name="day">
    <prompt> ______________________________ </prompt>
    <grammar src="http://www.ajax.com/day.grxml"/>
  </field>
  <field name="year">
    <prompt> ______________________________ </prompt>
    <grammar src="http://www.ajax.com/year.grxml"/>
  </field>
</form>
Event Handlers
• Deal with exceptional or error conditions
• Control mechanism for dialog turn retries
  – <catch event="noinput"> … </catch>
  – <catch event="nomatch"> … </catch>
  – <catch event="help"> … </catch>
• Shorthand notation available
  – <noinput> … </noinput>, etc.
• Scoped according to where they occur
  – <form>, <field>, etc.
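A sketch of how handlers can control retries, using the count attribute so that repeated silence gets progressively more guidance. The prompt wording is invented for illustration; the month grammar URL is the one used elsewhere in the deck.

```xml
<field name="month">
  <prompt> What month? </prompt>
  <grammar src="http://www.ajax.com/month.grxml"/>

  <!-- First silence: short reminder, then replay the field prompt -->
  <noinput count="1">
    <prompt> I did not hear anything. </prompt>
    <reprompt/>
  </noinput>

  <!-- Second and later silences: give an example answer -->
  <noinput count="2">
    <prompt> Please say the name of a month, for example, January. </prompt>
  </noinput>

  <!-- One handler may catch several events (space-separated list) -->
  <catch event="nomatch help">
    <prompt> Say the month you were born in, such as March. </prompt>
  </catch>
</field>
```

Because these handlers sit inside the field, they apply only to this field; the same elements placed directly in the form would apply to every field in it.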
Adding Event Handlers
<form>
  <block>
    <prompt> When were you born? </prompt>
  </block>
  <field name="month">
    <catch event="noinput"> ….. </catch>
    <catch event="nomatch"> ….. </catch>
    <prompt> What month? </prompt>
    <grammar src="http://www.ajax.com/month.grxml"/>
  </field>
  …..
</form>
Default Event Handlers
<catch event="help">
  <prompt> Sorry, no help is available. </prompt>
</catch>
<catch event="nomatch">
  <prompt> I did not understand, please try again </prompt>
</catch>
<catch event="noinput">
  <prompt> I did not hear anything, please speak again </prompt>
</catch>
Exercise 3: Write event handlers for the month field
<catch event="help">
  <prompt> ____________________ </prompt>
</catch>
<catch event="nomatch">
  <prompt> __________________________ </prompt>
</catch>
<catch event="noinput">
  <prompt> ___________________________________ </prompt>
</catch>
Speech Synthesis ML
Processing steps: Structure Analysis -> Text Normalization -> Text-to-Phoneme Conversion -> Prosody Analysis -> Waveform Production
Structure Analysis
• Markup support: p, s
• Non-markup behavior: infer structure by automated text analysis
Before and after Structure Analysis
• Before structure analysis
  – Dr. Smith lives at 214 Elm Dr. He weighs 214 lb. He plays bass guitar. He also likes to fish; last week he caught a 19 lb. bass.
• After structure analysis
  <p>
    <s> Dr. Smith lives at 214 Elm Dr. </s>
    <s> He weighs 214 lb. </s>
    <s> He plays bass guitar. </s>
    <s> He also likes to fish; last week he caught a 19 lb. bass. </s>
  </p>
Speech Synthesis ML
Processing steps: Structure Analysis -> Text Normalization -> Text-to-Phoneme Conversion -> Prosody Analysis -> Waveform Production
Text Normalization
• Markup support: say-as for dates, times, etc.; sub for aliasing
• Non-markup behavior: automatically identify and convert constructs
After Text Normalization
<p>
  <s> <sub alias="doctor">Dr.</sub> Smith lives at 214 Elm <sub alias="drive">Dr.</sub> </s>
  <s> He weighs 214 <sub alias="pounds">lb.</sub> </s>
  <s> He plays bass guitar. </s>
  <s> He also likes to fish; last week he caught a 19 <sub alias="pound">lb.</sub> bass. </s>
</p>
Speech Synthesis ML
Processing steps: Structure Analysis -> Text Normalization -> Text-to-Phoneme Conversion -> Prosody Analysis -> Waveform Production
Text-to-Phoneme Conversion
• Markup support: phoneme, say-as
• Non-markup behavior: look up in pronunciation dictionary
After text-to-phoneme conversion
<p>
  <s> <sub alias="doctor">Dr.</sub> Smith lives at <say-as interpret-as="address">214</say-as> Elm <sub alias="drive">Dr.</sub> </s>
  <s> He weighs <say-as interpret-as="number">214</say-as> <sub alias="pounds">lb.</sub> </s>
  <s> He plays <phoneme alphabet="ipa" ph="beɪs">bass</phoneme> guitar. </s>
  <s> He also likes to fish; last week he caught a <say-as interpret-as="number">19</say-as> <sub alias="pound">lb.</sub> <phoneme alphabet="ipa" ph="bæs">bass</phoneme>. </s>
</p>
Speech Synthesis ML
Processing steps: Structure Analysis -> Text Normalization -> Text-to-Phoneme Conversion -> Prosody Analysis -> Waveform Production
Prosody Analysis
• Markup support: emphasis, break, prosody
• Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax
Prosody Analysis (Initial text)
<prompt>
  Environmental control menu. Do you want to adjust the lighting or temperature?
</prompt>
Prosody Analysis
<prompt>
  Environmental control menu <break/>
  <emphasis level="reduced"> do you want to adjust the </emphasis>
  <emphasis level="strong"> lighting </emphasis> <break/>
  or <emphasis level="strong"> temperature? </emphasis>
</prompt>
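The deck lists prosody as markup support for this step but only shows emphasis and break. A sketch of the same prompt using the SSML prosody element and an explicit pause length; the specific rate and pause values are arbitrary choices for illustration.

```xml
<prompt>
  Environmental control menu.
  <!-- Explicit pause instead of the default break -->
  <break time="500ms"/>
  <!-- Slow the question down as a whole -->
  <prosody rate="slow">
    Do you want to adjust the
    <emphasis level="strong"> lighting </emphasis>
    or the
    <emphasis level="strong"> temperature? </emphasis>
  </prosody>
</prompt>
```

How strongly a given synthesizer reacts to these hints varies by platform, so values like these usually need tuning by ear.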
Speech Synthesis ML
Processing steps: Structure Analysis -> Text Normalization -> Text-to-Phoneme Conversion -> Prosody Analysis -> Waveform Production
Waveform Production
• Markup support: voice, audio (audio icons, branding, advertising)
Waveform Production
<prompt>
  <audio src="http://www.example.com/adjust.wav">
    <desc>
      Environmental control menu. Do you want to adjust the lighting or temperature
    </desc>
  </audio>
</prompt>
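A related pattern, sketched here as a variation rather than as the author's example: text placed directly inside audio (outside any desc) acts as fallback content that is synthesized if the recording cannot be fetched or played.

```xml
<prompt>
  <audio src="http://www.example.com/adjust.wav">
    <!-- Spoken by TTS only if adjust.wav is unavailable -->
    Environmental control menu.
    Do you want to adjust the lighting or temperature?
  </audio>
</prompt>
```

This keeps the dialog usable when an audio file is missing, at the cost of a change in voice between recorded and synthesized prompts.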
Exercise 4 (insert SSML commands)
<prompt> Welcome to Ajax Bank do you want to
withdraw or deposit funds? </prompt>
Grammars
• Describe what the user may say at a point in the dialog
• Enable the speech recognition engine to work faster and more accurately
• Consist of one or more “rules”
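A minimal single-rule grammar, sketched for keypad input to show that the same rule structure applies to DTMF as to speech. The rule name and the two accepted keys are assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         root="account_key" mode="dtmf">
  <!-- One rule: the caller may press 1 or 2, nothing else -->
  <rule id="account_key" scope="public">
    <one-of>
      <item> 1 </item>
      <item> 2 </item>
    </one-of>
  </rule>
</grammar>
```

Restricting the input to two keys is what lets the recognizer reject anything else quickly, which is the speed and accuracy benefit the bullets describe.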
Example Grammar
<grammar type="application/srgs+xml" root="zero_to_ten" mode="voice">
  <rule id="zero_to_ten">
    <one-of>
      <item> zero </item>
      <item> <ruleref uri="#single_digit"/> </item>
      <item> ten </item>
    </one-of>
  </rule>
  <rule id="single_digit">
    <one-of>
      <item> one </item>
      <item> two </item>
      <item> three </item>
      <item> four </item>
      <item> five </item>
      <item> six </item>
      <item> seven </item>
      <item> eight </item>
      <item> nine </item>
    </one-of>
  </rule>
</grammar>
Notes on the example:
• This is the XML form of grammars
• root = "zero_to_ten": the grammar processor should start with the "zero_to_ten" rule
• mode = "voice": this is a grammar used by the speech recognizer (there may also be grammars for DTMF recognizers)
• The "single_digit" rule describes single digits; the "zero_to_ten" rule describes the numbers zero through ten
• <one-of> describes alternatives
• <ruleref> references another rule
Exercise 5: Write a grammar that recognizes the numbers zero to nineteen
More Grammar Elements
• Repeat and optional
  <rule id="goodness" scope="public">
    <item repeat="0-3"> very </item>
    good
  </rule>
• Sequence
  <rule id="twenty_thru_twentynine">
    twenty <ruleref uri="#single_digit"/>
  </rule>
• Garbage
  <rule id="James_Lewis">
    <item> James <ruleref special="GARBAGE"/> Lewis </item>
  </rule>
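A sketch that combines these elements in one standalone grammar: an optional word, a fixed sequence, and an unbounded repeat. The rule names and vocabulary are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" root="transfer_request" mode="voice">
  <rule id="transfer_request" scope="public">
    <!-- Optional: "please" may occur zero or one time -->
    <item repeat="0-1"> please </item>
    <!-- Sequence: the following parts must occur in this order -->
    transfer
    <!-- Repeat: one or more digits, with no upper bound -->
    <item repeat="1-"> <ruleref uri="#digit"/> </item>
    dollars
  </rule>

  <rule id="digit">
    <one-of>
      <item> one </item>
      <item> two </item>
      <item> three </item>
    </one-of>
  </rule>
</grammar>
```

This grammar accepts, for example, "please transfer one two dollars" and "transfer three dollars", but not "transfer dollars".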
Reusing existing grammars
<grammar type="application/srgs+xml" root="size" src="http://www.example.com/size.grxml"/>
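An existing grammar can also be reused from inside another grammar's rule. A sketch, assuming the file at the URL above defines a public rule named "size" (the slide names "size" only as the root); the "shirt_order" rule is hypothetical.

```xml
<rule id="shirt_order" scope="public">
  <!-- The fragment after "#" selects a rule inside the external grammar -->
  <ruleref uri="http://www.example.com/size.grxml#size"/>
  shirt
</rule>
```

Only rules declared with scope="public" in the external grammar can be referenced this way.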
Semantic Interpretation
• Semantic Interpretation defines how to extract and modify the results returned by the speech recognition engine
• Semantic interpretation instructions contained in the <tag> element
• Two kinds of syntax for <tag> contents: – Semantic Literals (literal values) – Semantic Scripts (ECMAScript)
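A sketch of the script syntax, assuming the SISR 1.0 convention that the grammar's tag-format attribute selects the syntax ("semantics/1.0" for scripts, "semantics/1.0-literals" for literals) and that the variable out holds the rule's result. The yes/no vocabulary is invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" root="answer" mode="voice"
         tag-format="semantics/1.0">
  <rule id="answer" scope="public">
    <one-of>
      <!-- Several phrasings map to the same Boolean result -->
      <item> yes <tag> out = true; </tag> </item>
      <item> yeah <tag> out = true; </tag> </item>
      <item> no <tag> out = false; </tag> </item>
    </one-of>
  </rule>
</grammar>
```

With the literals format, the same tag content would be returned as a plain string instead of being evaluated as ECMAScript.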
Semantic Interpretation
• Semantic Literals example:
  <rule id="drink">
    <one-of>
      <item> coca cola <tag> coke </tag> </item>
      <item> cola <tag> coke </tag> </item>
      <item> black fizzy stuff <tag> coke </tag> </item>
      <item> coke </item>
    </one-of>
  </rule>
• Default assignment: <item> coke </item> has no <tag>, so the recognized text itself ("coke") is the result
Semantic Interpretation
[Figure: the ASR, driven by a grammar with semantic interpretation scripts, passes text to the Semantic Interpretation Processor, which passes an ECMAScript object to the VoiceXML Interpreter]
• No semantic interpretation: the recognized text "fourteen" reaches the VoiceXML interpreter unchanged as "fourteen"
• With semantic interpretation: the grammar item
  <item> fourteen <tag> out.quantity = "14"; </tag> </item>
  turns the recognized text "fourteen" into the ECMAScript object { quantity: "14" }
Semantic Interpretation
• Semantic Scripts employ ECMAScript
• Advantages:
  – Richer structure (objects)
  – Ability to perform computations
Semantic Interpretation
• Example grammar rule with Script Syntax:
  <rule id="action">
    <one-of>
      <item> small <tag> out.size = "small"; </tag> </item>
      <item> medium <tag> out.size = "medium"; </tag> </item>
      <item> large <tag> out.size = "large"; </tag> </item>
    </one-of>
    <one-of>
      <item> green <tag> out.color = "green"; </tag> </item>
      <item> blue <tag> out.color = "blue"; </tag> </item>
      <item> white <tag> out.color = "white"; </tag> </item>
    </one-of>
  </rule>
• Utterance: "Large white"
• ECMAScript structure:
  action: { size: "large", color: "white" }
Semantic Interpretation
• Example grammar rule with Script Syntax:
  <rule id="calculator">
    what is <ruleref uri="#digit"/> <tag> out.total = rules.digit; </tag>
    <item repeat="1-">
      plus <ruleref uri="#digit"/> <tag> out.total = out.total + rules.digit; </tag>
    </item>
  </rule>
• Utterance: "What is 1 + 2 + 3?"
• ECMAScript structure:
  calculator: { total: 6 }
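The calculator rule references a rule "#digit" that the slide does not show. A sketch of what it might look like, assuming SISR 1.0 script syntax in which out is the rule's result; only three digits are listed to keep it short.

```xml
<rule id="digit">
  <one-of>
    <!-- Results are numbers, not strings, so that "+" adds -->
    <item> one <tag> out = 1; </tag> </item>
    <item> two <tag> out = 2; </tag> </item>
    <item> three <tag> out = 3; </tag> </item>
  </one-of>
</rule>
```

If the digit rule returned the strings "1", "2", and "3" instead, ECMAScript's "+" would concatenate them and the total would be "123" rather than 6.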
Exercise 6: Fill in the contents of <tag>
• Utterance: "From savings to checking"
• Grammar rule:
  <rule id="transfer">
    from
    <one-of>
      <item> savings <tag> ________________________ </tag> </item>
      <item> checking <tag> ________________________ </tag> </item>
    </one-of>
    to
    <one-of>
      <item> savings <tag> ________________________ </tag> </item>
      <item> checking <tag> ________________________ </tag> </item>
    </one-of>
  </rule>
• ECMAScript structure:
  transfer: { source_account: "savings", target_account: "checking" }
VoiceXML 2.1
• VoiceXML's success and popularity resulted in many implementations early in the standardization process
• Additional, innovative features were conceived after the content of VoiceXML 2.0 was agreed
• Goals of VoiceXML 2.1:
  – Ensure portability by specifying a set of commonly implemented extensions
  – Backwards-compatible with VoiceXML 2.0
  – Follow a "fast track" to standardization
VoiceXML 2.1
• Standardized extensions:
  – Locate barge-in occurrences within prompts
  – Access recognition utterances for analysis
  – Increase performance by reducing server round-trips
  – Extended call transfer types
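A sketch of how three of these extensions might appear in a VoiceXML 2.1 document: mark for locating barge-in, the recordutterance property for keeping the caller's audio, and data for fetching XML without a page transition. The URLs, form id, and variable names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Keep a recording of each recognized utterance for later analysis -->
  <property name="recordutterance" value="true"/>

  <form id="balance">
    <!-- Fetch XML data in place, avoiding a round trip for a new document -->
    <data name="rates" src="http://www.example.com/rates.xml"/>

    <field name="account">
      <prompt>
        Which account?
        <mark name="after_question"/>
        Savings or checking.
      </prompt>
      <grammar src="http://www.example.com/account.grxml"/>
      <filled>
        <!-- markname is the last mark reached before the caller spoke -->
        <log> Last mark: <value expr="application.lastresult$.markname"/> </log>
      </filled>
    </field>
  </form>
</vxml>
```

The extended transfer types mentioned in the last bullet are selected with the type attribute of transfer, for example type="consultation".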
Summary
• W3C Speech Interface Framework
  – Dialog: VoiceXML
  – Grammar: SRGS
  – Synthesis: SSML
  – Semantic Interpretation: SI
  – Call Control: CCXML
• Can work together or separately
• See http://www.w3.org/voice/ for details
Industry Organizations
• World Wide Web Consortium – http://www.w3.org
• W3C Voice Browser Working Group – http://www.w3.org/voice/
• W3C Multi-Modal Working Group – http://www.w3.org/2002/mmi/
• VoiceXML Forum – http://www.voicexml.org
• SALT Forum: – http://www.saltforum.org
• Speech Technology Magazine – http://www.amcommexpos.com/
Books
• James A. Larson, VoiceXML: An Introduction to Developing Speech Applications, 2002, Upper Saddle River, NJ: Prentice Hall.
• Eve Astrid Andersson et al., Early Adopter VoiceXML, 2001, Birmingham, UK: Wrox.
• Bruce Balentine & David P. Morgan, How to Build a Speech Recognition Application: A Style Guide for Telephony Dialogues, 1999, San Ramon, CA: Enterprise Integration Group.
• Rick Beasley et al., Voice Application Development with VoiceXML, 2002, Indianapolis: Sams.
• Bob Edgar, The VoiceXML Handbook, 2001, New York: CMP.
• Susan Weinschenk & Dean T. Barker, Designing Effective Speech Interfaces, 2000, New York: John Wiley & Sons.
• Chetan Sharma & Jeff Kunins, VoiceXML: Strategies and Techniques for Effective Voice Application Development with VoiceXML 2.0, 2002, New York: John Wiley.
• Michael H. Cohen, James P. Giangola, & Jennifer Balogh, Voice User Interface Design, 2004, Addison Wesley.
Other Resources
• The VoiceXML Guide – http://www.vxmlguide.com/
Tutorials and Articles
• VoiceXML Forum – http://www.voicexmlforum.org/
• VoiceXML Review – http://www.voicexmlreview.org/
• World of VoiceXML – http://www.kenrehor.com/voicexml/
Online Voice SDKs
Name                             URL
BeVocal Cafe                     http://cafe.bevocal.com
Tellme Studio                    http://studio.tellme.com
VoiceGenie Developer Workshop    http://developer.voicegenie.com
Voxpilot voxbuilder              http://www.voxbuilder.com
Questions?
Thanks for your attention
Answer to Exercise 2
<form>
  <block>
    <prompt> When were you born? </prompt>
  </block>
  <field name="month">
    <prompt> What month? </prompt>
    <grammar src="http://www.ajax.com/month.grxml"/>
  </field>
  <field name="day">
    <prompt> What day of the month? </prompt>
    <grammar src="http://www.ajax.com/day.grxml"/>
  </field>
  <field name="year">
    <prompt> What year? </prompt>
    <grammar src="http://www.ajax.com/year.grxml"/>
  </field>
</form>
Answer to Exercise 3: Write event handlers for the month field
<catch event="help">
  <prompt> In what month were you born? </prompt>
</catch>
<catch event="nomatch">
  <prompt> Which month, for example, January, February, or March? </prompt>
</catch>
<catch event="noinput">
  <prompt> Say the name of the month you were born in </prompt>
</catch>
Answer to Exercise 4
<prompt>
  Welcome to Ajax Bank <break/>
  <emphasis level="reduced"> do you want to </emphasis>
  <emphasis level="strong"> withdraw </emphasis> <break/>
  or <emphasis level="strong"> deposit </emphasis> funds?
</prompt>
Answer to Exercise 5: Write a grammar for zero to nineteen
<grammar type="application/srgs+xml" root="zero_to_19" mode="voice">
  <rule id="zero_to_19">
    <one-of>
      <item> zero </item>
      <item> <ruleref uri="#single_digit"/> </item>
      <item> ten </item>
      <item> eleven </item>
      <item> twelve </item>
      <item> thirteen </item>
      <item> fourteen </item>
      <item> fifteen </item>
      <item> sixteen </item>
      <item> seventeen </item>
      <item> eighteen </item>
      <item> nineteen </item>
    </one-of>
  </rule>
  <rule id="single_digit">
    <one-of>
      <item> one </item>
      <item> two </item>
      <item> three </item>
      <item> four </item>
      <item> five </item>
      <item> six </item>
      <item> seven </item>
      <item> eight </item>
      <item> nine </item>
    </one-of>
  </rule>
</grammar>
Answer to Exercise 6
• Utterance: "From savings to checking"
• Grammar rule:
  <rule id="transfer">
    from
    <one-of>
      <item> savings <tag> out.source_account = "savings"; </tag> </item>
      <item> checking <tag> out.source_account = "checking"; </tag> </item>
    </one-of>
    to
    <one-of>
      <item> savings <tag> out.target_account = "savings"; </tag> </item>
      <item> checking <tag> out.target_account = "checking"; </tag> </item>
    </one-of>
  </rule>
• ECMAScript structure:
  transfer: { source_account: "savings", target_account: "checking" }