Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
Creating a Voice User Interface with Speech Server 2007
Jason Townsend

Jason Townsend
•President, Bartlesville .NET User Group
•Sr. Analyst, ConocoPhillips
•11+ years of development experience
•Father of 4 wonderful children
•Married to an amazing and forgiving wife!
•Avid sailor
Speech Server 2007
•Speech Server is an IVR (interactive voice response) platform that allows you to develop telephony applications using standards such as Speech Application Language Tags (SALT) and VoiceXML.
•New Features
  ▫Native Voice over IP (VoIP)
  ▫Voice Response Workflow
  ▫Conversational Grammar Builder
Common Application Scenarios
•Customer Service
  ▫Pay bills by phone (ex: ChoicePay)
  ▫Order products (ex: Tickets.com)
  ▫Customer support (ex: Dell)
  ▫Banking (ex: Bank of America)
•Information Worker Markets
  ▫Pipeline workers
  ▫Insurance appraisers
  ▫Realtors
  ▫Any worker who may not be in front of a desktop
New Features
•Support for .NET Framework 2.0
•Support for VoiceXML
•Voice Response Workflow applications
  ▫Based on Windows Workflow Foundation
•Native support for VoIP
•Integrated into Office Communications Server
Speech Server Architecture
Speech Recognition Supported Languages
•English – Australia
•English – United Kingdom
•English – North America
•German – Germany
•Spanish – Americas
•More to come…
VoiceXML
•W3C’s standard XML Format for specifying interactive voice dialogues between a human and a computer
•Interpreted by a voice browser
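To make the format concrete, here is a minimal sketch that generates a one-question VoiceXML 2.0 document with Python's standard `xml.etree.ElementTree`. The element names (`vxml`, `form`, `field`, `prompt`, `grammar`, `filled`, `value`) come from the W3C VoiceXML 2.0 and SRGS 1.0 specifications; the form id, field name, prompt wording, and yes/no grammar are illustrative assumptions, not taken from the talk. A voice browser would interpret the resulting XML.

```python
import xml.etree.ElementTree as ET


def build_yes_no_form(question: str) -> str:
    """Return a minimal VoiceXML 2.0 document that asks one yes/no question.

    The form id, field name, and grammar rule are hypothetical examples.
    """
    vxml = ET.Element("vxml", {"version": "2.0",
                               "xmlns": "http://www.w3.org/2001/vxml"})
    form = ET.SubElement(vxml, "form", {"id": "main"})
    field = ET.SubElement(form, "field", {"name": "answer"})

    # What the voice browser speaks (text-to-speech or a recorded prompt).
    ET.SubElement(field, "prompt").text = question

    # Inline SRGS grammar: the only in-grammar responses are "yes" and "no".
    grammar = ET.SubElement(field, "grammar",
                            {"version": "1.0", "root": "yesno", "mode": "voice"})
    rule = ET.SubElement(grammar, "rule", {"id": "yesno"})
    one_of = ET.SubElement(rule, "one-of")
    for word in ("yes", "no"):
        ET.SubElement(one_of, "item").text = word

    # Once the field is filled, read the recognized value back to the caller.
    filled = ET.SubElement(field, "filled")
    confirm = ET.SubElement(filled, "prompt")
    confirm.text = "You said "
    ET.SubElement(confirm, "value", {"expr": "answer"})

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_yes_no_form("Would you like to hear about today's speakers?"))
```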
SALT
•SALT Forum was founded on October 15, 2001
  ▫Microsoft
  ▫Cisco
  ▫Comverse
  ▫Intel
  ▫Philips
  ▫ScanSoft
•W3C work initiated in July 2002
•The SALT Forum seems to have gone dead; the last press release was in 2003.
•Main concept was multimodal applications
  ▫Speech-enable the web, IVR, handhelds, etc.
SALT Usage
•Microsoft Speech Server 2004
  ▫Only SALT
•Microsoft Speech Server 2007
  ▫SALT and VoiceXML
•Plugin for Internet Explorer
Key Workflow Concepts
•Workflows are a set of activities
  ▫The workflow itself is an activity
•Activities are the building blocks of the application
  ▫A single unit of reuse
  ▫A single unit of execution
•An activity has associated properties, conditions, and events
•Developers can build their own custom activity libraries
  ▫Imagine your own Telerik RAD Controls, Infragistics controls, etc., just for VUIs
•A workflow runs within a host process
  ▫WAS
  ▫IIS
  ▫.EXE
  ▫Managed Windows Services
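The composite idea above (the workflow is itself an activity, and each activity carries properties, conditions, and events) can be sketched in plain Python. This is not the Windows Workflow Foundation API: `Activity`, `CodeActivity`, `SequentialWorkflow`, and the dictionary `context` are hypothetical names, and only sequential execution is modeled.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Optional

Context = Dict[str, Any]


class Activity(ABC):
    """Hypothetical activity: a single unit of reuse and of execution."""

    def __init__(self, name: str,
                 condition: Optional[Callable[[Context], bool]] = None) -> None:
        self.name = name                # a property
        self.condition = condition      # a condition: run only when it returns True
        self.executed: List[Callable[["Activity", Context], None]] = []  # event handlers

    def run(self, context: Context) -> None:
        if self.condition is not None and not self.condition(context):
            return                      # skipped, so no event is raised
        self.execute(context)
        for handler in self.executed:   # raise the "executed" event
            handler(self, context)

    @abstractmethod
    def execute(self, context: Context) -> None:
        ...


class CodeActivity(Activity):
    """Leaf activity that runs a callable (stands in for a prompt, a question, etc.)."""

    def __init__(self, name: str, action: Callable[[Context], None],
                 condition: Optional[Callable[[Context], bool]] = None) -> None:
        super().__init__(name, condition)
        self.action = action

    def execute(self, context: Context) -> None:
        self.action(context)


class SequentialWorkflow(Activity):
    """The workflow is itself an activity whose children run in order."""

    def __init__(self, name: str, children: List[Activity]) -> None:
        super().__init__(name)
        self.children = list(children)

    def execute(self, context: Context) -> None:
        for child in self.children:
            child.run(context)


if __name__ == "__main__":
    call = SequentialWorkflow("main", [
        CodeActivity("greet", lambda c: print("Welcome to Tulsa Techfest.")),
        CodeActivity("transfer", lambda c: print("Transferring you now."),
                     condition=lambda c: c["errors"] > 2),
        CodeActivity("goodbye", lambda c: print("Goodbye.")),
    ])
    call.run({"errors": 0})
```

Because `SequentialWorkflow` is an `Activity`, a whole dialogue can be nested inside another workflow and reused as one building block.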
Dialogue Flow is a Workflow
•Speech Server only supports sequential workflow development
Speech Application Development
•Define the dialogue flow
  ▫Statements, questions, answers, etc.
  ▫Other activities
•Specify possible answers (grammars)
•Record questions (prompts)
•Integrate with the back end (Web Services)
•Deploy, test, and tune the application
Developing Your Prototype
Managed Code Assembly
Tuning Applications
•Out-of-the-box speech applications
  ▫Are not robust to real-world user input
  ▫Need real data to optimize
•Trial phases are required for gathering data
  ▫Wizard of Oz phase
  ▫Pilot phases
•The Analytics and Tuning Studio tool, integrated with Visual Studio, can be used to analyze the data and find problems
Reporting in Speech Server
•Measuring application performance and server performance
  ▫Call volume
  ▫Self-service completion rates
•Sharing reporting data throughout the business
  ▫Speech Server can leverage the full SQL Server stack: Reporting Services, Analysis Services, Integration Services
Data Management – Trace Logging
•Logs
  ▫Call details
  ▫Application instrumentation
  ▫Audio and grammars
  ▫Server latencies
  ▫More…
•Saved in Speech Server log files
•Can be imported into your SQL Server database/farm via the log import tool
•Analyze via Speech Server 2007 Analytics and Tuning Studio
•Present reports via SQL Server Reporting Services
Logged Information - Prompt
•Prompt
  ▫Content
  ▫Barge-in detection
  ▫Rate/Volume
  ▫Persona
Logged Information - Response
•Input Mode
  ▫Speech
  ▫DTMF
•Grammar
  ▫Content (coverage)
  ▫Rule weights
  ▫Pronunciations
•Confirmation Threshold
•SR (speech recognition) configuration
  ▫Speech Detection
  ▫Rejection Threshold
  ▫Silence Timeout
  ▫Endsilence
  ▫Decoder …
  ▫Acoustic Models …
Voice User Interface (VUI)
•Allows for human interaction with computers through a voice/speech platform
•The VUI is the interface to any speech application
•There is a drive to make VUIs conversational
•Instead of browser incompatibility, you have dialect incompatibility
•Not all business processes are suited to VUIs
  ▫Some are too complex
  ▫Sometimes automation is impossible or impractical
Grammars
•Best practice: constrain the grammar as much as possible.
•Good prompt design guides the caller to use in-grammar responses.
•Out-of-grammar (OOG) responses are handled with more explicit prompting to elicit an in-grammar response.
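The sketch below illustrates a tightly constrained grammar and OOG handling in plain Python. A real recognizer matches audio against an SRGS grammar rather than comparing strings, so treat this only as a model of the dialogue logic: the menu phrases, semantic values, and prompt wording are hypothetical.

```python
from typing import Optional

# Hypothetical main-menu grammar: a small, tightly constrained set of phrases,
# each mapped to the semantic value the application acts on.
MENU_GRAMMAR = {
    "speakers": "SPEAKERS",
    "speaker list": "SPEAKERS",
    "schedule": "SCHEDULE",
    "directions": "DIRECTIONS",
}

OOG_PROMPT = "Sorry, I didn't get that. Please say speakers, schedule, or directions."


def interpret(utterance: str) -> Optional[str]:
    """Return the semantic value of an in-grammar utterance, or None if it is OOG."""
    key = " ".join(utterance.lower().split())  # normalize case and whitespace
    return MENU_GRAMMAR.get(key)


def respond(utterance: str) -> str:
    value = interpret(utterance)
    if value is None:
        # Out of grammar: re-prompt more explicitly, listing the allowed phrases.
        return OOG_PROMPT
    return f"OK, {value.lower()}."
```

Listing the exact allowed phrases in the OOG prompt is what steers the caller back into the grammar.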
VUI Design Best Practices
1) Use DTMF for long numbers
2) Don’t use open-ended prompts
3) Don’t repeat prompts
4) Focus on grammar accuracy
5) If natural dialogs fail, fall back to directed dialog
6) Always confirm what was recognized
7) Generate prompts based on recognition confidence scores
8) Bail out if too many errors occur
9) Keep text-to-speech output to a minimum
10) Be aware of human memory
11) “Platinum Rule”
12) Let the caller drive
Use DTMF for Long Numbers
•Limit spoken digits to four or fewer
•This rule is often broken for:
  ▫Credit card numbers
  ▫Social Security numbers
  ▫Bank account numbers
  ▫Telephone numbers
•DON’T break this rule!!!
•Remember customer privacy!
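The rule above can be captured in a small helper that decides which input modes a numeric field accepts. This is a hypothetical sketch: the function names, the `sensitive` flag (standing in for the privacy concern), and the prompt wording are assumptions; only the four-digit limit comes from the slide.

```python
from typing import List

MAX_SPOKEN_DIGITS = 4  # rule of thumb from this slide: speak four digits or fewer


def allowed_input_modes(digit_count: int, sensitive: bool = False) -> List[str]:
    """Pick input modes for a numeric field (hypothetical helper)."""
    if sensitive or digit_count > MAX_SPOKEN_DIGITS:
        # Long or private numbers: keypad only, so nothing is said aloud
        # and no long digit string has to be recognized from speech.
        return ["dtmf"]
    return ["speech", "dtmf"]


def collect_prompt(label: str, digit_count: int, sensitive: bool = False) -> str:
    modes = allowed_input_modes(digit_count, sensitive)
    if modes == ["dtmf"]:
        return f"Please enter your {digit_count}-digit {label} using your telephone keypad."
    return f"Please say or enter your {digit_count}-digit {label}."
```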
Don’t Use Open Ended Prompts
•BAD: “Hello, thank you for calling Tulsa Techfest. May I help you?”
•BETTER: “Hello, thank you for calling Tulsa Techfest. Would you like to hear about today’s speakers?”
Don’t Repeat Prompts
•When a prompt is repeated word for word, callers tend to repeat the same response that was not understood the first time
•Provide escalated help instead
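One way to provide escalated help is a ladder of prompts in which each retry is more explicit than the last, rather than a replay of the first. The sketch below assumes a hypothetical three-step ladder; the prompt wording and the "return `None` when help is exhausted" convention are illustrative choices, not Speech Server behavior.

```python
from typing import Optional

# Hypothetical escalation ladder: each retry adds more explicit help instead of
# replaying the same prompt word for word.
ESCALATING_PROMPTS = [
    "Which session would you like to hear about?",
    "Sorry, I didn't catch that. Just say the name of a session, like keynote.",
    "You can say keynote, lunch, or closing remarks. Or press zero for an operator.",
]


def prompt_for_attempt(attempt: int) -> Optional[str]:
    """Return the prompt for a 0-based attempt, or None once help is exhausted.

    None is the cue to stop asking (for example, transfer to a person)
    rather than repeat the last prompt again.
    """
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    if attempt >= len(ESCALATING_PROMPTS):
        return None
    return ESCALATING_PROMPTS[attempt]
```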
Focus on Grammar Accuracy
•Spend time TUNING and REFINING your grammars
•Accuracy is IMPERATIVE
•To reduce recognition failures:
  ▫Create prompts that make it clear what the user can and should say
  ▫Test grammars with many different utterances from several people
  ▫Record incoming calls once the system is in production and use this information to continually tune the grammars
•Watch for dialects!
If Natural Dialogs Fail, Fall Back to Directed Dialog
•Natural dialogs are great, but they have a higher rate of failure
•Don’t frustrate the user
Always Confirm What Was Recognized
•Mismatches are common
  ▫Austin/Boston
  ▫Sharp/Shark
  ▫Britney Spears/Kevin Federline
•Even for grammars with low ambiguity, it’s important to confirm your recognition
•Implicit confirmation
  ▫“OK, Jason. Are you coming to Techfest?”
•The QA control makes it easy to provide confirmation
Generate Prompts Based on Recognition Confidence Scores
•Speech recognition errors are common
•How to handle them?
  ▫Changing prompts
  ▫Falling back to directed dialogs
  ▫Transferring to an operator
•Humans change their interaction based on perceived confidence, whether implicitly or explicitly
•N-best lists are of great value here
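Here is a minimal sketch of choosing the next prompt from a confidence score. It assumes the score is a number between 0 and 1; the two thresholds (0.85 and 0.50) and the prompt wording are placeholder assumptions that would be set through tuning with real call data.

```python
def confirmation_prompt(result: str, confidence: float,
                        high: float = 0.85, low: float = 0.50) -> str:
    """Choose the next prompt from the recognizer's confidence score.

    The 0.85 / 0.50 thresholds are placeholders; real values come from tuning.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= high:
        return f"OK, {result}."          # confident: implicit confirmation
    if confidence >= low:
        return f"Did you say {result}?"  # unsure: explicit yes/no confirmation
    # Low confidence: fall back to a directed prompt instead of guessing.
    return "Sorry, I didn't understand. Please say just the city name, for example, Austin."
```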
Confidence Scores & N-Best Lists
•The recognition engine returns a confidence score along with a result
•The recognition engine can return several “guesses” of what it understood
•You tell the engine to return up to N guesses
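The shape of an N-best list can be modeled as guesses sorted by confidence and truncated to N. In practice the recognition engine produces this list itself; `Hypothesis` and `n_best` are hypothetical names used only to illustrate the idea, not Speech Server types.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hypothesis:
    """One recognizer guess (hypothetical type, not a Speech Server class)."""
    text: str
    confidence: float


def n_best(hypotheses: List[Hypothesis], n: int) -> List[Hypothesis]:
    """Return at most n guesses, highest confidence first."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sorted(hypotheses, key=lambda h: h.confidence, reverse=True)[:n]
```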
Skip Lists
•A skip list is a type of N-best processing
•Keep track of results that the caller has said “no” to when asked to confirm, and don’t offer them again
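A skip list can be sketched as a set of rejected results that persists across recognition attempts. The function names and the `caller_says_yes` callback (standing in for a yes/no confirmation turn) are hypothetical.

```python
from typing import Callable, Iterable, Optional, Set


def next_candidate(n_best_texts: Iterable[str], skip: Set[str]) -> Optional[str]:
    """Best-ranked guess that is not on the skip list, or None."""
    for text in n_best_texts:
        if text not in skip:
            return text
    return None


def confirm_with_skip_list(n_best_texts: Iterable[str],
                           caller_says_yes: Callable[[str], bool],
                           skip: Set[str]) -> Optional[str]:
    """Offer guesses best-first; remember every 'no' so it is never offered again.

    `skip` is kept across recognition attempts, so a guess the caller rejected
    earlier is filtered out even if the recognizer returns it again.
    """
    texts = list(n_best_texts)
    while True:
        candidate = next_candidate(texts, skip)
        if candidate is None:
            return None            # out of guesses: re-prompt or transfer
        if caller_says_yes(candidate):
            return candidate
        skip.add(candidate)        # the caller said "no": skip it from now on
```

Keeping `skip` outside the function is what lets a second recognition attempt avoid asking about "Boston" again after the caller has already rejected it.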
Bail Out If Too Many Errors
•Don’t make your customer become a “0” (zero) jammer
•Transfer to a live person if they error out more than twice
•Remember, some people have speech impediments or speech patterns that may not correlate well with recognition confidence.
•Find the threshold! (This takes testing)
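A simple error budget captures the bail-out rule. In this sketch the default of two allowed errors mirrors "more than twice" above; the class name, and the choice to reset the count after a successful turn, are assumptions rather than anything prescribed by the talk.

```python
class ErrorBudget:
    """Count recognition errors in a dialog; signal a transfer past the limit.

    max_errors=2 mirrors "more than twice" above; tune it with real call data.
    """

    def __init__(self, max_errors: int = 2) -> None:
        self.max_errors = max_errors
        self.errors = 0

    def record_error(self) -> bool:
        """Record a no-match or no-input event; True means transfer to a live person."""
        self.errors += 1
        return self.errors > self.max_errors

    def record_success(self) -> None:
        """Reset after a successful turn (a design choice, not a rule from the slide)."""
        self.errors = 0
```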
Keep TTS Output to a Minimum
•TTS does not sound professional
•Hire a voice talent; the payoff will justify the upfront cost
•Use TTS as a fallback for data or prompts that need to be dynamic
Be Aware of Human Memory
•Make lists short
•No more than 5 items
•Present large lists in chunks
•Make the prompts short
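Presenting a large list in chunks of at most five items can be sketched as follows. The function names, the "say more" navigation phrase, and the prompt wording are hypothetical; the chunk size of five comes from the slide.

```python
from typing import List, Sequence


def chunk(items: Sequence[str], size: int = 5) -> List[List[str]]:
    """Split a long list into groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def list_prompts(items: Sequence[str], size: int = 5) -> List[str]:
    """Build one short prompt per chunk; every chunk but the last offers 'more'."""
    groups = chunk(items, size)
    prompts = []
    for index, group in enumerate(groups):
        if len(group) == 1:
            choices = group[0]
        else:
            choices = ", ".join(group[:-1]) + " or " + group[-1]
        text = f"You can say {choices}."
        if index < len(groups) - 1:
            text += " Or say more for other choices."
        prompts.append(text)
    return prompts
```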
Platinum Rule
•Treat users as they want to be treated, not as you want to be treated
•Step into their shoes
•Use vocabulary they understand
Let The Caller Drive
•Provide instant gratification (small successes let the caller get in a zone and enjoy the experience)
•Only ask for what you need, not everything at once.
VUI Design is a Science
•Design before development
•Wizard of Oz Testing
•Find balance between business requirements and the caller experience
•Run usability trials on test subjects to validate your design
•Use a pilot to trial the application. If caller behavior is not as expected, make adjustments.
Demos
Additional Information
•http://www.microsoft.com/speech
•http://www.microsoft.com/uc
•http://www.gotspeech.net
•http://www.nuance.com
•https://www.intervoice.com/
•http://www.tellme.com/
•http://www.vuidesign.org/
Further Resources
•My Blog
  ▫http://www.okcodemonkey.com
•LinkedIn
  ▫http://www.linkedin.com/in/okcodemonkey
•Bartlesville .NET User Group
  ▫http://www.bdnug.com
•Twitter
  ▫http://twitter.com/okcodemonkey
•Email
Key Terms
Voice Browser
•A “web browser” that presents an IVR VUI to the user
•Provides an interface to the PSTN or a PBX
•Works with voice dialogues (where web browsers work with HTML/XHTML)
•Presents information aurally via:
  ▫Text-to-speech
  ▫Prerecorded prompts
•Obtains information through:
  ▫Speech recognition
  ▫DTMF detection
Speech Recognition
•Converts spoken words to machine readable input
DTMF (Dual-Tone Multi-Frequency)
•Used for telephone signaling over the line, in the voice-frequency band, to the call switching center
•Standardized by ITU-T Recommendation Q.23
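"Dual-tone" means each key sends two sine waves at once: one from a low (row) group and one from a high (column) group. The sketch below uses the standard DTMF keypad frequencies; the 8 kHz sample rate, 100 ms duration, and 0.5 amplitude scaling are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

# Standard DTMF keypad: each key is one low-group (row) tone plus one
# high-group (column) tone, in Hz.
LOW_GROUP = (697, 770, 852, 941)
HIGH_GROUP = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str) -> Tuple[int, int]:
    """Return the (low, high) frequency pair for a keypad key."""
    if len(key) != 1:
        raise ValueError("expected a single key")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key.upper())
        if col != -1:
            return LOW_GROUP[row], HIGH_GROUP[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_tone(key: str, duration_s: float = 0.1,
              sample_rate: int = 8000) -> List[float]:
    """Synthesize a key's tone as the sum of its two sine waves, scaled to [-1, 1]."""
    low, high = dtmf_frequencies(key)
    n = round(duration_s * sample_rate)
    return [0.5 * (math.sin(2 * math.pi * low * t / sample_rate)
                   + math.sin(2 * math.pi * high * t / sample_rate))
            for t in range(n)]
```

A DTMF detector on the other end works in reverse, looking for energy at exactly one low-group and one high-group frequency.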
Text-To-Speech (Speech Synthesis)
•Artificial production of human speech
•The computer system used is called a speech synthesizer
•Can be implemented in software or hardware
•Converts normal language text into speech
PSTN (Public Switched Telephone Network)
•Network of the world’s public circuit-switched telephone networks
•Similar to the way the Internet is the network of the world’s public IP-based packet-switched networks
•Originally a network of fixed-line analog telephone systems
•Now almost completely digital and includes mobile phones
•Governed by technical standards created by the ITU-T, and uses E.163/E.164 addresses (telephone numbers)
ITU-T (International Telecommunication Union Standardization Sector)
•Coordinates standards for telecommunications on behalf of the International Telecommunication Union
•Based in Geneva, Switzerland
•Original work dates back to 1865, with the birth of the International Telegraph Union
•Its parent organization, the ITU, became a United Nations specialized agency in 1947
ITU (International Telecommunication Union)
•Established to standardize and regulate international radio and telecommunications
•Founded as the International Telegraph Union on May 17, 1865, in Paris
•Main tasks include standardization, allocation of the radio spectrum, and organizing interconnection agreements between countries
PBX (Private Branch Exchange)
•A telephone exchange that serves a particular business or office, as opposed to one that a common carrier or telephone company operates for many businesses