1 © NOKIA NRC – kieliteknologia kurssi.PPT/ 14.10.2002 / Boda Péter
Development of conversational interfaces at Nokia Research Center
Boda Péter Pá[email protected]
Language Technology & Applications, Voice Interfaces Group
Speech and Audio Systems Laboratory
Nokia Research Center
14 October, 2002
Contents
• Background
  • personal
  • Language Technology and Applications group at NRC
• A commercial implementation: Nokia One Voice Service
• Overview of CATCH-2004: multilingual conversational interface
• Demos
• Summary
Personal background
• Born in 1965, Miskolc, Hungary
• M.Sc. in Telecommunications, 1991, Budapest, Tech. Univ. of Budapest
• Post-graduate studies: TUB 1991-1994, HUT 1992-1994, Nijmegen 1995
• Lic. Tech. Speech Technology and Neural Networks, 1995, Helsinki, HUT
• Working on:
  • speech analysis 1990-1995
  • speech recognition 1995-1997
  • spoken dialogue systems, language technology 1996-
• Interests:
  • Natural Language Understanding (semantic decoding)
  • Dialogue Management
  • Processing multimodal and contextual input
Language Technology and Applications
• Mission: develop language technology for Nokia’s offering
• Dialogue-based application development for telecommunication (mainly network-based implementations)
• Seamless integration of Natural Language Understanding technology to user interfaces
• Covering the entire development process:
  • conceptual design
  • data collection and analysis
  • grammar building and tuning, NLU training & testing
  • Wizard-of-Oz experiments
  • type-in and speech-enabled tests
  • objective and subjective evaluation
  • human factors consideration, usability studies
Personnel: a diverse team of linguists, software and telecomm engineers
What will the new generation of speech interfaces bring?
• Enhanced usability:
  - naturalness in terms of linguistic expressions;
  - ease of use;
  - human-human like dialogues;
  - accelerated system-user interactions;
• Well-defined framework to port to other languages & tasks:
  - end-to-end solutions (design, data collection, Wizard-of-Oz studies, implementation, test, assessment);
  - shortened development cycle (development tools).
A commercial implementation:
Nokia One Voice Service
http://www.nokia.com/nokiaone
Speech interface for e-mail reading
• Features
• DTMF and speech access (language of the user interface is English)
• dialogue-based implementation with a medium-complexity task grammar
• functionalities:
  • browsing e-mails
  • selecting for reading
  • send in SMS
  • reply with voice clip
• accurate language identification
• text-to-speech (TTS) for several languages when reading back e-mails
  • English, Finnish, Italian, French, German, Spanish
• e-mail preprocessors prior to TTS
• usability studies show that the speech version is more popular now than the DTMF version
Some general comments
• Before implementing any speech interface:
  • think about its role: replacement or addition?
  • if an addition, how will it help or complement the current user interface?
  • is there any real added value it can bring (acceleration, security)?
• Think carefully about the effort needed to develop a solution
  • amount and ratio of research and implementation
• Never underestimate the results of user/usability tests – go for real
• TTS is important: users comment primarily on it, not on the recognition part. TTS can involve language technology as well.
An EU project:
CATCH-2004 – Converse in AThens-2004, Cologne, Helsinki
http://www.catch2004.org/
Jan 2000 - June 2002: 30 months
7 partners, 5 countries
603 person-months, 6.5 M€ (3.25 from EC)
2 demonstrators: Athens, Helsinki; 1 tester: Cologne
16 deliverables, 11 milestones
A multi-multi-multi project ….
Consortium
[Partner logos and map; recoverable labels: Gerhard-Mercator Universität Duisburg, NTUA; countries: Finland, Greece, Germany, France, Czech Republic]
Overview
• The "flagship" of the 5th EU-IST programme
• Objectives:
  • conversational interface to (city) information services: build various applications with high accuracy that satisfy the requirements set for well-functioning spoken dialogue systems
  • multilingual (Finnish, English, German, Greek)
  • multidevice (kiosk, phone, smart wireless)
  • multimodal (GUI, speech)
  • Internet infrastructure (WAP, VoiceXML, remote databases)
• Nokia's role:
  • WAP access
  • Multimodal browsing
  • NLU development for Helsinki demonstrator
• Helsinki demos:
  • 2000: Art-Goes-Kapakka - just to experiment with the NLU toolkit
  • 2001: Program Guide Information Service - has relevance to other projects
Inside the NLU module
[Diagram blocks: Speech recognition; Natural Language Understanding (NLU), incl. Dialogue Manager; Speech synthesis; Database]
What does the NLU module do?
• The NLU toolkit employed in CATCH-2004:
  • IBM ViaVoicePhone Telephony Natural Language Tools
    – Statistical approach
    – The speaker is not restricted to any particular vocabulary or commands but can freely express the request by using natural language expressions.
(1) Interprets the meaning of the user utterance and decides what to do with the utterance.
(2) Interacts with the backend database.
(3) Decides what kind of answer will be provided.
The components of the NLU module
• The NLU module contains four main components:
  • Statistical Classer: extracts the key concepts of the utterance.
  • Canonicalizer: transforms certain concepts to a form which is understood by the backend database.
  • Statistical Parser: determines what to do with the key concepts from the classer.
  • Dialog Manager: directs the interaction between the user and the system.
• Input: the output of the recogniser, i.e. a sequence of words, as the LM allows.
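The division of labour between the four components can be illustrated with a toy sketch. It is not the IBM toolkit: the real classer and parser are statistical models trained on data, whereas this sketch uses a hand-written keyword lexicon. All function names, the concept labels, the LIST/QUERY decision rule and the order in which the stages are chained are assumptions made for illustration only.

```python
import re
from datetime import date, timedelta

# Hypothetical keyword lexicon standing in for a trained statistical classer.
CONCEPTS = {
    "movies": ("PROGRAM_TYPE", "movie"),
    "news": ("PROGRAM_TYPE", "news"),
    "tonight": ("DATE", "tonight"),
    "tomorrow": ("DATE", "tomorrow"),
}


def classer(words):
    """Extract the key concepts as (concept, surface value) pairs."""
    return [CONCEPTS[w] for w in words if w in CONCEPTS]


def canonicalizer(concepts, today):
    """Turn surface values into forms a backend database could match."""
    slots = {}
    for name, value in concepts:
        if name == "DATE":
            offset = 1 if value == "tomorrow" else 0
            value = (today + timedelta(days=offset)).isoformat()
        slots[name] = value
    return slots


def parser(slots):
    """Decide what to do with the concepts (toy rule, not a statistical parser)."""
    action = "LIST" if "PROGRAM_TYPE" in slots else "QUERY"
    return {"action": action, "slots": slots}


def dialog_manager(frame):
    """Choose the next system turn from the semantic frame."""
    if frame["action"] == "LIST":
        slots = frame["slots"]
        return f"Listing {slots['PROGRAM_TYPE']} programs on {slots.get('DATE', 'any date')}."
    return "What would you like to know?"


def understand(utterance, today):
    """Run the recogniser output (a word sequence) through all four stages."""
    words = re.findall(r"[a-z']+", utterance.lower())
    frame = parser(canonicalizer(classer(words), today))
    return frame, dialog_manager(frame)
```

For example, `understand("Could you please tell me about movies tonight?", date(2002, 10, 14))` yields a LIST frame with the program type `movie` and the canonical date `2002-10-14`.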
Multilingual Architecture
[Diagram labels: Speech recognition; NLU; Multilingual TASK; Multilingual LM/Voc; Multilingual AM; Multilingual classer; Multilingual parser (Lang ID); Canonicalizer; Dialog manager; Answer generation (language-dependent TTS)]
LM: language model
Voc: vocabulary
AM: acoustic models
TTS: text-to-speech
Lang ID: language identification
Historically speaking ….
• Helsinki demos:
  • 2000: Art-Goes-Kapakka - just to experiment with the NLU toolkit
  • 2001: Program Guide Information Service – more realistic
• AGK
  • developed as the first NLU application at Nokia
  • good exercise to walk through (with sweat) the entire development process
  • close co-operation with IBM, regular consulting
  • results were comparable to others
  • ease: manageable size & complexity, (almost) available database
• PGIS
  • we wanted a more real-life application
  • Electronic Program Guides are coming into use as digital TV spreads
  • on-going standardisation (MPEG-7 -> program types and sub-types)
Supported functionalities in PGIS
• A LIST based on the following parameters:
  DATE, PROGRAM NAME, PROGRAM TYPE, TIME, LANGUAGE, PERFORMER, CHANNEL, PRICE, NEW
• A QUERY about a particular program:
  DATE, TIME, DURATION, CHANNEL, PERFORMERS, LANGUAGE, PROGRAM TYPE, YEAR, COUNTRY OF ORIGIN, EPISODE TITLE, WEB ADDRESS, RE-RUN, PEOPLE BEHIND THE PROGRAM, SUBTITLES, PRICE, RESTRICTIONS, DESCRIPTION
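The difference between the two request types can be sketched as two small functions over a program table. This is only an illustration: the `Program` record, the function names and the sample rows are hypothetical (the durations of the two movies are invented), and the real PGIS backend and its schema are not described in these slides.

```python
from dataclasses import dataclass


@dataclass
class Program:
    name: str
    program_type: str
    channel: str
    date: str          # ISO date, e.g. "2002-10-14"
    time: str          # 24-hour start time, e.g. "22:00"
    duration_min: int


# Made-up sample rows for illustration only.
GUIDE = [
    Program("World News", "news", "BBC World", "2002-10-14", "22:00", 5),
    Program("Boom Town", "movie", "TCM", "2002-10-14", "19:00", 120),
    Program("The Hook", "movie", "TCM", "2002-10-15", "21:00", 100),
]


def list_programs(guide, **criteria):
    """LIST: names of all programs that match every given parameter."""
    return [
        p.name
        for p in guide
        if all(getattr(p, field) == value for field, value in criteria.items())
    ]


def query_program(guide, name, attribute):
    """QUERY: one attribute of a particular program, or None if it is unknown."""
    for p in guide:
        if p.name == name:
            return getattr(p, attribute)
    return None
```

A LIST narrows the guide by any combination of parameters, e.g. `list_programs(GUIDE, program_type="movie", date="2002-10-14")`, while a QUERY looks up a single field of one named program, e.g. `query_program(GUIDE, "World News", "duration_min")`.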
Gaining better performance via iterative tests
[Bar chart, English system: error rate (axis 0% to 50%); components: Classer, Parser, Canonicalizer, ASR (WER: automatic speech recognition word error rate); test dates: Jul 01, Sep 01, Oct 01, Apr 02. Bar labels in extraction order: 6.86, 48.74, 30.06, 11.87, 23.63, 3.27, 12.85, 6.55, 36.59, 3.60, 9.95, 0.84, 29.39, 1.02, 0.31, 4.17; their assignment to components and dates is not recoverable from the transcript.]
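The slides do not say how the word error rate was computed. Under the usual definition, WER is the word-level edit distance between the reference transcript and the recogniser output (substitutions + deletions + insertions) divided by the number of reference words. A minimal sketch of that standard definition, with a hypothetical function name:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(
                prev[j] + 1,         # deletion: reference word missing in output
                cur[j - 1] + 1,      # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)
```

With the reference "what time is the first one" and the output "what time is first one", one word of six is deleted, so the function returns 1/6. Because insertions count as errors, WER can exceed 100% when the output is much longer than the reference.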
A real (!!!!) dialogue with PGIS
S: Welcome to Program Guide. What would you like to know?
U: Could you please tell me about movies tonight?
S: I found the following movie programs on Monday between 6 pm and 12 pm:
It's Always Fair Weather, The Murder Men, Boom Town, and The Hook.
U: What time is the first one?
S: It's Always Fair Weather starts at 5 past 5 pm. How else can I help you?
U: How long does it last?
S: It's Always Fair Weather lasts 1 hours 40 minutes. What information can I offer you next?
U: Who are the actors?
S: It's Always Fair Weather. The actors are Gene Kelly, Dan Dailey, Michael Kidd,
Cyd Charisse. How else can I help you?
U: Can I have a description for it please?
S: I can offer you the following type of description about the program called
It's Always Fair Weather. The unbeatable musical team of Kelly and Donen is behind
this zappy musical about three serviceman buddies reunited after 10 years…...
What else can I do for you?
U: That's all for now. Good-bye.
S: Thank you for using the Program Guide. Good-bye.
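In this dialogue the user never repeats the program name: "the first one" points into the list the system has just read out, and "it" (or no reference at all, as in "Who are the actors?") continues with the program currently in focus. A minimal sketch of such context tracking follows. It assumes a simple rule-based scheme with hypothetical class and method names; the slides do not describe how the CATCH-2004 dialogue manager actually resolves references.

```python
import re

ORDINALS = {"first": 0, "second": 1, "third": 2}


class DialogueContext:
    """Remembers the last list read out and the program currently in focus."""

    def __init__(self):
        self.last_list = []
        self.focus = None

    def remember_list(self, names):
        """Store the programs named in the most recent LIST answer."""
        self.last_list = list(names)
        self.focus = None

    def resolve(self, utterance):
        """Return the program the utterance refers to, or None if unknown."""
        match = re.search(r"\bthe (first|second|third) one\b", utterance.lower())
        if match:
            index = ORDINALS[match.group(1)]
            if index >= len(self.last_list):
                return None
            self.focus = self.last_list[index]
            return self.focus
        # No explicit reference ("it", or nothing): stay with the current focus.
        return self.focus
```

After `remember_list(["It's Always Fair Weather", "The Murder Men", "Boom Town", "The Hook"])`, the questions "What time is the first one?", "How long does it last?" and "Who are the actors?" all resolve to "It's Always Fair Weather".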
[Diagram: a multilingual dialogue with context switches, shown as speech bubbles; four turns are marked "NEW CONTEXT!"]
S: Welcome to Program Guide! How may I help you?
U: Movies tonight?
S: I found the following programs …
U: …. starting time of the 1st?
S: … it starts at 5.15pm
U: …. duration? (in Finnish)
S: … it is 1h 25min long (in Finnish)
U: …. description?
S: I can offer the following description …. To text message?
U: … no text message, thanks. (in Finnish)
S: What kind of info I can offer next? (in Finnish)
U: Programs for youngsters? (in Finnish)
S: … sorry, no programs for youngsters (in Finnish)
U: Movies with Michael Douglas?
S: Michael Douglas is in Coma …
U: Channels?
S: BBC World, CNN, Eurosport, TCM
U: What’s on BBC World tonight at 10pm?
S: World News at 10pm
U: …. duration? (in Finnish)
S: … it takes 5 minutes (in Finnish)
U: That’s all for now.
S: Good bye!
What lessons have we learnt?
• In general:
  • A research project has its own difficulties – risks must be taken, but within limits;
  • Know your partners and their capabilities, and take the initiative in co-operation;
  • Strong dependency on one partner’s technology might be problematic;
• About technology:
  • Good to have linguists around, although many of the development phases require engineering skills;
  • Everything should be planned as precisely as possible, even tests and evaluation methods;
  • The best results are gained with successive test-evaluation-improvement cycles;
  • This kind of technology is quite new and users often don’t know the possibilities of the system, so the instructions must be clear and give strong guidance:
    • difficult if only a demo system with a fake database is available, without a comparable traditional system;
    • test users must be rewarded – very crucial, otherwise no motivation
  • The real picture of system functionality and operability can be gained only from real users in real situations.
Finally ….
Gábor Dénes (1969):
"If enough people work hard enough on the problem of speech recognition, it will be solved
by mid next century."
References
• http://www.nokia.com/nokiaone
• Oria, D. & Koskinen, E., “E-Mail Goes Mobile: The design and implementation of a spoken language interface to e-mail” – ICSLP’2002
• http://www.catch2004.org/
• Harrikari, H., M. Mast, T. Ross & H. Schulz: 2002, “Different Approaches to Build Multilingual Conversational Systems”. 5th International Conference on Text, Speech and Dialogue, TSD 2002, Brno, Czech Republic.
• Kleindienst J., L. Seredi, P. Kapanen & J. Bergman: 2002a, “CATCH-2004 Multi-Modal browser: Overview Description with Usability Analysis”. IEEE 4th International Conference on Multi-modal Interfaces, Pittsburgh, PA, U.S.A.
• Kleindienst J., L. Seredi, P. Kapanen & J. Bergman: 2002b, “Loosely-coupled approach towards multi-modal browsing”, Submitted to Universal Access in Information Society magazine’s special issue on Multi-modal User Interfaces.
• Boda, P. et al.: “Subjective Evaluation of a Personalised Conversational Interface to a Program Guide Information System” – Submitted to the User Modeling and User-Adapted Interaction journal (UMUAI) Special Issue on User Modeling and Personalization for Television.
Abbreviations
• AM: acoustic model
• ASR: automatic speech recognition
• CTI: computer-telephone integration
• DM: dialogue manager
• LM: language model
• NLU: natural language understanding
• SUI: speech user interface
• TTS: text-to-speech synthesis
• VVT: ViaVoice Telephony (IBM's speech resources)
• WOZ: wizard of Oz