
1 © NOKIA NRC – kieliteknologia kurssi.PPT/ 14.10.2002 / Boda Péter

Development of conversational interfaces at Nokia Research Center

Boda Péter Pál [email protected]

Language Technology & Applications, Voice Interfaces Group

Speech and Audio Systems Laboratory

Nokia Research Center

14 October, 2002


Contents

• Background
  • personal
  • Language Technology and Applications group at NRC

• A commercial implementation: Nokia One Voice Service

• Overview of CATCH-2004: multilingual conversational interface

• Demos

• Summary


Personal background

• Born in 1965, Miskolc, Hungary

• M.Sc. in Telecommunications, 1991, Budapest, Tech. Univ. of Budapest

• Post-graduate studies: TUB 1991-1994, HUT 1992-1994, Nijmegen 1995

• Lic. Tech. Speech Technology and Neural Networks, 1995, Helsinki, HUT

• Working on
  • speech analysis 1990-1995
  • speech recognition 1995-1997
  • spoken dialogue systems, language technology 1996-

• Interest:
  • Natural Language Understanding (semantic decoding)
  • Dialogue Management
  • Processing multimodal and contextual input


Language Technology and Applications

• Mission: develop language technology for Nokia’s offering

• Dialogue-based application development for telecommunication (mainly network-based implementations)

• Seamless integration of Natural Language Understanding technology into user interfaces

• Covering the entire development process:
  • conceptual design
  • data collection and analysis
  • grammar building and tuning, NLU training & testing
  • Wizard-of-Oz experiments
  • type-in and speech-enabled tests
  • objective and subjective evaluation
  • human factors consideration, usability studies

Personnel: a diverse team of linguists, software and telecom engineers


What will the new generation of speech interfaces bring?

• Enhanced usability:
  - naturalness in terms of linguistic expressions;
  - ease of use;
  - human-human like dialogues;
  - accelerated system-user interactions;

• Well-defined framework to port to other languages & tasks:
  - end-to-end solutions (design, data collection, Wizard-of-Oz studies, implementation, test, assessment);
  - shortened development cycle (development tools).


A commercial implementation:

Nokia One Voice Service

http://www.nokia.com/nokiaone



Speech interface for e-mail reading

• Features

• DTMF and speech access (language of the user interface is English)

• dialogue-based implementation with a moderately complex task grammar

• functionalities:
  • browsing e-mails
  • selecting for reading
  • send in SMS
  • reply with voice clip

• accurate language identification

• text-to-speech (TTS) for several languages when reading back e-mails
  • English, Finnish, Italian, French, German, Spanish

• e-mail preprocessors prior to TTS

• usability studies show that the speech version is more popular now than the DTMF version


Some general comments

• Before implementing any speech interface:

• think about its role: replacement or addition?

• if an addition, how will it help or complement the current user interface?

• is there any real added value it can bring? – acceleration, security?

• think carefully about the effort needed to develop a solution

• amount and ratio of research and implementation

• never underestimate the results of user/usability tests – run them for real

• TTS is important: users comment primarily on it rather than on the recognition part. TTS can involve language technology as well.


An EU project:

CATCH-2004 – Converse in AThens-2004, Cologne, Helsinki

http://www.catch2004.org/


• Jan 2000 - June 2002: 30 months

• 7 partners, 5 countries

• 603 Person-Months, 6.5 M€ (3.25 from EC)

• 2 demonstrators: Athens, Helsinki; 1 tester: Cologne

• 16 deliverables, 11 milestones

A multi-multi-multi project ….


Consortium

[Partner logos and country labels. Recoverable text: Gerhard-Mercator Universität Duisburg; NTUA; Finland; Greece; Germany; France, Germany, Greece, Czech Republic]


Overview

• The "flag-ship" of the 5th EU-IST programme

• Objectives:
  • conversational interface to (city) information services: build various applications that are highly accurate and satisfy the requirements set for well-functioning spoken dialogue systems
  • multilingual (Finnish, English, German, Greek)
  • multidevice (kiosk, phone, smart wireless)
  • multimodal (GUI, speech)
  • Internet infrastructure (WAP, VoiceXML, remote databases)

• Nokia's role:
  • WAP access
  • Multimodal browsing
  • NLU development for Helsinki demonstrator

• Helsinki demos:
  • 2000: Art-Goes-Kapakka - just to experiment with the NLU toolkit
  • 2001: Program Guide Information Service - has relevance to other projects


Inside the NLU module

[Diagram components: Speech recognition; Natural Language Understanding (NLU), incl. Dialogue Manager; Speech synthesis; Database]


What does the NLU module do?

• The NLU toolkit employed in CATCH-2004:
  • IBM ViaVoicePhone Telephony Natural Language Tools
    – Statistical approach
    – The speaker is not restricted to any particular vocabulary or commands but can freely express the request by using natural language expressions.

(1) Interprets the meaning of the user utterance and decides what to do with the utterance.
(2) Interacts with the backend database.
(3) Decides what kind of answer will be provided.


The components of NLU module

• The NLU module contains four main components:
  • Statistical Classer: extracts the key concepts of the utterance.
  • Statistical Parser: determines what to do with the key concepts from the classer.
  • Canonicalizer: transforms certain concepts to a form which is understood by the backend database.
  • Dialog Manager: directs the interaction between the user and the system.

• Input: output of the recogniser (sequence of words, as the LM allows).
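The four components can be read as a pipeline over the recogniser's word sequence. The sketch below is a toy, rule-based stand-in only: the toolkit used in CATCH-2004 is statistical, and the stage order, function names, concept labels and lookup tables here are all hypothetical illustrations.

```python
from dataclasses import dataclass

# Hypothetical concept lexicon; a statistical classer would learn this from data.
CONCEPT_LEXICON = {"movies": "PROGRAM_TYPE", "tonight": "DATE", "news": "PROGRAM_TYPE"}
# Hypothetical canonical values expected by the backend database.
CANONICAL = {"movies": "movie", "tonight": "today-evening", "news": "news"}


@dataclass
class Frame:
    action: str  # what the parser decided to do, e.g. "LIST"
    slots: dict  # concept label -> value


def classer(words):
    """Extract the key concepts as (label, surface word) pairs."""
    return [(CONCEPT_LEXICON[w], w) for w in words if w in CONCEPT_LEXICON]


def parser(concepts):
    """Decide what to do with the concepts (toy rule: any concept means LIST)."""
    action = "LIST" if concepts else "UNKNOWN"
    return Frame(action, dict(concepts))


def canonicalizer(frame):
    """Map surface values to the form the backend database understands."""
    slots = {label: CANONICAL.get(value, value) for label, value in frame.slots.items()}
    return Frame(frame.action, slots)


def dialog_manager(frame):
    """Choose the next system move from the canonical frame."""
    if frame.action == "UNKNOWN":
        return "Sorry, I did not understand. What would you like to know?"
    constraints = ", ".join(f"{k}={v}" for k, v in sorted(frame.slots.items()))
    return f"Searching programs with {constraints}"


def understand(recogniser_output):
    """Run the recogniser's word sequence through all four stages."""
    words = recogniser_output.lower().replace("?", "").split()
    return dialog_manager(canonicalizer(parser(classer(words))))
```

For example, `understand("Could you tell me about movies tonight?")` yields a search with `DATE=today-evening, PROGRAM_TYPE=movie`, while an utterance without known concepts falls through to the clarification prompt.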


Multilingual Architecture

• Speech recognition
  • Multilingual TASK
  • Multilingual LM/Voc
  • Multilingual AM

• NLU
  • Multilingual classer
  • Multilingual parser (Lang ID)
  • Canonicalizer
  • Dialog manager

• Answer generation (language-dependent TTS)

Abbreviations: LM = language model; Voc = vocabulary; AM = acoustic models; TTS = text-to-speech; Lang ID = language identification
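A minimal sketch of the idea behind this architecture: several languages share one understanding path, language identification happens inside it, and only the answer generation is language-dependent. The vocabularies, concept mapping and answer templates below are tiny hypothetical stand-ins for trained multilingual models.

```python
# Tiny hypothetical vocabularies; a real system would use trained multilingual models.
VOCAB = {
    "en": {"movies", "tonight", "what", "news"},
    "fi": {"elokuvat", "tänään", "mitä", "uutiset"},
}
# Words from both languages map onto the same language-independent concept values.
TO_CONCEPT = {"movies": "movie", "elokuvat": "movie", "news": "news", "uutiset": "news"}
# Language-dependent answer templates, to be passed to the matching TTS voice.
TEMPLATES = {"en": "I found {n} programs.", "fi": "Löytyi {n} ohjelmaa."}


def identify_language(words):
    """Pick the language whose vocabulary covers most words (ties go to the first entry)."""
    scores = {lang: sum(w in vocab for w in words) for lang, vocab in VOCAB.items()}
    return max(scores, key=scores.get)


def respond(words, n_results):
    """Return (language, shared concept, answer text in the user's language)."""
    lang = identify_language(words)
    concept = next((TO_CONCEPT[w] for w in words if w in TO_CONCEPT), None)
    return lang, concept, TEMPLATES[lang].format(n=n_results)
```

The same concept (`movie`) is produced for English and Finnish input, so everything behind the canonicalizer can stay language-independent.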


Historically speaking ….

• Helsinki demos:
  • 2000: Art-Goes-Kapakka (AGK) - just to experiment with the NLU toolkit
  • 2001: Program Guide Information Service (PGIS) – more realistic

• AGK
  • developed as the first NLU application at Nokia
  • good exercise to walk through (with sweat) the entire development process
  • close co-operation with IBM, regular consulting
  • results were comparable to others
  • ease: manageable size & complexity, (almost) available database

• PGIS
  • we wanted a more real-life application
  • Electronic Program Guides are coming into use as digital TV spreads
  • on-going standardisation (MPEG-7 -> program types and sub-types)


Supported functionalities in PGIS

• A LIST based on the following parameters: DATE, PROGRAM NAME, PROGRAM TYPE, TIME, LANGUAGE, PERFORMER, CHANNEL, PRICE, NEW

• A QUERY about a particular program: DATE, TIME, DURATION, CHANNEL, PERFORMERS, LANGUAGE, PROGRAM TYPE, YEAR, COUNTRY OF ORIGIN, EPISODE TITLE, WEB ADDRESS, RE-RUN, PEOPLE BEHIND THE PROGRAM, SUBTITLES, PRICE, RESTRICTIONS, DESCRIPTION
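One way to picture the two request types is as small validated records. This is an illustrative sketch, not the PGIS data model: the class names are hypothetical, and only the parameter names come from the slide.

```python
from dataclasses import dataclass, field

# Parameter names as listed on the slide.
LIST_PARAMETERS = {
    "DATE", "PROGRAM NAME", "PROGRAM TYPE", "TIME", "LANGUAGE",
    "PERFORMER", "CHANNEL", "PRICE", "NEW",
}
QUERY_ATTRIBUTES = {
    "DATE", "TIME", "DURATION", "CHANNEL", "PERFORMERS", "LANGUAGE",
    "PROGRAM TYPE", "YEAR", "COUNTRY OF ORIGIN", "EPISODE TITLE",
    "WEB ADDRESS", "RE-RUN", "PEOPLE BEHIND THE PROGRAM", "SUBTITLES",
    "PRICE", "RESTRICTIONS", "DESCRIPTION",
}


@dataclass
class ListRequest:
    """LIST: find programs matching a set of constraints."""
    constraints: dict = field(default_factory=dict)

    def __post_init__(self):
        unknown = set(self.constraints) - LIST_PARAMETERS
        if unknown:
            raise ValueError(f"unsupported LIST parameters: {sorted(unknown)}")


@dataclass
class QueryRequest:
    """QUERY: ask for one attribute of one particular program."""
    program: str
    attribute: str

    def __post_init__(self):
        if self.attribute not in QUERY_ATTRIBUTES:
            raise ValueError(f"unsupported QUERY attribute: {self.attribute}")
```

"Movies tonight?" would then become `ListRequest({"PROGRAM TYPE": "movie", "DATE": "tonight"})`, and "How long does it last?" a `QueryRequest` for the `DURATION` of the program currently in focus.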


Gaining better performance via iterative tests

[Bar chart, English system: error rate (y-axis, 0% to 50%) of the Classer, Parser, Canonicalizer and ASR (automatic speech recognition, word error rate) in four test rounds: Jul 01, Sep 01, Oct 01, Apr 02. Bar labels in extraction order, in %: 6.86, 48.74, 30.06, 11.87, 23.63, 3.27, 12.85, 6.55, 36.59, 3.60, 9.95, 0.84, 29.39, 1.02, 0.31, 4.17]
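The ASR figures in the chart are word error rates. As a reminder of how that metric is conventionally computed (the slide does not describe the project's own scoring tool, so this is the textbook definition, not the CATCH-2004 implementation): WER is the word-level edit distance between reference and hypothesis, divided by the number of reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion of a reference word
                curr[j - 1] + 1,                       # insertion of a hypothesis word
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

With the reference "what time is the first one" and the hypothesis "what time is first one", one of six reference words is deleted, so the function returns 1/6.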


A real (!!!!) dialogue with PGIS

S: Welcome to Program Guide. What would you like to know?

U: Could you please tell me about movies tonight?

S: I found the following movie programs on Monday between 6 pm and 12 pm: It's Always Fair Weather, The Murder Men, Boom Town, and The Hook.

U: What time is the first one?

S: It's Always Fair Weather starts at 5 past 5 pm. How else can I help you?

U: How long does it last?

S: It's Always Fair Weather lasts 1 hours 40 minutes. What information can I offer you next?

U: Who are the actors?

S: It's Always Fair Weather. The actors are Gene Kelly, Dan Dailey, Michael Kidd, Cyd Charisse. How else can I help you?

U: Can I have a description for it please?

S: I can offer you the following type of description about the program called It's Always Fair Weather. The unbeatable musical team of Kelly and Donen is behind this zappy musical about three serviceman buddies reunited after 10 years…... What else can I do for you?

U: That's all for now. Good-bye.

S: Thank you for using the Program Guide. Good-bye.
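The dialogue above only works if the system remembers the last result list and the program currently under discussion, so that "the first one" and "it" can be resolved. A minimal sketch of such context tracking follows; the class, the ordinal table and the fallback rule are hypothetical simplifications, not the CATCH-2004 Dialog Manager.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical ordinal words the toy resolver understands.
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3}


@dataclass
class DialogueContext:
    last_list: list = field(default_factory=list)  # programs from the latest LIST answer
    focus: Optional[str] = None                    # program currently being discussed

    def remember_list(self, programs):
        """Store a new result list; the old focus no longer applies."""
        self.last_list = list(programs)
        self.focus = None

    def resolve(self, utterance):
        """Return the program the utterance refers to, or None if there is none yet."""
        words = utterance.lower().replace("?", "").split()
        for word, index in ORDINALS.items():
            if word in words and index < len(self.last_list):
                self.focus = self.last_list[index]
                return self.focus
        # Toy rule: without an explicit reference, follow-ups are about the current focus.
        return self.focus
```

After `remember_list([...])` with the four movies, "What time is the first one?" sets the focus to It's Always Fair Weather, and the follow-ups "How long does it last?" and "Who are the actors?" resolve to the same program.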


[Dialogue diagram with speech bubbles; turn order reconstructed from the bubble texts]

S: Welcome to Program Guide! How may I help you?

NEW CONTEXT!

U: Movies tonight?

S: I found the following programs …

U: …. starting time of the 1st?

S: … it starts at 5.15pm

U: …. duration? (in Finnish)

S: … it is 1h 25min long (in Finnish)

S: What kind of info I can offer next? (in Finnish)

U: …. description?

S: I can offer the following description …. To text message?

U: … no text message, thanks. (in Finnish)

NEW CONTEXT!

U: Programs for youngsters? (in Finnish)

S: … sorry, no programs for youngsters (in Finnish)

NEW CONTEXT!

U: Movies with Michael Douglas?

S: Michael Douglas is in Coma …

NEW CONTEXT!

U: Channels?

S: BBC World, CNN, Eurosport, TCM

U: What’s on BBC World tonight at 10pm?

S: World News at 10pm

U: …. duration? (in Finnish)

S: … it takes 5 minutes (in Finnish)

U: That’s all for now.

S: Good bye!


What lessons have we learnt?

• In general:

• Research project has its own difficulties – risk must be taken but within limits;

• Know your partners and their capabilities, and take the initiative in co-operation;

• Strong dependency on one partner’s technology might be problematic;

• About technology

• Good to have linguists around, although many of the development phases require engineering skills;

• Everything should be planned as precisely as possible, even tests and evaluation methods;

• The best results are gained with successive test-evaluation-improvement cycles;

• This kind of technology is quite new, so users often don’t know what the system can do; the instructions must therefore be clear and guiding:

• difficult if only a demo system available with fake database, without comparable traditional system;

• test users must be rewarded – very crucial, otherwise no motivation

• The real picture about system functionality and operability can be gained only from real users in real situations.


Finally ….

Gábor Dénes (1969):

"If enough people work hard enough on the problem of speech recognition, it will be solved by mid next century."


References

• http://www.nokia.com/nokiaone

• Oria, D. & Koskinen, E., “E-Mail Goes Mobile: The design and implementation of a spoken language interface to e-mail” – ICSLP’2002

• http://www.catch2004.org/

• Harrikari, H., M. Mast, T. Ross & H. Schulz: 2002, “Different Approaches to Build Multilingual Conversational Systems”. 5th International Conference on Text, Speech and Dialogue, TSD 2002, Brno, Czech Republic.

• Kleindienst J., L. Seredi, P. Kapanen & J. Bergman: 2002a, “CATCH-2004 Multi-Modal browser: Overview Description with Usability Analysis”. IEEE 4th International Conference on Multi-modal Interfaces, Pittsburgh, PA, U.S.A.

• Kleindienst J., L. Seredi, P. Kapanen & J. Bergman: 2002b, “Loosely-coupled approach towards multi-modal browsing”, Submitted to Universal Access in Information Society magazine’s special issue on Multi-modal User Interfaces.

• Boda, P. et al.: “Subjective Evaluation of a Personalised Conversational Interface to a Program Guide Information System” – Submitted to the User Modeling and User-Adapted Interaction journal (UMUAI) Special Issue on User Modeling and Personalization for Television.


Abbreviations

• AM – acoustic model

• ASR – automatic speech recognition

• CTI – computer-telephone integration

• DM – dialogue manager

• LM – language model

• NLU – natural language understanding

• SUI – speech user interface

• TTS – text-to-speech synthesis

• VVT – ViaVoice Telephony (IBM's speech resources)

• WOZ – wizard of Oz