Information Retrieval using Intelligent Speech Communication Interface

Institute of Informatics of the Slovak Academy of Sciences, Bratislava

[email protected]

WIKT 2006

Overview

1. Introduction

2. IRKR system

3. Architecture

4. Pilot applications

5. Realization of services

What is a Speech Communication Interface (SCI)?

• An SCI, or Spoken Language Dialog System (SLDS), is a computer system that you can talk to in order to carry out some task

• Contemporary SLDSs are typically of two kinds:

– Transaction-based systems, which let the user carry out a transaction, such as buying or selling stocks or reserving a seat on a plane

– Information-provision systems, which provide information in response to a query, such as a request for timetable or weather information

• The circle of a typical speech dialog in an SCI also shows the main components of the SCI

The Speech Dialog Circle in SLDS

[Figure: the speech dialog circle. Speech → Automatic Speech Recognition (ASR) → words spoken → Spoken Language Understanding (SLU) → meaning → Dialog Management (DM, using data and rules) → action → Response Generation (RG) → Text-to-Speech (TTS) → speech]

Example from the figure:

• Words spoken: ”I need a flight from Košice to Bratislava roundtrip”

• Meaning: ORIGIN_CITY: KOŠICE, DESTINATION_CITY: BRATISLAVA, FLIGHT_TYPE: ROUNDTRIP

• Action: GET DEPARTURE DATE

• Response: ”Which date do you want to fly from Košice to Bratislava?”
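
The three middle steps of the circle (SLU, DM, RG) can be sketched in a few lines. This is only an illustration under stated assumptions: the regular expression, the function names `understand`, `decide` and `respond`, and the slot order are hypothetical and are not taken from the IRKR implementation. ASR and TTS are left out, so the input is already text.

```python
import re

def understand(words):
    """SLU step: map recognized words to a slot frame (toy rule-based parser)."""
    frame = {}
    match = re.search(r"from (\w+) to (\w+)", words)
    if match:
        frame["ORIGIN_CITY"] = match.group(1).upper()
        frame["DESTINATION_CITY"] = match.group(2).upper()
    if "roundtrip" in words.lower():
        frame["FLIGHT_TYPE"] = "ROUNDTRIP"
    return frame

def decide(frame):
    """DM step: the next action asks for the first slot that is still missing."""
    for slot in ("ORIGIN_CITY", "DESTINATION_CITY", "DEPARTURE_DATE"):
        if slot not in frame:
            return "GET_" + slot
    return "QUERY_FLIGHTS"

def respond(action, frame):
    """RG step: turn the chosen action into text for the TTS module."""
    if action == "GET_DEPARTURE_DATE":
        return "Which date do you want to fly from {} to {}?".format(
            frame["ORIGIN_CITY"].title(), frame["DESTINATION_CITY"].title())
    return "Sorry, I did not understand."

words = "I need a flight from Košice to Bratislava roundtrip"
frame = understand(words)
action = decide(frame)
question = respond(action, frame)
```

A real SLU module would use grammars or statistical models rather than one regular expression; the point here is only the data passed from step to step.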

IRKR

• the first SLDS able to interact in the Slovak language

• developed in the period from July 2003 to June 2006

• supported by the National program for R&D “Building of the information society”

IRKR - partners

• Technical University of Košice

• Institute of Informatics, the Slovak Academy of Sciences

• Slovak University of Technology in Bratislava

• University of Žilina

IRKR - specification

• natural interaction

• multi-user interaction

• Slovak language

• fixed and mobile telephone networks

• access to distributed information (on the Internet)

IRKR - architecture

• DARPA Communicator architecture

• ‘hub-and-spoke’

• each module seeks services from and provides services to the other modules

• modules communicate with each other through the central software router, the Galaxy hub

• communicator.sourceforge.net
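
The hub-and-spoke idea can be illustrated with a toy router. This sketch does not use the Galaxy Communicator API; the `Hub` class, its `register` and `send` methods, and the message dictionaries are hypothetical names chosen only to show that modules exchange messages through one central point instead of calling each other directly.

```python
class Hub:
    """Toy central router: servers never call each other directly."""

    def __init__(self):
        self._servers = {}

    def register(self, name, handler):
        """Attach a server (spoke) under a name; handler maps message -> reply."""
        self._servers[name] = handler

    def send(self, target, message):
        """Route a message to the named server and return its reply."""
        if target not in self._servers:
            raise KeyError("no server registered as " + repr(target))
        return self._servers[target](message)


hub = Hub()
# Two stand-in servers; real ones would run as separate processes.
hub.register("asr", lambda msg: {"text": "weather forecast"})
hub.register("dialogue", lambda msg: {"prompt": "You asked for: " + msg["text"]})

recognized = hub.send("asr", {"audio": b"\x00\x01"})
reply = hub.send("dialogue", recognized)
```

Because every message passes through the hub, a module can be replaced without touching the others, which is the property the architecture relies on.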

Galaxy – basic overview

• distributed, message-based, hub-and-spoke infrastructure optimized for constructing spoken dialogue systems

• available under a liberal open source license

• not an end-to-end dialogue system, but provides tools for constructing such a system out of a suite of servers

• provides a sophisticated and general transport layer for connecting servers and Hubs, as well as a message syntax (it does not specify message semantics)

• the core Galaxy Communicator infrastructure is written in C

• support for defining server and connection initialization functions in C, Python, Java and Allegro Common Lisp

IRKR - architecture

[Figure: SLDS (Speech Language Dialog System) architecture. Blocks: telephony server (attached to the telephone network), ASR server, TTS server, Internet information server, and dialogue manager (VoiceXML), all connected to the HUB]

Automatic speech recognition server

• conversion of incoming speech to the corresponding text

• two speech recognizers, both freely available for nonprofit research:

– ATK: htk.eng.cam.ac.uk/develop/atk.shtml

– SPHINX: cmusphinx.sourceforge.net

• phoneme acoustic models:

– built following the REFREC 0.96 training procedure

– acoustic features: conventional 39-dimensional MFCCs, including energy and first- and second-order deltas

– 3-state left-to-right HMMs

– context-dependent (triphone) acoustic models
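
How a 39-dimensional feature vector arises from static coefficients plus first- and second-order deltas can be sketched as follows. The slide gives only the total dimension; the split into 13 static values per frame (39 / 3), the regression window of 2 frames, and the function names `deltas` and `add_deltas` are assumptions made for this illustration. The delta formula is the usual regression over neighbouring frames.

```python
def deltas(frames, window=2):
    """Regression deltas: d_t = sum_n n*(c[t+n] - c[t-n]) / (2 * sum_n n^2).

    Frames beyond either end are replaced by the nearest edge frame.
    """
    last = len(frames) - 1
    norm = 2 * sum(n * n for n in range(1, window + 1))
    result = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / norm)
        result.append(row)
    return result

def add_deltas(static):
    """Stack static, first-order and second-order coefficients per frame."""
    first = deltas(static)
    second = deltas(first)
    return [s + d + a for s, d, a in zip(static, first, second)]

# Six toy frames of 13 static values each (a linear ramp, not real MFCCs).
static = [[float(t)] * 13 for t in range(6)]
features = add_deltas(static)
```

Each output frame holds 13 static, 13 delta and 13 delta-delta values, which gives the 39 dimensions.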

Databases used for ASR training

• SpeechDat-E SK

– 1000 speakers, PSTN (office, home, phone booth)

• MobilDat SK

– 1100 speakers, GSM networks (office, home, street, vehicle, public building)

• both are balanced for age, regional accent, and sex of the speakers

• every speaker recorded 50 files: numbers, names, dates, money amounts, embedded command words, geographical names, phonetically balanced words, phonetically balanced sentences, Yes/No answers, and one longer, non-mandatory spontaneous utterance

Text-to-speech synthesis

• TTS converts outgoing information in text form to speech

• intelligibility, naturalness

• we developed two TTS modules using two different approaches:

• diphone

– intelligible speech

– flexible and totally domain-independent

– computationally inexpensive

– small memory footprint

– sounds a bit robotic and tedious

• unit-selection

– better naturalness

– some problems with intelligibility

– limited domain
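
To illustrate why the diphone approach is domain-independent: any phoneme sequence can be covered by units that each span the transition between two neighbouring phonemes, so a fixed inventory serves arbitrary text. The sketch below only builds the list of unit names; the function name `to_diphones`, the `_` silence symbol, the `left-right` naming and the example transcription are assumptions for illustration, not the notation of the IRKR synthesizer.

```python
def to_diphones(phonemes, silence="_"):
    """Return the diphone names covering a phoneme sequence.

    Each diphone spans the transition between two neighbouring phonemes;
    the utterance is padded with silence on both sides.
    """
    padded = [silence] + list(phonemes) + [silence]
    return [left + "-" + right for left, right in zip(padded, padded[1:])]

# Illustrative transcription of a short word as three phonemes.
units = to_diphones(["a:", "n", "o"])
```

A sequence of n phonemes always yields n + 1 diphones, which a synthesizer would then look up in its segment database and concatenate.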

TTS architecture

[Figure: two block diagrams, TEXT in, SPEECH out.

Diphone synthesizer, labels: text analysis; signal processing; text preprocessor; syntactic-prosodic parser; orthoepic transcription (SAMPA code); prosody generation (F0, energy, duration, ...); segment list generation; index of speech segments DB; index of acousticons; preparation; prosody matching; segment concatenation; signal synthesis; speech segments DB; acousticons DB.

Unit selection synthesizer (TTS server), labels: GALAXY wrapper; telephony server; HUB; broker channel; TTS control block; high level synthesis; low level synthesis; dictionary; processing of numerals and abbreviations; data driven phonetic transcription; corpus/SDB; unit selection; unit concatenation; phrase cache; audio file]

Dialogue manager

• the dialogue manager controls the dialogue between the system and the user

• the heart of the dialogue manager is an interpreter of the VoiceXML markup language, which:

– simplifies speech application development

– enables distributed application design

– accelerates the development of interactive voice response (IVR) environments
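
What interpreting a VoiceXML document involves can be sketched with a very small subset: one form with two fields, each with a prompt. The document below and the `run_form` function are hypothetical and written for this illustration; a conforming interpreter also handles grammars, events, ECMAScript and much more. The answers are passed in as plain strings in place of ASR results.

```python
import xml.etree.ElementTree as ET

NS = {"v": "http://www.w3.org/2001/vxml"}

DOCUMENT = """\
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="weather">
    <field name="city">
      <prompt>Please name a city.</prompt>
    </field>
    <field name="day">
      <prompt>For which day?</prompt>
    </field>
  </form>
</vxml>
"""

def run_form(document, answers):
    """Visit the fields of the first form in order, prompting for each one.

    `answers` stands in for the ASR results; returns (prompts, filled fields).
    """
    form = ET.fromstring(document).find("v:form", NS)
    prompts, filled = [], {}
    for field in form.findall("v:field", NS):
        prompts.append(field.find("v:prompt", NS).text.strip())
        filled[field.get("name")] = next(answers)
    return prompts, filled

prompts, filled = run_form(DOCUMENT, iter(["Bratislava", "tomorrow"]))
```

Because the dialogue is described in a document rather than in code, a new service can be added by writing a new document, without changing the interpreter.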

Dialogue manager architecture

[Figure: dialogue manager architecture. Blocks: VoiceXML interpreter (core), XML parser, grammars handling unit, ECMAScript unit, document manager, input interface, output interface, logging interface. External labels: VoiceXML, HUB]

Audioserver

• provides the whole information system with a reliable multi-user connection to the telephone networks

• supports telephone hardware: Dialogic D120/41JCT-LSEuro card

• direct (broker) connection between the audio server and the ASR server or TTS server

Audioserver architecture

[Figure: telephone network connections of the audio server. Labels: ISDN/PSTN, GSM, IP, ip network; PABX switch; GSM-GW; M-GW; a/b; BRA; BRA/PRA; H.323; SIP; 4..12 a/b; telephony i/o board; audio server; HUB; TTS; ASR]

Information server - IS

• the IS connects the system to information sources and retrieves the information required by the user

• a special IS, with its own web wrapper, for every pilot application

• a rule-based, ad hoc IS that searches only a few predefined web servers with a relatively well-known page structure does a much better job

• returns the data in XML format

• caches results with a user-defined expiration
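
Caching with a user-defined expiration can be sketched as a small time-to-live cache in front of the retrieval function. The class name `ExpiringCache`, its `get` method and the injectable `clock` are assumptions for this illustration; the slides do not describe how the IRKR cache is built.

```python
import time

class ExpiringCache:
    """Keep each result for ttl_seconds, then fetch it again on demand."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._entries = {}           # query -> (stored_at, result)

    def get(self, query, fetch):
        """Return the cached result for query, calling fetch(query) if stale."""
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        result = fetch(query)
        self._entries[query] = (now, result)
        return result
```

A weather service might use an expiration of minutes to hours, so that repeated calls about the same town do not hit the web source each time.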

IS architecture

[Figure: IS architecture. The HUB connects through the Galaxy interface to the IS backend, which contains an integrator and four web wrappers; four web sources sit on the Internet side]

WEB wrapper

• navigation through the web server

• extraction from the web pages

• mapping onto a structured format (XML)

• data verification

• as robust as possible against changes in the structure of the web pages

[Figure: web wrapper. Blocks: navigation module, extraction module, mapping module, data verification. Other labels: Internet, HTML, XML, database, SQL]
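
The extraction, mapping and verification steps of a web wrapper can be sketched as below. The HTML snippet, the `town` and `temp` class names, the output element names and the plausibility range are all invented for this example and do not describe the pages of the real sources. Navigation is left out so that the sketch runs without network access.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical page fragment; a real wrapper would download it first.
PAGE = '<table><tr><td class="town">Bratislava</td><td class="temp">32</td></tr></table>'

class CellExtractor(HTMLParser):
    """Extraction step: collect the text of <td> cells keyed by their class."""

    def __init__(self):
        super().__init__()
        self.cells = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._current = dict(attrs).get("class")

    def handle_data(self, data):
        if self._current:
            self.cells[self._current] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._current = None

def wrap(page):
    """Extract, verify and map one forecast; raise ValueError on bad input."""
    extractor = CellExtractor()
    extractor.feed(page)
    cells = extractor.cells
    # Verification: fail loudly if the page structure no longer matches.
    if "town" not in cells or "temp" not in cells:
        raise ValueError("page structure changed: expected cells not found")
    temperature = int(cells["temp"])  # ValueError if the text is not a number
    if not -60 <= temperature <= 60:
        raise ValueError("implausible temperature")
    # Mapping: build the structured (XML) result.
    forecast = ET.Element("forecast", town=cells["town"])
    ET.SubElement(forecast, "temperature", unit="C").text = str(temperature)
    return ET.tostring(forecast, encoding="unicode")

xml_result = wrap(PAGE)
```

Failing with an explicit error when the expected cells are missing is one simple way to notice changes in the page structure instead of returning wrong data.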

Pilot applications

• “Weather forecast in Slovakia”

– www.meteo.sk; www.shmu.sk

– weather forecast for about 80 Slovak district towns

– Place: district town or holiday locality

– Date: relative date / exact date

• “Timetable of Slovak Railways”

– www.cp.sk

– information about the Slovak railways timetable

– Starting place: railway station in Slovakia

– Destination place: railway station in Slovakia

– Date: relative date (today, tomorrow, etc.) / absolute date (“the twentieth of December”, etc.)

– Time: departure time (hour, minute)
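
Both pilot applications accept relative and absolute dates, which have to be resolved to a calendar date before the web source is queried. A minimal sketch follows, under stated assumptions: the word list, the function names, and the rule that a day and month without a year mean the next such date on or after today are all hypothetical choices, not the documented IRKR behaviour. The spoken form is assumed to be already converted to numbers.

```python
import datetime

# Offsets in days for a few relative expressions (illustrative list).
RELATIVE_DAYS = {"today": 0, "tomorrow": 1, "the day after tomorrow": 2}

def resolve_relative(word, today):
    """Resolve a relative expression such as 'tomorrow' against today."""
    return today + datetime.timedelta(days=RELATIVE_DAYS[word.strip().lower()])

def resolve_absolute(day, month, today):
    """Resolve a day and month without a year to the next such date.

    Raises ValueError for dates that do not exist in the chosen year.
    """
    candidate = datetime.date(today.year, month, day)
    if candidate < today:
        candidate = datetime.date(today.year + 1, month, day)
    return candidate

today = datetime.date(2006, 11, 30)
tomorrow = resolve_relative("tomorrow", today)
twentieth_of_december = resolve_absolute(20, 12, today)
```

Passing `today` in explicitly, rather than reading the system clock inside the functions, keeps the resolution easy to test.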

Realization of services

• available at: +421 55 602 2297, +421 2 5941 1118 (T-com), +421 911 650 038 (T-Mobile), +421 918 717 491 (Orange), irkr_pub (Skype)

• IRKR on the web: irkr.fei.tuke.sk

A typical dialogue between the user (U) and the system (S):

S: Welcome to the IRKR portal. Would you like to play the introduction?
U: No.
S: Choose one of the services: Weather forecast or Railway timetable.
U: Weather forecast.
S: Please name a city and a day for which you want to get the weather forecast.
U: Bratislava, tomorrow.
S: Did you say Bratislava, tomorrow?
U: Yes.
S: The weather forecast for Bratislava for tomorrow is: sunny, 32 degrees Celsius...

Thank you for your attention