VOICE INFORMATION RETRIEVAL FOR DOCUMENTS

Except where reference is made to the work of others, the work described in this thesis is my own or was done in collaboration with my advisory committee.

__________________ Weihong Hu

Certificate of Approval:

_______________________________
W. Homer Carlisle
Associate Professor
Computer Science and Software Engineering

_______________________________
Juan E. Gilbert, Chair
Assistant Professor
Computer Science and Software Engineering

_______________________________
N. Hari Narayanan
Associate Professor
Computer Science and Software Engineering

_______________________________
Stephen L. McFarland
Acting Dean
Graduate School

VOICE INFORMATION RETRIEVAL FOR DOCUMENTS

Weihong Hu

A Thesis

Submitted to

The Graduate Faculty of

Auburn University

In Partial Fulfillment of the

Requirements for the

Degree of

Master of Science

Auburn, Alabama

August 4, 2003


VOICE INFORMATION RETRIEVAL FOR DOCUMENTS

Weihong Hu

Permission is granted to Auburn University to make copies of this thesis at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights.

__________________

Signature of Author

__________________

Date

Copy sent to:

Name Date


THESIS ABSTRACT

VOICE INFORMATION RETRIEVAL FOR DOCUMENTS

Weihong Hu

Master of Science, August 4, 2003

68 Typed Pages

Directed by Dr. Juan E. Gilbert

Currently, new methods of interaction between people and the World Wide Web are constantly emerging, and voice is becoming an increasingly preferred one. Various voice applications (telephone-enabled applications) have been implemented and are used by governments, businesses, universities, libraries, visually impaired people, and others. However, very little attention has been given to document information retrieval using voice because of existing technical difficulties and limitations in natural language processing, voice recognition, grammar generation, result representation, etc.

This thesis explores the background of information retrieval using voice, especially Interactive Voice Response (IVR) systems and several well-known existing projects, and introduces the concepts of Voice Extensible Markup Language (VoiceXML) [15]. A voice information retrieval system for documents (VIRD) has been designed and implemented to search for documents in a database using the telephone and VoiceXML. The research was carried out in five phases: database creation and normalization, user inquiries, denormalized view and stored procedures, summarization functions, and user interface design.

In this research, an experiment was conducted to measure the effectiveness and the usability of VIRD. The PARADISE framework [17] was used to evaluate the effectiveness of VIRD. Both quantitative and qualitative data were collected, and two sets of metrics were applied and analyzed. A careful analysis of the experimental data revealed that VIRD was effective and achieved user satisfaction as a mode of document information retrieval via mobile access. However, it was also found that better speech recognition and a better presentation of large result sets are needed. Finally, the conclusions of this research are presented and future work aimed at improving VIRD is suggested.


ACKNOWLEDGMENTS

The author would like to express her deep gratitude to her advisor, Dr. Juan E. Gilbert, for his patient guidance, valuable advice, and continued encouragement throughout her studies. Sincere thanks are also due to her two graduate committee members, Dr. N. Hari Narayanan and Dr. W. Homer Carlisle, for their reviewing and advising efforts. In addition, the author would like to thank her husband, Yapin Zhong, for his help in conducting the experiment and for his constant support.

Voice Information Retrieval for Documents

Weihong Hu

M.S. Thesis

Dept. of Computer Science & Software Engineering

Auburn University

Outline

– Motivation
– Literature review
– VIRD System Architecture & Voice User Interface (VUI)
– Experiment
– Future Work
– Demo

Motivation

– A very large part of the world population does not have access to either computers or the Internet
– Very small visual interfaces make users feel quite uncomfortable
– Blind or partially sighted users are not able to access information visually
– VoiceXML technologies provide an alternative way to search for documents via mobile devices
– Very little work has involved VUIs for document retrieval

Literature Review

– Information Retrieval via Voice
– VoiceXML Technology
– Common VoiceXML applications

Information Retrieval via Voice

Traditional Interactive Voice Response (IVR) systems
– IVR systems are software applications that accept telephone input and touch-tone keypad selections and provide appropriate responses

VoiceXML applications
– Allow users to call into an application system and use a combination of their voice, telephone input, and touch-tone keypad to interact with the system
– Use the HTTP protocol to interact with a Web server

VoiceXML

– Voice Extensible Markup Language (VoiceXML)
– A World Wide Web Consortium (W3C) standard speech-application development language
– Designed for creating audio dialogues that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations
– Allows users to interact with the Internet without needing visual access
– Allows users to have complete control over the user-application interaction through spoken dialogues
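The sketch below shows what a VoiceXML page looks like. It is written in Python with the standard xml.etree.ElementTree module and builds a VoiceXML 2.0 document containing one form. The form prompts the caller, recognizes the spoken answer against a grammar, and submits the result back to a Web server over HTTP. Several details are hypothetical and do not come from the thesis: the form id `search`, the field name `keyword`, the external SRGS grammar URL, and the submit URL. VIRD's own pages are not shown in the source.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_query_page(prompt_text: str, grammar_url: str, submit_url: str) -> str:
    """Return a one-field VoiceXML 2.0 page as a string (hypothetical layout)."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "search"})
    field = ET.SubElement(form, "field", {"name": "keyword"})
    # Spoken to the caller by the platform's text-to-speech engine.
    ET.SubElement(field, "prompt").text = prompt_text
    # External grammar listing the words the recognizer may return.
    ET.SubElement(field, "grammar", {"src": grammar_url, "type": "application/srgs+xml"})
    filled = ET.SubElement(field, "filled")
    # Once the field is filled, send its value to the Web server over HTTP.
    ET.SubElement(filled, "submit", {"next": submit_url, "namelist": "keyword"})
    return ET.tostring(vxml, encoding="unicode")
```

For example, `build_query_page("Please say a keyword.", "keywords.grxml", "search")` produces a page the interpreter can fetch. The `<submit>` element is what turns the caller's speech into an ordinary HTTP request.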

Voice Portals & System Infrastructure

Voice Portals & VoiceXML System Infrastructure (cont’d)

– Voice portals provide the running platforms for voice applications
– Some well-known voice portals: Tellme, VocalTec, and BeVocal

Common VoiceXML Applications

Simple uses
– Movie listings
– Traffic information
– Order tracking
– Directory assistance
– Personal information management

Complex uses
– Business communications, virtual offices, and voice email
– Web-based IVR and speech-recognition-enabled call centers
– E-commerce
– Airline reservations
– Stock trades and financial services management

VIRD System Architecture

[Figure: VIRD system architecture. Labeled components: Speech Interface, Voice Server, VoiceXML Interpreter, Controller, IR System, IR, DB, and Speech.]
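The figure's component labels suggest one plausible reading: a Controller receives requests from the VoiceXML Interpreter and passes them to the IR System. The sketch below illustrates that role under two assumptions. First, the interpreter sends the recognized words as an HTTP query string. Second, the Controller answers with a VoiceXML page to be spoken. The parameter names `searchtype` and `keyword` are hypothetical, and so is the `search` callable that stands in for the IR System.

```python
from typing import Callable, List
from urllib.parse import parse_qs
from xml.sax.saxutils import escape


def handle_request(query_string: str,
                   search: Callable[[str, str], List[str]]) -> str:
    """Answer one interpreter request with a VoiceXML page that speaks the outcome."""
    params = parse_qs(query_string)
    # parse_qs maps each name to a list of values; take the first of each.
    search_type = params.get("searchtype", ["keyword"])[0]
    term = params.get("keyword", [""])[0]
    titles = search(search_type, term)  # hand the request to the IR System
    if titles:
        noun = "document" if len(titles) == 1 else "documents"
        text = f"I found {len(titles)} {noun}. The first is {titles[0]}."
    else:
        text = f"Sorry, no documents matched {term}."
    # Escape the text so characters such as & or < cannot break the XML.
    return ('<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">'
            f"<form><block><prompt>{escape(text)}</prompt></block></form></vxml>")
```

Keeping the IR System behind a plain function makes the Controller easy to test without a telephone or voice portal.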


VIRD Document Database

– Twenty document abstracts from http://www.citeseer.com, stored in a database
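The abstract lists database creation and normalization, and a denormalized view with stored procedures, among VIRD's phases. The sketch below illustrates those ideas with Python's built-in sqlite3 module. The schema is an assumption, not the thesis's actual design: the tables `document`, `author` and `keyword` and the view `document_view` are all hypothetical. SQLite has no stored procedures, so a parameterized query stands in for one here.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE document (
    id       INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    year     INTEGER,
    abstract TEXT
);
CREATE TABLE author  (doc_id INTEGER REFERENCES document(id), name TEXT NOT NULL);
CREATE TABLE keyword (doc_id INTEGER REFERENCES document(id), word TEXT NOT NULL);

-- Denormalized view: one row per (document, author, keyword) combination,
-- so each spoken search becomes a single-table lookup.
CREATE VIEW document_view AS
SELECT d.id, d.title, d.year, a.name AS author, k.word AS keyword
FROM document AS d
LEFT JOIN author  AS a ON a.doc_id = d.id
LEFT JOIN keyword AS k ON k.doc_id = d.id;
"""


def open_database() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def find_by_keyword(conn: sqlite3.Connection, word: str) -> List[str]:
    # Parameterized query: stands in for a stored procedure, which SQLite lacks.
    rows = conn.execute(
        "SELECT DISTINCT title FROM document_view WHERE keyword = ? ORDER BY title",
        (word.strip().lower(),),
    )
    return [title for (title,) in rows]
```

The normalized tables keep each fact in one place. The view trades that for simpler, faster lookups at query time, which matters when a caller is waiting on the line.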

Sample Document Abstract

Title: Selectivity Estimation for Boolean Queries

Abstract: In a variety of applications ranging from optimizing queries on alphanumeric attributes to providing approximate counts of documents containing several query terms, there is an increasing need to quickly and reliably estimate the number of strings (tuples, documents, etc.) matching a Boolean query. Boolean queries in this context consist of substring predicates composed using Boolean operators. While there has been some work in estimating the selectivity of substring queries,

Sample VoiceXML Grammar

<grammar><![CDATA[
[
  [(query)] {<keyword "query">}
  [(match)] {<keyword "match">}
  [(boolean)] {<keyword "boolean">}
  [(estimation)] {<keyword "estimation">}
  [(selectivity)] {<keyword "selectivity">}
  [(optimize)] {<keyword "optimize">}
  [(tuple)] {<keyword "tuple">}
  [(operator)] {<keyword "operator">}
  [(application)] {<keyword "application">}
  [(substring)] {<keyword "substring">}
  [(alphanumeric)] {<keyword "alphanumeric">}
  [(attribute)] {<keyword "attribute">}
  [(approximate)] {<keyword "approximate">}
]
]]></grammar>
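Writing a grammar like this by hand for every abstract does not scale, and the abstract names grammar generation as one of the difficulties. The sample appears to use the bracketed Nuance GSL style that voice portals of that era accepted. The hypothetical generator below emits the same form from a list of keywords, assuming the keywords have already been extracted from each abstract. It only accepts lowercase words and phrases, so every entry can be spoken and quoted safely.

```python
import re
from typing import Iterable, List

# Lowercase words separated by single spaces, e.g. "boolean" or "tree algebra".
_PHRASE = re.compile(r"[a-z]+(?: [a-z]+)*")


def build_keyword_grammar(keywords: Iterable[str]) -> str:
    """Emit an inline keyword grammar in the bracketed style of the sample above."""
    # Normalize case and whitespace, drop blanks and duplicates, sort for stable output.
    phrases: List[str] = sorted({" ".join(k.lower().split()) for k in keywords if k.strip()})
    for phrase in phrases:
        if not _PHRASE.fullmatch(phrase):
            raise ValueError(f"cannot put {phrase!r} in the grammar")
    # Each alternative matches the phrase and fills the `keyword` slot with it.
    rules = "\n".join(f'  [({p})] {{<keyword "{p}">}}' for p in phrases)
    return f"<grammar><![CDATA[\n[\n{rules}\n]\n]]></grammar>"
```

Rejecting unusual tokens (such as "c++") up front is a deliberate choice. A malformed entry would otherwise break the whole grammar at call time.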

PRINCIPLES OF VIRD VUI DESIGN

– Continuous Representation: making the system’s capabilities apparent to the user as a reminder at any point in the dialogue
– Immediate Impact: immediate, implicit confirmation must be provided
– Incrementality: a sense of continuity and natural flow in the conversation between the system agent and the user
– Summarization and Aggregation: the results must be condensed for audio-only interfaces because of the limits of auditory memory
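As an illustration of Summarization and Aggregation, the sketch below announces the total number of results once and then reads them in small numbered groups. This way the caller never has to hold a long list in memory. The group size of three and the prompt wording are assumptions, not values from the thesis.

```python
from typing import List


def summarize_results(titles: List[str], group_size: int = 3) -> List[str]:
    """Split a result list into short spoken prompts: a count, then small groups."""
    if group_size < 1:
        raise ValueError("group_size must be positive")
    if not titles:
        return ["No documents were found."]
    plural = "s" if len(titles) != 1 else ""
    prompts = [f"I found {len(titles)} document{plural}."]
    for start in range(0, len(titles), group_size):
        group = titles[start:start + group_size]
        # Number items across groups so the caller can refer back to them.
        numbered = "; ".join(f"{start + i + 1}, {t}" for i, t in enumerate(group))
        prompts.append(numbered + ".")
    return prompts
```

For example, `summarize_results(["A", "B", "C", "D"])` returns `["I found 4 documents.", "1, A; 2, B; 3, C.", "4, D."]`.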

Diagram of VIRD VUI

[Figure: Dialogue flow of the VIRD VUI: Welcome Message, Main Menu Dialogue, Query Dialogue, Results Dialogue, and Save Dialogue, with Confirm MyLibrary and Confirm Email following the Save Dialogue.]
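The dialogue flow in the diagram can be read as a small state machine. The sketch below assumes transitions that follow the diagram's order, plus the TRY AGAIN and SAVE commands described for the Results Dialogue. The state names, the lowercase command strings, and the choice to send TRY AGAIN back to the Query Dialogue are hypothetical.

```python
from enum import Enum, auto
from typing import Dict, Optional, Tuple


class Dialogue(Enum):
    WELCOME = auto()
    MAIN_MENU = auto()
    QUERY = auto()
    RESULTS = auto()
    SAVE = auto()
    CONFIRM_EMAIL = auto()
    CONFIRM_MYLIBRARY = auto()


# (current dialogue, recognized command) -> next dialogue.
# A command of None means the step happens without a menu choice.
TRANSITIONS: Dict[Tuple[Dialogue, Optional[str]], Dialogue] = {
    (Dialogue.WELCOME, None): Dialogue.MAIN_MENU,
    **{(Dialogue.MAIN_MENU, s): Dialogue.QUERY for s in ("keyword", "title", "author", "year")},
    (Dialogue.QUERY, None): Dialogue.RESULTS,  # after the search words are recognized
    (Dialogue.RESULTS, "try again"): Dialogue.QUERY,
    (Dialogue.RESULTS, "save"): Dialogue.SAVE,
    (Dialogue.SAVE, "email"): Dialogue.CONFIRM_EMAIL,
    (Dialogue.SAVE, "library"): Dialogue.CONFIRM_MYLIBRARY,
}


def next_dialogue(current: Dialogue, command: Optional[str] = None) -> Dialogue:
    """Follow one transition; unrecognized input keeps the caller where they are."""
    return TRANSITIONS.get((current, command), current)
```

Staying in the current dialogue on unrecognized input mirrors the usual voice-interface practice of re-prompting rather than jumping elsewhere.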

VIRD Voice User Interface

Main Menu Dialogue
– Contains four search functions: keyword, title, author, or year

Query Dialogue
– Allows the user to say the words that will be used during the search
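The four search functions map naturally onto four fields of a document record. The sketch below works over an in-memory list of records and checks the search type against a fixed set, so a misrecognized menu choice cannot select an arbitrary field. The record layout is an assumption, as are the matching rules: exact keyword membership, substring match on title and author, and an exact year that the recognizer is assumed to return as digits.

```python
from typing import Any, Dict, List

SEARCH_TYPES = ("keyword", "title", "author", "year")


def _matches(record: Dict[str, Any], search_type: str, term: str) -> bool:
    if search_type == "keyword":
        return term in (k.lower() for k in record["keywords"])
    if search_type == "title":
        return term in record["title"].lower()
    if search_type == "author":
        return any(term in name.lower() for name in record["authors"])
    # search_type == "year": assumes the recognizer returns digits such as "2001".
    return term.isdigit() and record["year"] == int(term)


def search(records: List[Dict[str, Any]], search_type: str, spoken: str) -> List[str]:
    """Return the titles of records matching a spoken term for one search function."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unknown search type: {search_type!r}")
    term = " ".join(spoken.lower().split())  # normalize case and spacing
    return [r["title"] for r in records if _matches(r, search_type, term)]
```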

VIRD Voice User Interface (cont’d)

Results Dialogue
– Voice Navigator: presents the list of retrieved documents to the user through a set of voice commands: NEXT, PREVIOUS, STOP, REPEAT, DETAIL, TRY AGAIN, or SAVE

Save Dialogue
– Allows the user to request a copy of the article via email or library
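The Voice Navigator keeps a position in the result list and moves it in response to spoken commands. The sketch below covers that cursor logic for NEXT, PREVIOUS, REPEAT, DETAIL and STOP. TRY AGAIN and SAVE leave the Results Dialogue, so they are assumed to be handled by the surrounding dialogue logic and are not handled here. The spoken messages and the `abstract` field used for DETAIL are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VoiceNavigator:
    results: List[Dict[str, str]]  # each result has "title" and "abstract"
    index: int = 0
    active: bool = True

    def handle(self, command: str) -> str:
        """Apply one navigation command and return the text to speak."""
        cmd = command.strip().upper()
        if not self.active:
            return "Navigation has stopped."
        if not self.results:
            return "There are no results."
        if cmd == "NEXT":
            if self.index + 1 >= len(self.results):
                return "That was the last document."
            self.index += 1
        elif cmd == "PREVIOUS":
            if self.index == 0:
                return "This is the first document."
            self.index -= 1
        elif cmd == "DETAIL":
            return self.results[self.index]["abstract"]
        elif cmd == "STOP":
            self.active = False
            return "Stopping."
        elif cmd != "REPEAT":
            return "Please say next, previous, repeat, detail, or stop."
        # NEXT, PREVIOUS and REPEAT all end by reading the current title.
        return f"Document {self.index + 1}: {self.results[self.index]['title']}."
```

Reading the title after every move also serves the Immediate Impact principle, because each command is implicitly confirmed by what the caller hears next.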

Experiment

Participants
– Twenty Computer Science graduate and senior undergraduate students (ten female, ten male) in a User Interface Design course at Auburn University participated in this experiment

Procedures
Each participant:
– came in, used the same telephone, and sat in the same chair in the same room with the experimenter (as an observer)
– read a one-page instruction sheet
– interacted with the VIRD system to complete a task based on the task scenario
– Task scenario: “You are working on a research paper for Dr. X’s database course. Your research topic is XML. Dr. X wants you to find a document on the subject tree algebra for XML using the system. When you find the document, use the save option to let the system email it to you.”
– filled out a survey giving a subjective evaluation of the system’s performance

Evaluation Methodology

– Measuring user satisfaction of the voice user interface for the document retrieval system
– PARADISE framework [1]

Evaluation Methodology (cont’d)

Maximize user satisfaction
– Maximize task success
– Minimize costs
   – Efficiency measures
   – Qualitative measures
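In its original formulation, PARADISE combines these objectives into a single performance function. Task success is measured by the kappa statistic κ, computed over the task's attribute-value confusion matrix from the observed agreement P(A) and the chance agreement P(E). Each cost c_i (an efficiency or qualitative measure) is normalized to a Z-score N. The weights α and w_i are then fitted by linear regression against user satisfaction. The slides do not report the weights fitted for VIRD, so only the general form is shown:

```latex
\[
\mathrm{Performance} = \alpha \cdot \mathcal{N}(\kappa) - \sum_{i=1}^{n} w_i \cdot \mathcal{N}(c_i),
\qquad
\mathcal{N}(x) = \frac{x - \bar{x}}{\sigma_x},
\qquad
\kappa = \frac{P(A) - P(E)}{1 - P(E)}
\]
```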

Evaluation Metrics

The first set:
– Task success
– Dialogue efficiency
– Dialogue qualitative

The second set:
– Completion
– Inaccuracy
– Misinterpretation
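In PARADISE, task success is measured with the kappa coefficient over a confusion matrix of attribute values: the values the dialogue actually produced against the values required by the scenario key. The formula is κ = (P(A) − P(E)) / (1 − P(E)). The sketch below computes it, estimating the chance agreement P(E) from the column totals as the original PARADISE formulation does. It assumes rows are the values the dialogue produced and columns are the key. The example matrix is illustrative and is not VIRD data.

```python
from typing import Sequence


def kappa(matrix: Sequence[Sequence[int]]) -> float:
    """Kappa for a square confusion matrix (rows: values produced, columns: key)."""
    n = len(matrix)
    if n == 0 or any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square and non-empty")
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix has no observations")
    # Observed agreement: share of observations on the diagonal.
    p_agree = sum(matrix[i][i] for i in range(n)) / total
    # Chance agreement from the column (key) totals.
    col_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    p_chance = sum((t / total) ** 2 for t in col_totals)
    if p_chance == 1:
        raise ValueError("kappa is undefined when P(E) = 1")
    return (p_agree - p_chance) / (1 - p_chance)
```

For example, `kappa([[3, 1], [1, 3]])` returns 0.5. Agreement is 0.75, against 0.5 expected by chance.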

Evaluation Result

Metrics Comparison

[Figure: Metrics Comparison Chart. Percentages (axis 0 to 1) for Task Success, Dialogue Efficiency, Dialogue Qualitative, and User Satisfaction, with two data series; data labels shown: 86.50%, 89.50%, 81%, 85%.]

Evaluation Result (cont’d)

Time of Completion

[Figure: Time of Completion. Completion time in seconds (axis 0 to 600) for subjects 1 to 18.]

Evaluation Result (cont’d)

Misinterpretation

[Figure: Misinterpretation. Number of misinterpretations (axis 0 to 7) for subjects 1 to 17.]

Experiment Summary

– Even though the misinterpretation rate is high, user satisfaction is still high. This suggests that users will accept recognition errors as long as they can recover from them easily.
– A potential flaw in PARADISE: the framework assumes that maximizing user satisfaction requires minimizing costs, including qualitative measures such as misinterpretations, yet here satisfaction remained high despite those costs.

Future Work

– Investigate Spoken Query Retrieval for Large Documents (Yapin’s research)
– Investigate a new usability model for Voice User Interfaces (Priyanka’s research)

Demo

Questions

References

1. C. A. Kamm and M. A. Walker. Design and evaluation of spoken dialog systems. In Proceedings of the ASRU Workshop, 1997.