VOICE INFORMATION RETRIEVAL FOR DOCUMENTS
Except where reference is made to the work of others, the work described in this thesis is
my own or was done in collaboration with my advisory committee.
__________________ Weihong Hu
Certificate of Approval:
______________________________
W. Homer Carlisle
Associate Professor
Computer Science and Software Engineering
______________________________
Juan E. Gilbert, Chair
Assistant Professor
Computer Science and Software Engineering
______________________________
N. Hari Narayanan
Associate Professor
Computer Science and Software Engineering
______________________________
Stephen L. McFarland
Acting Dean
Graduate School
VOICE INFORMATION RETRIEVAL FOR DOCUMENTS
Weihong Hu
A Thesis
Submitted to
The Graduate Faculty of
Auburn University
In Partial Fulfillment of the
Requirements for the
Degree of
Master of Science
Auburn, Alabama
August 4, 2003
VOICE INFORMATION RETRIEVAL FOR DOCUMENTS
Weihong Hu
Permission is granted to Auburn University to make copies of this thesis at its discretion,
upon the request of individuals or institutions and at their expense. The author reserves
all publication rights.
__________________
Signature of Author
__________________
Date
Copy sent to:
Name Date
THESIS ABSTRACT
VOICE INFORMATION RETRIEVAL FOR DOCUMENTS
Weihong Hu
Master of Science, August 4, 2003
68 Typed Pages
Directed by Dr. Juan E. Gilbert
Currently, new methods of interaction between people and the World Wide Web
are constantly emerging. Among them, voice is becoming more and more preferred.
Various voice applications (telephone-enabled applications) have been implemented and
used by governments, businesses, universities, libraries, visually impaired people, etc.
However, very little attention has been given to document information retrieval using
voice because of existing technical difficulties and limitations with natural language
processing, voice recognition, grammar generation, result representation, etc.
This thesis explores the background of information retrieval using voice,
especially Interactive Voice Response (IVR) systems, surveys several well-known existing
projects, and introduces the concepts of Voice Extensible Markup Language
(VoiceXML) [15]. A voice information retrieval system for documents (VIRD) has been
designed and implemented to search for documents from a database using the telephone
and VoiceXML. Five phases have been applied to this research: database creation and
normalization, user inquiries, denormalized view and stored procedures, summarization
functions, and user interface design.
In this research, an experiment has been conducted to measure the effectiveness
and the usability of VIRD. The PARADISE framework [17] was used to evaluate the
effectiveness of VIRD. Both quantitative and qualitative data were collected. Two
sets of metrics were applied and analyzed. A careful analysis of the experimental data
revealed that VIRD is an effective and satisfying mode of
document information retrieval via mobile access. However, it was also found that
improved recognition and improved representation of large result sets are required.
Finally, conclusions of this research are presented and future work that aims to improve
VIRD is suggested.
ACKNOWLEDGMENTS
The author would like to express her deep gratitude to her advisor, Dr. Juan E.
Gilbert, for his patient guidance, valuable advice, and continued encouragement
throughout her studies. Sincere thanks are also due to her two graduate committee
members, Dr. N. Hari Narayanan and Dr. W. Homer Carlisle, for their reviewing and
advising efforts. In addition, the author would like to thank her husband, Yapin Zhong,
for his help in conducting the experiment and for his constant support.
Voice Information Retrieval for Documents
Weihong Hu
M.S. Thesis
Dept. of Computer Science & Software Engineering
Auburn University
Outline
– Motivation
– Literature review
– VIRD System Architecture & Voice User Interface (VUI)
– Experiment
– Future Work
– Demo
Motivation
– A very large part of the world population does not have access to either computers or the Internet
– Very tiny visual interfaces make users feel quite uncomfortable
– Blind or partially-sighted users are not able to access information visually
– VoiceXML technologies provide an alternative way to search for documents via mobile devices
– Very little work involving VUI for document retrieval
Literature Review
– Information Retrieval via Voice
– VoiceXML Technology
– Common VoiceXML applications
Information Retrieval via Voice
Traditional Interactive Voice Response systems (IVR)
– IVR systems are software applications that accept telephone input and touch-tone keypad selection and provide appropriate responses
VoiceXML applications
– Allow users to call into an application system and use a combination of their voice and/or telephone input and/or touch-tone keypad to interact with the system
– Use HTTP protocol to interact with Web server
VoiceXML
– Voice Extensible Markup Language (VoiceXML)
– A World Wide Web Consortium (W3C) standard speech-application development language
– Designed for creating audio dialogues that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations
– Allows users to interact with the Internet without needing visual access
– Allows users to have complete control over the user-application interaction through spoken dialogues
Voice Portals & System Infrastructure
Voice Portals & VoiceXML System Infrastructure (cont’d)
Voice portals provide the running platforms for voice applications
Some well-known voice portals
– Tellme, VocalTec, and BeVocal
Common VoiceXML applications
Simple uses
– Movie listings
– Traffic information
– Order tracking
– Directory assistance
– Personal information management
Complex uses
– Business communications, virtual offices, and voice email
– Web-based IVR speech-recognition enabled call centers
– E-commerce
– Airline reservations
– Stock trades and financial services management
VIRD System Architecture
[Figure: VIRD system architecture. Diagram labels: Speech, Speech Interface, Voice Server, VoiceXML Interpreter, Controller, IR System, IR, DB]
VIRD Document Database
– Twenty document abstracts
– http://www.citeseer.com database
Sample Document Abstract
Title: Selectivity Estimation for Boolean Queries
Abstract: In a variety of applications ranging from optimizing queries on alphanumeric attributes to providing approximate counts of documents containing several query terms, there is an increasing need to quickly and reliably estimate the number of strings (tuples, documents, etc.) matching a Boolean query. Boolean queries in this context consist of substring predicates composed using Boolean operators. While there has been some work in estimating the selectivity of substring queries,
Sample VoiceXML grammar
<grammar>
<![CDATA[
[
  [(query)] {<keyword "query">}
  [(match)] {<keyword "match">}
  [(boolean)] {<keyword "boolean">}
  [(estimation)] {<keyword "estimation">}
  [(selectivity)] {<keyword "selectivity">}
  [(optimize)] {<keyword "optimize">}
  [(tuple)] {<keyword "tuple">}
  [(operator)] {<keyword "operator">}
  [(application)] {<keyword "application">}
  [(substring)] {<keyword "substring">}
  [(alphanumeric)] {<keyword "alphanumeric">}
  [(attribute)] {<keyword "attribute">}
  [(approximate)] {<keyword "approximate">}
]
]]>
</grammar>
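A grammar of this shape could be generated from a list of index keywords rather than written by hand. The following Python sketch illustrates the idea only; the function name `build_keyword_grammar` and the lowercasing and de-duplication steps are assumptions, not part of VIRD as described here.

```python
def build_keyword_grammar(keywords):
    """Return a <grammar> element shaped like the sample grammar.

    Assumption: each keyword is a single word; it is lowercased and
    duplicates are dropped so that every rule appears once.
    """
    rules = []
    seen = set()
    for word in keywords:
        word = word.strip().lower()
        if not word or word in seen:
            continue
        seen.add(word)
        # One rule per keyword: the spoken word fills the "keyword" slot.
        rules.append('  [(%s)] {<keyword "%s">}' % (word, word))
    lines = ["<grammar>", "<![CDATA[", "["] + rules + ["]", "]]>", "</grammar>"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_keyword_grammar(["query", "match", "boolean"]))
```

Generating the rules from the document index would keep the grammar in step with the database, at the cost of a larger grammar as more abstracts are added.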
PRINCIPLES OF VIRD VUI DESIGN
Continuous Representation
– making the system’s capabilities apparent to the user as a reminder at any point in the dialogues
Immediate Impact
– immediate, implicit confirmation must be provided
Incrementality
– a sense of continuity and natural flow in the conversation between the system agent and the user
Summarization and Aggregation
– the results must be condensed for audio-only interfaces due to the constraint imposed by auditory memory limitations
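The summarization and aggregation principle can be illustrated with a short Python sketch that condenses a result list into one spoken sentence. This is a hypothetical illustration, not the VIRD summarization function: the name `summarize_results`, the wording of the prompts, and the default cut-off of three titles are all assumptions.

```python
def summarize_results(titles, max_spoken=3):
    """Build a short spoken summary of a list of document titles.

    Assumption: reading at most `max_spoken` titles (at least 1) keeps
    the prompt within what a listener can hold in auditory memory.
    """
    total = len(titles)
    if total == 0:
        return "No documents were found."
    if total == 1:
        return "One document was found: %s." % titles[0]
    spoken = titles[:max_spoken]
    message = "%d documents were found. " % total
    if total > max_spoken:
        # Announce the total, then read only the first few titles.
        message += "The first %d are: " % len(spoken)
    else:
        message += "They are: "
    return message + "; ".join(spoken) + "."


if __name__ == "__main__":
    print(summarize_results(["Paper A", "Paper B", "Paper C", "Paper D"]))
```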
Diagram of VIRD VUI
[Figure: VIRD VUI dialogue diagram. Diagram labels: Welcome Message, Main Menu Dialogue, Query Dialogue, Results Dialogue, Save Dialogue, Confirm MyLibrary, Confirm Email]
VIRD Voice User Interface
Main Menu Dialogue
– Contains four search functions: keyword, title, author, or year
Query Dialogue
– Allows the user to say the words that will be used during the search
VIRD Voice User Interface (cont’d)
Results Dialogue
– Voice Navigator: presents the list of retrieved documents to the user through a set of voice commands: NEXT, PREVIOUS, STOP, REPEAT, DETAIL, TRY AGAIN, or SAVE
Save Dialogue
– Allows the user to request a copy of the article via email or library
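The Voice Navigator can be pictured as a small state machine over the result list. The Python sketch below is a hypothetical model, not the VIRD implementation: the class name, the shape of a document record (`title`, `abstract`), the spoken responses, and the dialogues that STOP and TRY AGAIN lead to are all assumptions made for illustration.

```python
class VoiceNavigator:
    """Hypothetical model of the Results Dialogue command handling."""

    COMMANDS = ("NEXT", "PREVIOUS", "STOP", "REPEAT", "DETAIL", "TRY AGAIN", "SAVE")

    def __init__(self, documents):
        # Assumed record shape: {"title": ..., "abstract": ...}
        if not documents:
            raise ValueError("the navigator needs at least one document")
        self.documents = documents
        self.index = 0

    def handle(self, command):
        """Return (next_dialogue, text_to_speak) for one recognized command."""
        command = command.strip().upper()
        current = self.documents[self.index]
        if command == "NEXT":
            if self.index < len(self.documents) - 1:
                self.index += 1
                return ("results", self.documents[self.index]["title"])
            return ("results", "That was the last document.")
        if command == "PREVIOUS":
            if self.index > 0:
                self.index -= 1
                return ("results", self.documents[self.index]["title"])
            return ("results", "That was the first document.")
        if command == "REPEAT":
            return ("results", current["title"])
        if command == "DETAIL":
            return ("results", current["abstract"])
        if command == "SAVE":
            return ("save", current["title"])
        if command == "TRY AGAIN":
            # Assumption: TRY AGAIN returns to the Query Dialogue.
            return ("query", "Please say your search words again.")
        if command == "STOP":
            # Assumption: STOP returns to the Main Menu Dialogue.
            return ("main menu", "Returning to the main menu.")
        # Unrecognized input keeps the user in the Results Dialogue.
        return ("results", "Sorry, I did not understand.")
```

Keeping every unrecognized command inside the Results Dialogue reflects the idea that users tolerate recognition errors when recovery is easy.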
Experiment
Participants
– Twenty Computer Science graduate and senior undergraduate students in a User Interface Design course at Auburn University participated in this experiment (ten female, ten male)
Procedures
– Came in, used the same telephone, and sat in the same chair in the same room with the experimenter (as an observer)
– Read a one-page instruction sheet
– Interacted with the VIRD system to complete a task based on the task scenario
– Task scenario: “You are working on a research paper for Dr. X’s database course. Your research topic is XML. Dr. X wants you to find a document on the subject tree algebra for XML using the system. When you find the document, use the save option to let the system email it to you.”
– Filled out a survey giving a subjective evaluation of the system’s performance
Evaluation Methodology
– Measuring user satisfaction of the voice user interface for the document retrieval system
– PARADISE framework [1]
Evaluation Methodology (cont’d)
Maximize user satisfaction
Maximize task success
Minimize costs
Efficiency measures
Qualitative measures
Evaluation Metrics
The first set:
– Task success
– Dialogue efficiency
– Dialogue qualitative
The second set:
– Completion
– Inaccuracy
– Misinterpretation
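In the PARADISE framework, these kinds of measures are combined into a single performance value: a weighted, normalized task-success term minus a weighted sum of normalized cost terms, where normalization is a z-score over the set of dialogues. The Python sketch below shows that combination under stated assumptions: the weights are passed in as plain numbers (in PARADISE they are estimated by regression against user satisfaction), the population standard deviation is used for normalization, and all names are illustrative rather than taken from VIRD.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Normalize values to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # All values equal: the measure cannot separate the dialogues.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def paradise_performance(task_success, costs, alpha, weights):
    """Return one performance score per dialogue.

    task_success: list with one task-success value per dialogue
    costs: dict mapping a cost name to its per-dialogue values
    alpha: weight on normalized task success
    weights: dict mapping each cost name to its weight
    """
    n_success = z_scores(task_success)
    n_costs = {name: z_scores(values) for name, values in costs.items()}
    scores = []
    for i in range(len(task_success)):
        penalty = sum(weights[name] * n_costs[name][i] for name in costs)
        scores.append(alpha * n_success[i] - penalty)
    return scores
```

Normalizing each measure first lets values on different scales, such as seconds and error counts, be combined in one score.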
Evaluation Result: Metrics Comparison
[Figure: Metrics Comparison Chart. X-axis: metrics (Task Success, Dialogue Efficiency, Dialogue Qualitative, User Satisfaction); y-axis: percent, 0 to 1. Data labels: 86.50%, 89.50%, 81%, 85%]
Evaluation Result (cont’d): Time of Completion
[Figure: Time of completion per subject. X-axis: Subjects (1 to 18); y-axis: Seconds (0 to 600)]
Evaluation Result (cont’d): Misinterpretation
[Figure: Misinterpretations per subject. X-axis: Subjects (1 to 17); y-axis: Times (0 to 7)]
Experiment Summary
– Even though the misinterpretation rate is high, user satisfaction is still high. This means that users will accept errors as long as they can recover from them easily
– A potential flaw in PARADISE
Maximize user satisfaction
Maximize task success
Minimize costs
Efficiency measures
Qualitative measures
Future Work
– Investigate Spoken Query Retrieval for Large Documents (Yapin’s research)
– Investigate a new usability model for Voice User Interface (Priyanka’s research)
Demo
Questions
References
1. C. A. Kamm & M. A. Walker. Design and evaluation of spoken dialog systems. In Proceedings of the ASRU Workshop, 1997.