Evaluation of a multimodal Virtual Personal Assistant Glória Branco
1st Follow-up Meeting - BRIDGE1
Evaluation of a multimodal Virtual Personal Assistant
Glória Branco
Sophia-Antipolis, March 23, 2006
20th International Symposium 2006
Agenda
• Introduction
  – FASiL project and consortium
  – The Virtual Personal Assistant (VPA)
    • Architecture
    • Functionalities
    • Interface
  – Global Evaluation Methodology
    • Heuristic Evaluation
    • User Trials
• The Portuguese trials
  – Method
  – Results
  – Users' comments
• Conclusions
FASiL Project
• FASiL – “Flexible and Adaptive Multi-Modal Spoken Interface Language” – EU-IST funded, multimodal, multi-lingual, conversational application to e-mail management.
• Objectives
  – “...to pilot a full multi-modal voice portal application that is 3G mobile network ready, along with tools for rapid development of new applications. FASiL targets the languages of UK English, Portuguese and Swedish… [with] intelligent, friendly adaptive multi-modal interaction.”
FASiL Consortium
[Consortium partner logos; legible text: “generation”, “PT Inovação”]
FASiL: “Flexible and Adaptive Multi-Modal Spoken Interface Language”
VPA Architecture
Components shown in the architecture diagram:
• ASR Multilingual
• TTS
• Vox Generator Services
• Fission
• Middleware
• PIM
• Administration
• Multi-Modal Gateway
• Fusion
• Dialogue Manager
• GUI Gateway
VPA Functionalities
• Hear a summary of the Inbox.
• Navigation: next, previous.
• Select specific e-mails: search by State (new, old), Sender, Date, Priority and Category.
• Read, compose, reply, forward and delete e-mails.
• Recipient list management.
• Summarisation.
• Categorisation.
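The search functionality listed above (select e-mails by State, Sender, Date, Priority and Category) can be illustrated with a minimal sketch. This is not the FASiL implementation: the `Email` record, its field names, the priority values and the sample mailbox are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Email:
    # Hypothetical record; the fields mirror the search criteria on the slide.
    sender: str
    received: date
    is_new: bool      # State: new or old
    priority: str     # assumed values: "high", "normal"
    category: str


def search(inbox, **criteria):
    """Return the e-mails that match every given criterion exactly."""
    unknown = set(criteria) - set(Email.__dataclass_fields__)
    if unknown:
        raise ValueError(f"unknown search criteria: {sorted(unknown)}")
    return [
        mail for mail in inbox
        if all(getattr(mail, field) == wanted for field, wanted in criteria.items())
    ]


# Illustrative mailbox, invented for this example.
INBOX = [
    Email("Ana", date(2006, 3, 20), True, "high", "work"),
    Email("Ana", date(2006, 3, 21), False, "normal", "work"),
    Email("Rui", date(2006, 3, 22), True, "high", "personal"),
]

# A request such as "find new high priority messages from Ana" becomes:
new_high_from_ana = search(INBOX, sender="Ana", is_new=True, priority="high")
```

Combining criteria with a logical AND is a design assumption here; a spoken request could equally be mapped to broader matching rules.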
VPA Interface
• Output
  – Voice
  – Avatar
  – Screen
  – PDA
• Input
  – Voice
  – Keyboard
  – Mouse
  – Touch
  – Stylus
• Multimodal: VUI and GUI
Available in English, Swedish and Portuguese
Global Evaluation
• Setup of the test environment
  – Task design, to cover the VPA functionalities.
  – Test mailbox populated with a restricted set of contacts and e-mails.
• Heuristic Evaluation
  – 5 expert assessments for each language.
  – Experts in accessibility, usability and voice interaction.
• User Tests
  – 20 users for accessibility, for the English version only (RNIB and RNID).
  – 20 Swedish and English users and 12 Portuguese.
  – Experts in e-mail usage.
Goal: “to iteratively gather information about the usability and accessibility of the system”
The Portuguese Trial
• Laboratory environment
  – The graphic interface was a web page simulating a mobile phone. Users interacted with the GUI on a desktop PC with Internet access and used a fixed phone to convey voice to the system.
• 12 native Portuguese speakers
  – 8 males and 4 females
  – aged 19 to 46 years (mean 30.6 years)
  – 75% of the participants had high-level education and 16.7% had mid-level education
  – ICT domain professionals and experienced e-mail users.
• 5 typical e-mail tasks
  – log in and browse the mailbox
  – search for and reply to an e-mail
  – search for and forward an e-mail
  – administer and manage the recipient list
  – find, reply to and delete an e-mail
Task results summary
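| Task | Time | Comp. % | Interactions: VUI | Interactions: GUI | Spoken interaction (%): Correct resp. | No resp. | Misund. | Incor. resp. |
|------|------|---------|-------------------|-------------------|------|------|------|------|
| T1 | 10 (7-18) | 83.3 | 14.0 (9-49) | 6.5 (0-22) | 54.3 | 13.7 | 6.8 | 25.2 |
| T2 | 7 (5-12) | 75 | 11.5 (5-33) | 4.5 (0-16) | 60.2 | 11.4 | 8.5 | 19.9 |
| T3 | 5 (3-16) | 100 | 14.5 (1-26) | 1.5 (0-20) | 67 | 9.8 | 9.8 | 13.3 |
| T4 | 6 (2-10) | 50 | 16.5 (4-41) | 2 (0-6) | 55.84 | 7.4 | 18.6 | 18.2 |
| T5 | 10 (4-18) | 75 | 32 (9-66) | 4 (0-11) | 59.4 | 9.4 | 9.4 | 21.8 |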
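As a reading aid for the task results, the sketch below checks that the four spoken-interaction categories of each task add up to roughly 100% and computes unweighted averages across the five tasks. The numbers are transcribed from the slide (decimal commas written as points); the dictionary layout and function names are assumptions, and the averages are simple means of the per-task figures, not values reported by the author.

```python
# Values transcribed from the task results slide.
# "spoken" holds: correct, no response, misunderstanding, incorrect response (%).
TASKS = {
    "T1": {"completion": 83.3, "spoken": (54.3, 13.7, 6.8, 25.2)},
    "T2": {"completion": 75.0, "spoken": (60.2, 11.4, 8.5, 19.9)},
    "T3": {"completion": 100.0, "spoken": (67.0, 9.8, 9.8, 13.3)},
    "T4": {"completion": 50.0, "spoken": (55.84, 7.4, 18.6, 18.2)},
    "T5": {"completion": 75.0, "spoken": (59.4, 9.4, 9.4, 21.8)},
}


def spoken_total(task):
    """Sum of the four spoken-interaction percentages for one task."""
    return round(sum(TASKS[task]["spoken"]), 2)


def mean_completion():
    """Unweighted mean of the per-task completion rates (%)."""
    return sum(t["completion"] for t in TASKS.values()) / len(TASKS)


def mean_correct_responses():
    """Unweighted mean of the per-task correct-response rates (%)."""
    return sum(t["spoken"][0] for t in TASKS.values()) / len(TASKS)


if __name__ == "__main__":
    for name in TASKS:
        print(name, spoken_total(name))
    print("mean completion:", round(mean_completion(), 2))
    print("mean correct responses:", round(mean_correct_responses(), 2))
```

Because the slide does not give the number of spoken turns per task, a mean weighted by turn count cannot be derived from these figures alone.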
Post-test satisfaction questionnaire
[Bar chart: frequency (0 to 8) of Very satisfied, Satisfied, Neutral, Unsatisfied and Very unsatisfied answers per questionnaire item: Intuitiveness, Easy, Confidence, Satisfaction, Where&abouts, Error recog., Prompts, E-mails, Conversations.]
Statistical analysis
• Significant correlation (Spearman’s correlation coefficient) between overall satisfaction and:
  – Quality of dialog: ρ = 0.87
  – Confidence: ρ = 0.79
  – Ease of use: ρ = 0.74
  – Interaction control: ρ = 0.73
  – Interaction quality (error recognition): ρ = 0.69
• Significant correlation (Spearman’s correlation coefficient) between overall satisfaction (subjective) and concept accuracy (objective value of correct responses): ρ = 0.85.
• No differences between females and males (Mann-Whitney test), nor between experienced and naïve users.
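For readers unfamiliar with the two statistics named above, here is a minimal pure-Python sketch of Spearman's rank correlation and the Mann-Whitney U statistic. It is not the analysis code used in the trial, and the sample scores at the end are invented for illustration. It computes only the statistics; significance levels would come from reference tables or a library such as SciPy (`scipy.stats.spearmanr`, `scipy.stats.mannwhitneyu`).

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U statistic (the smaller of U_a and U_b)."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Hypothetical questionnaire scores (1-5), NOT the trial's data.
satisfaction = [4, 3, 5, 2, 4, 3]
confidence = [4, 2, 5, 2, 3, 3]
rho = spearman_rho(satisfaction, confidence)
u = mann_whitney_u(satisfaction[:3], satisfaction[3:])
```

Rank-based statistics suit this kind of data because questionnaire answers are ordinal and the sample (12 users) is small.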
Users' approach
• The preferred modality was speech.
• Natural language, using short phrases but with complex commands.
• Speech input to convey the commands and graphical interface to read the messages and to scroll quickly through the contacts list.
• More intensive use of the GUI to overcome the recognition problems and slowness of the system response.
• Mixed initiative dialog.
Interaction Example 1
U I want replace [recipient name] by carbon copy.
S Who would you like to send copy to?
U (barge-in) [recipient name]
S Send copy to [recipient name]
U I want change the recipient list.
Interaction Example 2
• U mailbox
• S You have 4 e-mails
• U New search. Find high priority messages from [recipient name]
• S You have 1 new priority e-mail from [recipient name]
• U Read it
Users' appreciation
• The conversational and multimodal VPA concept was attractive to all users and was seen as a key enabler supporting users' growing mobility.
• The VPA was seen as easy to use and intuitive. The Help part of the system was hardly used.
• Users did not like excessive confirmations.
• The Portuguese TTS voice was well accepted by the users.
• Users liked voice-in and VUI and GUI-out in a small-screen environment.
• Multimodality was seen as a very good capability to overcome recognition problems encountered in the VUI.
Future Use
But, when asked about future use:
• 58% of the users said that they would not use the system in its current form.
• Main reasons:
  – slow response time
  – recognition/understanding problems.
Failure?
Tell me “when it’s time” to stop!
NO!
Lessons learned
– Speed of feedback is very important. Users dislike latency or long periods of silence.
– Improvements are needed to increase the recognition accuracy of the spoken components.
– Natural language is working ... with limitations.
Multimodal interfaces can overcome the weaknesses of each modality and exploit the full strengths of combined modes.