
Ubiquitous Computing

Max Mühlhäuser, Iryna Gurevych (Editors)

Part V: Ease-of-use, Chapter 18: Mouth-and-Ear Interaction

Dirk Schnelle


Introduction

• Mobile workers
  – Need information to perform their task
  – Have their hands busy with their task
  – Look at what they are doing

• Problem
  – Hands & eyes devices hinder them in performing their task

• Solution
  – Audio channel

• Advantages of audio
  – Can be used without changing focus
  – Can be used in parallel
  – Still functional under extreme conditions
    • Darkness
    • Limited freedom of movement
  – Can be used in addition to other modalities

• Disadvantages of audio
  – Unusable in noisy environments


Voice is a natural way of communication?

"Reports and theses in speech recognition often begin with a cliché, namely that speech is the most natural way for human beings to communicate with each other"

(Melvyn Hunt, 1992)

• Speech is one of the most important means of communication
• Speech has been an efficient means of communication throughout the history of mankind

• Is this still valid with the computer as the counterpart?

• This vision requires active and passive communication competence of a human

• Advances in Natural Language Processing are first steps towards this vision

• Today
  – Still a vision
  – E.g. banking companies observe a trend towards graphical over voice-based use of their interfaces


Dialog Strategies

• Main concepts of voice-based interaction
  – Command & Control
  – Menu Hierarchy
  – Form-based
  – Mixed Initiative

• Voice-based interaction must be
  – Easy to learn
  – Easy to remember
  – Natural
  – Acoustically different

(Diagram: initiative spectrum ranging from User Initiative over Mixed Initiative to System Initiative)
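The strategies above differ mainly in who drives the slot filling. A minimal Python sketch contrasting system initiative with mixed initiative; all function names and the `ask`/`parse` callbacks are illustrative, not part of any real toolkit:

```python
# System initiative: the dialog asks for one slot per turn.
# Mixed initiative: the user may fill several slots in one utterance,
# and the system only prompts for what is still missing.

def system_initiative(ask, slots):
    """Prompt the user for each slot in turn (ask() is a stand-in
    for prompt playback plus constrained recognition)."""
    form = {}
    for slot in slots:
        form[slot] = ask(f"Please say the {slot}.")
    return form

def mixed_initiative(parse, utterance, slots):
    """Take whatever slots the parser finds in the free utterance;
    the caller then prompts for the remaining slots individually."""
    form = parse(utterance)                      # e.g. {"city": "Berlin"}
    missing = [s for s in slots if s not in form]
    return form, missing

# Usage with stubbed-in recognition results:
answers = iter(["Berlin", "Monday"])
filled = system_initiative(lambda prompt: next(answers), ["city", "day"])
print(filled)   # {'city': 'Berlin', 'day': 'Monday'}
```

The fallback from mixed to system initiative is the usual way to recover when free-form recognition fails.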


Definitions

• Voice User Interfaces (VUIs) are user interfaces using speech input through a speech recognizer and speech output through speech synthesis.

• Audio User Interfaces (AUIs) are an extension of VUIs, also allowing the use of sound as a means to communicate with the user.

• Today
  – The term VUI is used in the sense of AUI


Domain Specific Development

• Applications have to be integrated into the domain
  – Business context
  – Infrastructure

• Manual adaptation is
  – Labor intensive
  – Requires heavy involvement of experts in
    • Vocabulary
    • Grammar
    • Semantics

⇒ High costs


Attempts to reduce costs

• Speech contexts
  – Modular solutions for handling specific tasks in the dialog
  – E.g. widgets for entering time and date for a recurring appointment
  – Can be used in RAD tools to develop applications
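A speech context can be pictured as a self-contained dialog module that bundles its prompt and grammar, so a RAD tool can drop it into any application. A hypothetical sketch of a reusable yes/no confirmation context; the class and callback names are invented for illustration:

```python
# A minimal reusable "speech context": prompt + constrained grammar
# + result interpretation, packaged as one drop-in module.

class YesNoContext:
    # The grammar constrains recognition to a handful of words.
    GRAMMAR = {"yes", "yeah", "correct", "no", "nope", "wrong"}

    def __init__(self, prompt):
        self.prompt = prompt

    def run(self, recognize):
        """recognize(prompt, grammar) stands in for prompt playback
        plus recognition constrained to the given grammar."""
        word = recognize(self.prompt, self.GRAMMAR)
        return word in {"yes", "yeah", "correct"}

# Usage with a stubbed recognizer result:
confirm = YesNoContext("Delete the appointment?")
print(confirm.run(lambda prompt, grammar: "yes"))   # True
```

A date-entry widget for recurring appointments, as mentioned above, would follow the same pattern with a larger grammar.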


Speech Graffiti

• Universal Speech Interface by Carnegie Mellon University

• Main idea
  – Transfer the look & feel of graphical interfaces to audio

• Compromise between
  – Unconstrained NLU
  – Hand-crafted menu hierarchies as they are used in today's telephony applications

• Example
  System: …movie is Titanic … theatre is the Manor …
  User: What are the show times?
  – "what are" is standardized
  – "show times" is still domain specific

⇒ Speech Graffiti is not self-contained
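The split between standardized keyphrases and domain-specific slots can be sketched as a tiny parser. This is only an illustration of the idea, not the CMU implementation; keyphrase set and slot names are assumptions:

```python
# Speech Graffiti idea: a small, application-independent set of
# keyphrases ("what are", ...) is fixed across all applications,
# while only the slot names ("the show times") are domain specific.

STANDARD_KEYPHRASES = {"what are", "what is", "options", "start over"}

def parse_query(utterance, domain_slots):
    """Return (keyphrase, slot) if the utterance follows the
    standardized pattern, else None."""
    text = utterance.lower().rstrip("?").strip()
    for key in STANDARD_KEYPHRASES:
        if text.startswith(key):
            slot = text[len(key):].strip()
            if slot in domain_slots:
                return key, slot
    return None

slots = {"the show times", "the theatre", "the movie"}
print(parse_query("What are the show times?", slots))
# ('what are', 'the show times')
```

Because only `domain_slots` changes per application, a user who has learned the keyphrases once can operate any Speech Graffiti application.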


Size of vocabulary

• The application domain has a strong impact on the size of the vocabulary

Domain                         Vocabulary size (words)   Dialog strategy
Alarm system (alarm command)   1                         command & control
Menu selection (Yes/No)        2                         command & control, menu hierarchy
Digits                         10+x                      menu hierarchy
Control                        20-200                    command & control
Information dialog             500-2,000                 menu hierarchy, form filling
Daily talk                     8,000-20,000              mixed initiative
Dictation                      20,000-50,000             n/a
German (no foreign words)      ca. 300,000               n/a


STAIRS

• STAIRS
  – Structured Audio Information Retrieval System
  – Aims at delivering information to mobile workers

• Possible scenarios
  – Pick by voice
  – Museum guide
  – training@work

• Key features
  – Audio output
  – Use of contextual information
  – Command set


STAIRS - Audio Output

• Structuring information
  – No problem in visual media
  – Helps the user to stay oriented
  – Allows for fast and easy access to the desired information

• Structural metadata gets lost in audio

⇒ Structured audio
  • Background sounds
  • Auditory icons
  • Earcons


STAIRS – Context information

• Location is the most important context
  – Absolute positioning: world model needed
  – Relative positioning: based on tags

• Artifacts of the world layer are starting points for browsing in the document layer

• The mobile worker moves in
  – The physical layer
  – The document layer


STAIRS – Command Set

• Problem
  – Embedded devices allow only few commands

• Observation
  – Only few commands are needed to control a browser
  – This already allows handling complex applications

⇒ Transfer this concept to audio


STAIRS – Minimal Command Set

• Audio commands must be
  – Intuitive
  – Easy to learn
  – Acoustically different

• ETSI standardized several command sets for different scenarios

• Survey
  – Results close to ETSI
  – Some commands are missing

Function                                 Command
Provide information about current item   Details
Cancel current operation                 Stop
Read prompt again                        Repeat
Go to next node or item                  Next
Go to previous node or item              Go back
Go to top level of service               Start
Go to last node or item                  Last
Confirm operation                        Yes
Reject operation                         No
Help                                     Help
Stop temporarily                         Pause
Resume interrupted playback              Continue
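Browsing with such a minimal command set can be pictured as a dispatch table mapping each spoken navigation command to one action on the current position. The flat list of audio nodes and the item names are illustrative assumptions, not the STAIRS implementation:

```python
# Minimal-command-set browsing: each spoken command maps to exactly
# one navigation action over a sequence of audio nodes.

ITEMS = ["intro", "safety notes", "assembly", "summary"]
LAST = len(ITEMS) - 1

COMMANDS = {
    "next":    lambda pos: min(pos + 1, LAST),   # go to next item
    "go back": lambda pos: max(pos - 1, 0),      # go to previous item
    "start":   lambda pos: 0,                    # top level of service
    "last":    lambda pos: LAST,                 # go to last item
}

def browse(commands, pos=0):
    """Apply a sequence of recognized commands and return the item
    that would be played back afterwards."""
    for cmd in commands:
        pos = COMMANDS[cmd](pos)   # an unknown command would trigger "Help"
    return ITEMS[pos]

print(browse(["next", "next", "go back"]))   # 'safety notes'
```

Keeping the command vocabulary this small is what makes the recognition task tractable on embedded devices.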


Additional Challenges with UC

Challenges with audio

• Technical problems
  – Speech recognition performance
  – Speech synthesis quality
  – Recognition: flexibility vs. accuracy

• Inherent to audio
  – Speech is transient
  – Invisibility
  – Asymmetry
  – One-dimensionality

• Specific to UC
  – Conversation
  – Privacy
  – Service availability


Speech Recognition Performance

• Speech is not recognized with an accuracy of 100% (even humans can't do that)

• People are used to graphical user interfaces with a "recognition rate" of 100%

• Sentences are NOT just sequences of words, nor are words just sequences of phonemes

Recognition task difficulty: plain (read) text, continuous speech, spontaneous speech, spontaneous speech without punctuation

Acoustic challenges: variances, articulatory slurring, channel problems, cocktail party effect


Speech is…

• Continuous
• Flexible
  – Inherent to the audio channel
  – Acoustic problems
• Complex
• Ambiguous


Speech synthesis quality

• Quality of modern speech synthesizers is still low
• In general, people prefer to listen to prerecorded audio

Conflict
• Texts have to be known in advance
• Texts must be prerecorded
• But: not all texts are known in advance

Current solutions
• Audio snippets are prerecorded and pasted together as needed
• Professional speakers are able to produce voice with a similar pitch
• But: humans are very sensitive listeners and are able to hear small differences


Flexibility vs. accuracy

• Speech can have many faces for the same issue
• Natural language interfaces must serve them all

Example task: enter a date

• Flexible interface
  – Allow the user to enter the date in any format
    • "March 2nd", "Yesterday", "2nd of March 2004", …
  ⇒ requires more work by the recognition software
  ⇒ error prone
  ⇒ preferred by users

• Restrictive interface
  – Prompt the user individually for each component of the date
    • "Say the year", "Say the month", …
  ⇒ less work required from the recognition software
  ⇒ less error prone
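The two date-entry interfaces can be contrasted in code. The parsing rules below are a deliberately small illustration of the trade-off, not a production-grade date grammar; the function names and the answer format are assumptions:

```python
# Flexible interface: one free-form utterance, more grammar work.
# Restrictive interface: one prompt per date component, trivial grammars.

import datetime

MONTHS = {m: i for i, m in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

def flexible(utterance, year=2004):
    """Accept input such as 'March 2nd' or '2nd of March 2004'."""
    words = utterance.lower().replace("of ", "").split()
    day = month = None
    for w in words:
        if w in MONTHS:
            month = MONTHS[w]
        elif w.endswith(("st", "nd", "rd", "th")) and w[:-2].isdigit():
            day = int(w[:-2])            # ordinal day, e.g. "2nd"
        elif w.isdigit():
            year = int(w)                # explicit year, e.g. "2004"
    return datetime.date(year, month, day)

def restrictive(answers):
    """One answer per component, each from a tiny, reliable grammar."""
    return datetime.date(int(answers["year"]),
                         MONTHS[answers["month"].lower()],
                         int(answers["day"]))

print(flexible("2nd of March 2004"))   # 2004-03-02
```

The flexible parser already needs special cases for ordinals and filler words, which is exactly where the extra recognition errors come from.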


Medium inherent challenges

• One-dimensionality
  – The eye is an active organ, but the ear is passive
  ⇒ The ear cannot browse a set of recordings as the eye can browse a set of documents

• Transience
  – Listening is controlled by the STM (short-term memory)
  – Users forget what was said in the beginning
  ⇒ Audio is not ideal to deliver large amounts of data
  ⇒ Users get lost in VUIs (lost-in-space problem)

• Invisibility
  – VUIs must indicate to the user what to say and what actions to perform
  ⇒ Leaves the impression of not being in control
  ⇒ Advantage: audio can be received without the need to switch context

• Asymmetry
  – People speak faster than they type, but listen more slowly than they can read
  ⇒ Can be exploited in multi-modal scenarios


Challenges for UC

• Conversation
  – Scenario: the user speaks to another person
  – Risk of triggering unwanted commands is higher than in noisy environments

• Privacy
  – Audio input and output should be hidden from others
  – Impossible to avoid
  – Only solution: do not deliver critical or confidential information

• Service availability
  – A speech-dependent service may become unavailable while the user is on the move
  ⇒ May have an impact on the vocabulary that can be used
  – The user has to be notified about the change


Error Handling

• Possible causes for errors
  – Use of an out-of-vocabulary word
  – Use of a word that cannot be used in the current context
  – Recognition error
  – Background noise
  – Incorrect pronunciation (e.g. non-native speaker)

• Applications may detect errors, but without knowing the source

• Ways to deal with errors
  – Do nothing and let the user find and correct the error
  – Prevent the user from continuing
  – Complain and warn the user using a feedback mechanism
  – Correct it automatically without user interaction
  – Initiate a procedure to correct the error with the user's help

⇒ Otherwise users will refrain from working with the application
⇒ Better: error prevention
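One common way to choose among these error-handling options is to branch on the recognizer's confidence score. The thresholds, the result shape, and the action names below are illustrative assumptions, not part of any specific recognizer API:

```python
# Confidence-driven error handling: high confidence is accepted,
# medium confidence triggers a confirmation dialog with the user,
# low confidence causes a reprompt; out-of-vocabulary words are
# flagged so the application can warn the user.

ACCEPT, CONFIRM = 0.8, 0.5   # illustrative thresholds

def handle(hypothesis, confidence, vocabulary):
    if hypothesis not in vocabulary:
        return "warn"        # out-of-vocabulary: feedback to the user
    if confidence >= ACCEPT:
        return "accept"      # continue without user interaction
    if confidence >= CONFIRM:
        return "confirm"     # correct the error with the user's help
    return "reprompt"        # prevent continuing; ask again

vocab = {"next", "go back", "stop"}
print(handle("next", 0.92, vocab))   # accept
print(handle("next", 0.61, vocab))   # confirm
```

Note that, as the slide says, the application still does not know the *source* of a low score; the thresholds only decide how cautiously to proceed.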


Error prevention

• Reduce the probability of misrecognized words
  – Limit background noise (noise-cancelling headsets)
  – Allow the user to turn off the device (push-to-talk)
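Push-to-talk can be sketched as a gate in front of the recognizer: only audio captured while the talk button is held is passed on, so background speech cannot trigger commands. The frame representation below is an illustrative assumption:

```python
# Push-to-talk gating: drop every audio chunk that was captured
# while the talk button was not pressed.

def filter_push_to_talk(frames):
    """frames: (button_pressed, audio_chunk) pairs from the device.
    Returns only the chunks the recognizer should ever see."""
    return [chunk for pressed, chunk in frames if pressed]

frames = [(False, "noise"), (True, "next"), (True, "item"), (False, "chatter")]
print(filter_push_to_talk(frames))   # ['next', 'item']
```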


Lost in space

• A user gets lost in space if she does not know where in the dialog graph she is

Possible solutions to avoid lost in space
• Calling
  – Someone calls you from somewhere and you can head in that direction to find her
• Telling
  – Someone next to you gives you directions
• Landmarks
  – You hear a familiar sound and can use it to orient yourself and work out where to head


General solutions

• Guidelines
  – Try to capture design knowledge in small rules

• Guidelines have disadvantages
  – Often too simplistic or too abstract
  – Often have authority issues concerning their validity
  – Can be conflicting
  – Can be difficult to select
  – Can be difficult to interpret

Top 10 VUI No-No's
• Don't use open-ended prompts
• Make sure that you don't have endless loops
• Don't use text-to-speech unless you have to
• Don't mix voice and text-to-speech
• Don't put into your prompt something that your grammar can't handle
• Don't repeat the same prompt over and over again
• Don't force callers to listen to long prompts
• Don't switch modes on the caller
• Don't go quiet for more than 3 seconds
• Don't listen for short words

http://www.angel.com/services/vui-design/vui-nonos.jsp


VUI pattern language

• Patterns
  – Capture design knowledge in a fixed form
    • Form
    • Context
    • Solution

• Structure
  – Preconditions
  – Postconditions

• Current trend
  – Transfer guidelines into patterns


Future research directions

• The challenges of audio are still not completely solved
• Current solutions
  – Guidelines
  – Patterns
• Patterns seem to be more promising

• More pattern mining needed
• Tools to apply the patterns
• Use patterns in code generators from modality-independent UI languages (e.g. XForms)