SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

Transcript
Page 1: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

SpeeG: A Multimodal Speech- and Gesture-based Text Input Solution

Lode Hoste, Bruno Dumas and Beat Signer

Page 2: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

SpeeG - Lode Hoste, Vrije Universiteit Brussel

Text-input for set-top boxes

Page 3: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution


Page 4: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution


Page 5: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution


Text-input for set-top boxes

Page 6: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

- Dasher
- 8Pen
- SwiftKey
- Speech Dasher
- SpeeG
- EdgeWriter
- 1D Keyboard for Kinect
- Virtual Keyboard for Xbox
- Chatpad Controller

Page 7: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

Virtual keyboard

Page 8: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

Kinect 1D keyboard

Page 9: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

Kinect 1D keyboard

Page 10: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution


Page 11: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution


Page 12: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

Dasher

- Continuous input
- Joystick / Gaze / ...
- Open vocabulary
- Allows imprecise navigation
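Dasher arranges the candidate characters in alphabetical order and gives each one a share of the display proportional to its probability, so likely characters become large targets that are easy to hit even with imprecise pointing. The sketch below illustrates that idea only. It is not JDasher's code, and the function names `dasher_intervals` and `select` as well as the example probabilities are hypothetical.

```python
from __future__ import annotations


def dasher_intervals(probabilities: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Split [0, 1) among characters, in alphabetical order,
    with each width proportional to that character's probability."""
    total = sum(probabilities.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    intervals: dict[str, tuple[float, float]] = {}
    lower = 0.0
    for char in sorted(probabilities):  # alphabetical layout
        upper = lower + probabilities[char] / total
        intervals[char] = (lower, upper)
        lower = upper
    return intervals


def select(intervals: dict[str, tuple[float, float]], position: float) -> str | None:
    """Return the character whose interval contains the pointer position."""
    for char, (lower, upper) in intervals.items():
        if lower <= position < upper:
            return char
    return None


# Hypothetical weights: "a" is twice as likely as "b" or "c".
layout = dasher_intervals({"b": 1.0, "a": 2.0, "c": 1.0})
print(layout)
print(select(layout, 0.6))
```

In this toy layout "a" occupies half of the axis, so a pointer anywhere in the lower half selects it. This is one way to read the "allows imprecise navigation" point on the slide.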

Page 13: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

Dasher

Page 14: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

Goals:
- Controller-free
- Text input
- Without training

Used technologies:
- Kinect
- CMU Sphinx
- Dasher

Page 15: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

SpeeG

Page 16: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution


Page 17: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

SpeeG Architecture

[Diagram: User, GUI (JDasher), Speech Recogniser (CMU Sphinx 4) and Hand Tracking (Microsoft Kinect and NITE), connected by steps numbered 1 to 5]

Page 18: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

Evaluation

Input methods: SpeeG, Speech-only, Virtual Keyboard, Kinect Keyboard

Page 19: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

Evaluation

7 (male) users, aged 23-31

Performed a quantitative (words per minute and number of errors) and qualitative (feedback and preference) evaluation

Sentences 1-3: DARPA’s TIMIT
- “this was easy for us”
- “he will allow a rare lie”
- “did you eat yet”

Sentences 4-6: MacKenzie and Soukoreff
- “my watch fell in the water”
- “the world is a stage”
- “peek out the window”

show 2 about ‘expertise of users’
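The two quantitative measures named above can be sketched as follows. The slides do not state the exact formulas, so this assumes the common text-entry convention of five characters per word for WPM and counts errors as a word-level edit distance against the reference sentence. The function names and the 30-second timing are hypothetical.

```python
def words_per_minute(transcribed: str, seconds: float, chars_per_word: int = 5) -> float:
    """WPM assuming one 'word' equals chars_per_word characters (a convention, not from the slides)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (len(transcribed) / chars_per_word) * (60.0 / seconds)


def word_errors(reference: str, transcribed: str) -> int:
    """Word-level Levenshtein distance: insertions, deletions and substitutions."""
    ref, hyp = reference.split(), transcribed.split()
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


# Hypothetical timing: the 15-character sentence entered in 30 seconds.
print(words_per_minute("did you eat yet", 30.0))
print(word_errors("the world is a stage", "the word is stage"))
```

Other error definitions, such as character-level distance or counting corrections made during entry, would give different numbers, so the choice should match whatever the study actually recorded.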

Page 20: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

Virtual keyboard

[Chart: WPM per sentence (S1 to S6) for Users 1 to 7; annotated value: 6.3 WPM]

Page 21: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

Kinect Keyboard

[Chart: WPM per sentence (S1 to S6) for Users 1 to 7; annotated value: 1.83 WPM]

Page 22: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

Speech-only

[Chart: WPM per sentence (S1 to S6) for Users 1 to 7; annotated value: 11 WPM]

Page 23: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

SpeeG

[Chart: WPM per sentence (S1 to S6) for Users 1 to 7; annotated value: 5.8 WPM]

Page 24: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

SpeeG

[Chart: WPM per sentence (S1 to S6) for Users 1 to 7; annotated: 2.6 7.8 WPM]

Page 25: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

Mean WPM per sentence and input device

[Chart: mean WPM per sentence (S1 to S6) for Controller, Speech only, Kinect only and SpeeG]

Input devices shown: SpeeG, 1D Keyboard for Xbox, Virtual Keyboard for Xbox, Speech-only

Page 26: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

Errors per sentence and input device

[Chart: mean number of errors per sentence (S1 to S6) for Controller, Speech only, Kinect only and SpeeG]

Input devices shown: SpeeG, 1D Keyboard for Xbox, Virtual Keyboard for Xbox, Speech-only

Page 27: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution


Page 28: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

Future work

- Other visualisations
- Smaller gestures
- Dedicated commands (gesture / voice)

Page 29: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution


Page 30: SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

Kinect

- Controller-free text input
- Real-time correction
- Dasher, zoomable interface
  - probabilities
  - alphabetic order
  - character-level

SpeeG: A Multimodal Speech- and Gesture-based Text Input Solution
Lode Hoste, Bruno Dumas, Beat Signer

Speech

- Non-native speakers
- Untrained voice recogniser
- 6-12 WPM
- Perceived fastest
- Game-like character
- Novice and experts

Special thanks to Jorn De Baerdenmaeker and Keith Vertaenen