USING MICROSOFT KINECT SENSOR TO PERFORM COMMANDS … · Microsoft Kinect device is at the moment...

Master in computer science

USING MICROSOFT KINECT SENSOR

TO PERFORM COMMANDS ON

VIRTUAL OBJECTS

Final Project

2012

Author & Student: Supervisors:

Simon Brunner Denis Lalanne

Matthias Schwaller

Final Project Paper 2 02.10.2012


Abstract

Gestural user interfaces are becoming omnipresent in our daily lives with more or less success. The

Microsoft Kinect device is at the moment on top of all hand free recognition devices. Gestures are

mostly designed for leisure applications such as mini-games requiring standing up in large

environment, far from the screen. The Kinect sensor device for windows reduces the distance for

desktop users with the near mode. This project aims at studying the possibility to develop subtle

gestures to perform basic gestural interactions to operate a computer with accuracy. Two different sets

of gestures have been designed and compared during the course of this project to work on close range.

The goal is to use them in small repetitive tasks and evaluate their performances against each other to

see if the one or several types of gestures work better than others. The designed gestures operate four

commands that are evaluated with users: selection, drag and drop, rotation and resizing of objects.

The first part of this master project paper presents the technological features and how to go from data

acquisition from Kinect to the creation of functional gestures. The following part concerns the design

of the gestures. A description on all the designs with their pros and cons is provided in tables showing

the evolutions of the gestures. The gestures are divided in two groups: the technological and the iconic

ones. On one hand, the technological gestures aim at efficiency and reliable recognition regardless of

the users' expectations. On the other hand, the iconic gestures aim to be efficient but priority is given

to their naturalness, easiness to remember, and ergonomics for users.

Another important part of this project concerned the creation of a full application for testing with users

each gesture in four simple activities and then all together by groups in a final activity. This report

ends with the results of a within-subject user evaluation organized with 10 persons and their analysis.

Results show that iconic selection has quantitatively equivalent performances as the technological one

but is perceived as more comfortable and usable by users. Further, the iconic zoom and rotation have

significantly better results according to statistical tests. Finally, iconic gestures are individually better

and/or favored by users over the technological gestures, which had similar performances during the

final tasks regrouping the four commands.


Table of Contents

Abstract ..............................................................................................................................................3

Table of Contents ................................................................................................................................5

List of Figures ....................................................................................................................................9

List of Tables .................................................................................................................................... 10

List of Graphs ................................................................................................................................... 11

Glossary ........................................................................................................................................... 13

1. Introduction .............................................................................................................................. 16

1.1 Context .............................................................................................................................. 16

1.2 Goal .................................................................................................................................. 16

1.3 Constraints ........................................................................................................................ 16

1.4 Thesis structure ................................................................................................................. 16

2. Technology ............................................................................................................................... 20

2.1 Kinect Sensor for Windows ............................................................................................... 20

2.2 The Depth Sensor .............................................................................................................. 21

2.3 The Near mode .................................................................................................................. 22

2.4 Kinect SDK ....................................................................................................................... 22

2.5 Choice of the library .......................................................................................................... 23

2.6 Candescent NUI ................................................................................................................ 23

3. Using Candescent NUI .............................................................................................................. 26

3.1 Bases to start with Candescent NUI ................................................................................... 26

3.2 Kinect field of view ........................................................................................................... 26

3.3 Hands detection ................................................................................................................. 27

3.4 Fingers detection ............................................................................................................... 27

3.5 Wrong fingers detection ..................................................................................................... 27

3.6 Finger count Stabilizer ....................................................................................................... 27

3.7 Close hands ....................................................................................................................... 28

3.8 Layers and feedback .......................................................................................................... 28


4. Gestures Design ........................................................................................................................ 32

4.1 Context .............................................................................................................................. 32

5. Gesture recognition Development.............................................................................................. 36

5.1 Iconic gestures 1 ................................................................................................................ 36

5.1.1 Description ................................................................................................................ 36

5.1.2 Technical and Operational Advantages & disadvantages ............................................ 37

5.1.3 Quick Summary ......................................................................................................... 38

5.2 Iconic gestures 2 ................................................................................................................ 39

5.2.1 Description ................................................................................................................ 39


5.2.2 Quick Summary ......................................................................................................... 41

5.3 Iconic gestures 3 ................................................................................................................ 41

5.3.1 Description ................................................................................................................ 41


5.3.2 Quick Summary ......................................................................................................... 42

5.4 Iconic gestures 4 ................................................................................................................ 42

5.4.1 Description ................................................................................................................ 42


5.4.3 Quick Summary ......................................................................................................... 44

5.5 Technologic gestures 1 ...................................................................................................... 44

5.5.1 Description ................................................................................................................ 44


5.5.3 Quick Summary ......................................................................................................... 46


5.6.1 Description ................................................................................................................ 46


5.6.3 Quick Summary ......................................................................................................... 47


5.7.1 Description ................................................................................................................ 47



5.7.3 Quick Summary ......................................................................................................... 48


5.8.1 Description ................................................................................................................ 48


5.8.3 Quick Summary ......................................................................................................... 49

6. The selected gestures for the evaluation ..................................................................................... 52

6.1 The final choice ................................................................................................................. 52

6.2 The Iconic choice .............................................................................................................. 52

6.3 The technologic choice ...................................................................................................... 52

7. The test application ................................................................................................................... 54

7.1 The description of the application ...................................................................................... 54

7.2 The Levels ......................................................................................................................... 55

7.2.1 Training ..................................................................................................................... 56

7.2.2 Selection .................................................................................................................... 56

7.2.3 Move ......................................................................................................................... 56

7.2.4 Rotation ..................................................................................................................... 57

7.2.5 Resizing ..................................................................................................................... 57

7.2.6 Final .......................................................................................................................... 57

7.3 Editable levels ................................................................................................................... 58

7.4 The general interface ......................................................................................................... 59

7.4.1 The pointer ................................................................................................................ 59

7.4.2 The commands ........................................................................................................... 60

7.5 The layers .......................................................................................................................... 60

7.6 The objects ........................................................................................................................ 61

7.7 The targets ......................................................................................................................... 61

7.8 The feedbacks.................................................................................................................... 62

7.9 The animations .................................................................................................................. 65

7.10 The measures ..................................................................................................................... 66


7.11 The logs ............................................................................................................................ 66

7.12 About the applications ....................................................................................................... 67

7.13 The Class diagram ............................................................................................................. 68

8. The Evaluation .......................................................................................................................... 70

8.1 Conditions of the evaluation .............................................................................................. 70

8.2 Pre-evaluation ................................................................................................................... 71

8.3 Range of testers ................................................................................................................. 72

8.4 The questionnaire .............................................................................................................. 72

8.5 Results .............................................................................................................................. 73

8.5.1 Selection .................................................................................................................... 73

8.5.2 Move ......................................................................................................................... 77

8.5.3 Rotation ..................................................................................................................... 80

8.5.4 Resizing ..................................................................................................................... 83

8.5.5 Final .......................................................................................................................... 85

8.5.6 Summaries ................................................................................................................. 87

8.5.7 Questionnaire ............................................................................................................. 88

8.6 Analysis ............................................................................................................................ 89

9. Extra applications ...................................................................................................................... 94

9.1 Gesture Factory application ............................................................................................... 94

9.2 Bing Map application ........................................................................................................ 94

10. Conclusions .............................................................................................................................. 96

10.1 General comments over the development ........................................................................... 96

10.2 Conclusion ........................................................................................................................ 96

10.1 Future work ....................................................................................................................... 97

11. References ................................................................................................................................ 99


List of Figures

Figure 1: Kinect Specs ...................................................................................................................... 20

Figure 2: IR light dots ....................................................................................................................... 20

Figure 4: light coding 2D depth images ............................................................................................. 21

Figure 3: PrimeSensor's objective ..................................................................................................... 21

Figure 5: Near mode vs Default mode ............................................................................................... 22

Figure 6: Candescent NUI ................................................................................................................. 24

Figure 7: Candescent NUI area of detection ...................................................................................... 26

Figure 8: Fingers' order: 1: Red, 2: Blue, 3: Green, 4: Yellow, 5: Pink ............................................... 27

Figure 9: Hands getting too close from each other ............................................................................. 28

Figure 10: Zoom, Rotation & Selection of Iconic 1 ........................................................................... 36

Figure 11: Full circle detections ....................................................................................................... 38


Figure 13: horizontal hand not detected ............................................................................................. 40

Figure 14: Circle movement detected ................................................................................................ 40

Figure 15: Selection for Iconic gestures 3 .......................................................................................... 41


Figure 17: rope technique angle example .......................................................................................... 43

Figure 18: Thumb click operations .................................................................................................... 45

Figure 19: Thumb click state diagram ............................................................................................... 45

Figure 20: Zoom technologic 2 ......................................................................................................... 46

Figure 21: Progressive zoom from technologic 3 .............................................................................. 47

Figure 22: Progressive zoom with steps ............................................................................................. 48

Figure 23: Progressive zoom examples.............................................................................................. 48

Figure 24: Zoom, Rotation & Selection of Technologic 4 .................................................................. 48

Figure 25: Chosen gestures for Iconic ............................................................................................... 52

Figure 26: Chosen gestures for Technologic ...................................................................................... 52

Figure 27: Setup windows form ........................................................................................................ 54

Figure 28: Progression of the level path............................................................................................. 55

Figure 29: Panel of the 6 activities .................................................................................................... 55

Figure 30: Appearance order on circle ............................................................................................... 56

Figure 31: move level demonstration................................................................................................. 57

Figure 32: rotation level demonstration ............................................................................................. 57

Figure 33: Resizing level demonstration ............................................................................................ 57

Figure 34: Final level demonstration ................................................................................................. 58

Figure 35: xml piece example ........................................................................................................... 59


Figure 36: Layers classes structure .................................................................................................... 60

Figure 37: layers' levels structure ...................................................................................................... 60

Figure 38: Demonstration zoom and rotation on a composite object .................................................. 61

Figure 39: Target example ................................................................................................................ 62

Figure 40: Left hand feedbacks ......................................................................................................... 62

Figure 41: detection limits feedbacks ................................................................................................ 63

Figure 42: Feedback of the frame for the detection area ..................................................................... 63

Figure 43: Text & button feedback .................................................................................................... 64

Figure 44: Technologic feedbacks ..................................................................................................... 64

Figure 45: Iconic feedbacks .............................................................................................................. 64

Figure 46: Object's status color feedback ........................................................................................... 65

Figure 47: Object's middle dot feedback ............................................................................................ 65

Figure 48: Application class diagram ................................................................................................ 68

Figure 49: Questionnaires for evaluation ........................................................................................... 72

Figure 50: Index of performance formulas ......................................................................................... 77

Figure 51: Gestures factory & Bing map application ......................................................................... 94

List of Tables

Table 1: Selection gestures design ..................................................................................................... 32

Table 2: Zoom gestures design .......................................................................................................... 33

Table 3: Rotation gestures design ...................................................................................................... 34

Table 4: Iconic gestures 1 summary .................................................................................................. 39



Table 7: Technologic gestures 1 summary ......................................................................................... 46



Table 10: Technologic gestures 4 summary ....................................................................................... 49

Table 11: within subjects experiment progress .................................................................................. 71

Table 12: Selection: times t-test table ................................................................................................ 76

Table 13: Selection: tries t-test table .................................................................................................. 76

Table 14: indexes of difficulty with average throughputs ................................................................... 77

Table 15: Move: times t-test table ..................................................................................................... 80

Table 16: Move: tries t-test table ....................................................................................................... 80

Table 17: Rotation: times t-test table ................................................................................................. 82

Table 18: Resizing: times t-test table ................................................................................................. 85


Table 19: Final: times t-test table ...................................................................................................... 87

Table 20: Final: tries and errors t-test table ........................................................................................ 87

Table 21: Activities summaries table ................................................................................................. 88

Table 22: Activities errors summaries table ....................................................................................... 88

Table 23: Questionnaires' results table............................................................................................... 89

List of Graphs

Graph 1: Average time for each task from Technologic and Iconic side by side ................................. 74

Graph 2: Selection: average time's comparison by task ...................................................................... 74

Graph 3: Selection: average time's comparison by tester .................................................................... 74

Graph 4: Histogram + Distribution curves + densities........................................................................ 75

Graph 5: Box plot Selection Technologic and Iconic ......................................................................... 76

Graph 6: Selection: number of tries by testers ................................................................................... 76

Graph 7: Average time for each task from Technologic and Iconic side by side ................................. 78

Graph 8: Move average time's comparison by task ............................................................................ 78

Graph 9: Move average time's comparison by tester .......................................................................... 78

Graph 10: Histogram + Distribution curves + densities ...................................................................... 79

Graph 11: Box plot Move Technologic and Iconic ............................................................................ 79

Graph 12: Move: number of tries comparison .................................................................................... 80

Graph 13: Rotation: average time's comparison by task ..................................................................... 80

Graph 14: Rotation average time's comparison by tester .................................................................... 81


Graph 16: Box plot Rotation Technologic and Iconic ........................................................................ 82

Graph 17: Rotation: number of tries comparison ............................................................................... 82

Graph 18: Resizing: average time's comparison by task ..................................................................... 83

Graph 19: Resizing: average time's comparison by tester ................................................................... 83


Graph 21: Box plot Resizing Technologic and Iconic ........................................................................ 84

Graph 22: Resizing: number of tries comparison ............................................................................... 85

Graph 23: All times for each task from Technologic and Iconic side by side ...................................... 85

Graph 24: Final: average time's comparison by tester ........................................................................ 86


Graph 26 Final: number of tries comparison ...................................................................................... 87


Glossary

NUI: natural user interface

SDK: software development kit

SoC: system on a chip

Continuous movement: the movement of the gesture can be done indefinitely without repositioning

to an initial position as long as the command is active. Opposite of limited

movement

Limited movement: movement with a physical limit, like stretching the arm to its maximum. It

requires releasing and repositioning to extend unfinished movements.

Progressive movement: movements based on the position difference. The execution follows the

direct movement progressively.


Chapter 1

Introduction

1.1 Context.......................................................................................... 16

1.2 Goal............................................................................................... 16

1.3 Constraints..................................................................................... 16

1.4 Thesis structure..............................................................................16

This first chapter is an introduction. It treats the context and the reason of the

projects. It also gives an overview of how the thesis is structured, what its

subjects treated are.


1. Introduction

1.1 Context

It's no surprise that for a few years now, natural user interfaces have become essential. Phones, tablets,

video games systems, TVs, they all use new ways to interact with machines, whether it's tactile,

gesture recognition or voice recognition. Kinect offers a new hand free experience by adding depth to

video processing. It allows creating new interfaces where the users no longer need to use any device.

Kinect has already proven its capability to point at objects with accuracy using only one hand.

Nevertheless, the commands provided by Kinect are limited to the selection. Indeed, the selection is

done by pointing to an object and wait a few seconds without moving to select it. This technique

works well. The pointer remains stable as no extra gesture is required but it limits it to one command

only. Adding new commands means bringing new gestures. These gestures alter the stability of the

pointer and make it hard to be accurate. The idea is to extend the pointing by adding some commands

controlled with the other hand only to keep the steadiness of the pointing.

1.2 Goal

The goal of the project is to explore and find multiple ways to perform common tasks on objects like

moving, rotating, enlarging, reducing, etc. Microsoft Kinect sensor device is the technology used for

this project. The idea is to use gesture recognition to do the manipulations. One hand handles the

pointing and the second hand take care of the commands. The focus of the project is on the second

hand performing the commands. Evaluation of various alternatives techniques will be done to estimate

their efficiencies over each others.

1.3 Constraints

We have decided to use the official library Microsoft Kinect SDK. As the focus of the project is on the

gestures recognition and not the hands and fingers detection, I was allowed to find and select a library

of my choice as long has it takes the kinect advantages. The commands must be executed only with

the left hand. The right hand is only on support to point on objects and to move them for the drag and

drop, nothing more. No other combinations between the hands are allowed.

1.4 Thesis structure

The thesis is divided in several chapters. Each chapter tries to bring the reader from the beginning of

the project to the end. First, as already read, the introduction puts the bases of the project with the

context, the objectives it tries to achieve and finally the few constraints that needs to be followed.


Then, the technologies used in this project are described; why they were chosen and how they work.

The candescent NUI is the library used to do the gesture recognition. The technologies are presented

with their technical aspects. Chapter 3 explains how the Candescent library works and what its

qualities and flaws are. Chapter 4 takes care of the design part. Each command is briefly described

with a sketch, a text explaining the functioning and the pros and cons of the design from the user's

point of view and the developer's point of view. The next chapter treats each developed commands in a

more detailed and technical way with problems and solutions found during development. The

evaluation can't test all the commands; it would take too much time. A few commands had to be

chosen. Chapter 6 tries to justify those choices. The seventh chapter is all about the evaluation

application: the interface, the level mechanics, the customization of the tests, etc. Chapter 8 treats the

results of the evaluation. First it describes the condition the tests were done, the type of testers being

part of the evaluation, the results and the analysis. Finally, the last chapter is about the extra

applications that were done during the project. Like the gesture factory application which was used to

develop and to pre-test the commands before validation. The Bing map application which was planned

to be a test in real condition for the evaluation. And to finish, the last chapter gives insight over the

development, what can be improve, changed or removed if the application had to be reused in another

project and a conclusion.


Chapter 2

Technology

2.1 Kinect Sensor for Windows............................................................20

2.2 The Depth Sensor............................................................................21

2.3 The Near mode................................................................................22

2.4 Kinect SDK.....................................................................................22

2.5 Choice of Library............................................................................23

2.6 Candescent NUI..............................................................................23

This second chapter focuses on the technologies used for this project. It provides

information on what the kinect device is and how it works. The new features

brought by Microsoft Kinect for Windows. It provides a presentation of the

library used to get the hand model.


2. Technology

2.1 Kinect Sensor for Windows

The kinect sensor is a motion sensing device. Its name is a combination of kinetic and connect. It was

originally designed as a natural user interface (NUI) for the Microsoft Xbox 360 video game console

to create a new control-free experience for the user where there's no more need for an input controller.

The user is the controller. It enables the user to interact and control software on the Xbox 360 with

gestures recognition and voice recognition. What really differentiates Kinect from other devices is its

ability to capture depth. The device is composed of multiple sensors [FIGURE 1]. In the middle it has

a RGB camera allowing a resolution up to 1280x960 at 12 images per second. The usual used

resolution is 640x480 pixels at 30 images per second maximum for colored video stream as the depth

camera has a maximum resolution of 640x480 at 30 frames per second. On the far left of the device, It

has the IR light (projector). It projects multiple dots [FIGURE 2] which allows the final camera on the

right side, the CMOS depth camera, to compute a 3D environment. For the audio inputs, the device

has 2 microphones on each side for voice recognition. The device is mounted with a motorized tilt to

adjust the vertical angle. Kinect can detect up to 2 users at the same time and compute their skeletons

in 3D with 20 joints representing body junctions like the feet, knees, hips, shoulders, elbows, wrists,

head, etc.

Figure 1: Kinect Specs

Figure 2: IR light dots

In February 2012, Microsoft launched the Kinect for Windows. This ''new'' device is designed exactly

identical to the one for Xbox 360 to a few exceptions. First it is, like its name tells it, compatible with


Windows operating systems and not compatible with Xbox 360 video game systems. Second, it has a

new mode called ''near mode'' allowing the user to be closer of the device and still be detected. Indeed,

the kinect sensor has a minimum and a maximum distance to detect properly objects [FIGURE 5]. Out

of this range the recognition quality decreases drastically. For the original Kinect, the theoretical

distance range for a good recognition is from 80cm to 4m. On the kinect for Windows, the range is

from 40cm to 3m, a gain of 40cm which is enough to use it on a desk instead of a living room. It is

theoretical because in practice the minimum distance is further. Kinect for windows is designed for

developers with its Kinect sdk. It is an alternative to the "Kinect hacks" libraries like OpenNI.

2.2 The Depth Sensor

The depth sensors, also called PrimeSensor were developed by PrimeSense, an Israeli company. Their

main goal is depth measurements [FIGURE 3]. The prevailing

technique usually used is the "time to flight" method. It consists

into projecting a beam of light and tracks the time it takes for the

beam to leave the sensor and return. PrimeSense created a new and

cheaper way to do it. It compares the size and spacing between

infrared dots to evaluate depth. The PrimeSensor operates a system

of projected near infrared (IR) light which is read or received from

the scene using a standard CMOS image sensor to produce the

640x480 depth image [Citation 1]. The near-IR Light is used to code the scene volume, they call the

process "Light Coding". The IR light projects a speckle pattern of dots in the scene [FIGURE 4]. A

SoC chip (System on Chip) connected to the CMOS sensor provides further treatments and complex

parallel algorithms to decipher the volume scene coding and create a depth image. The projected dots

are static. Depending on the scene disposition, the dots are displaced differently from an empty scene.

That's make the first image. People might think the second image comes from one of the other camera.

It is wrong. Only one camera, the CMOS depth camera is used for depth estimation. So it still needs a

second image to do stereo triangulation to compare with the first image. PrimeSensor has an

embedded image in the memory of a virtual plane pattern as reference. So with two images, the stereo

triangulation process compares the differences between the original pattern and the one observed to

determine the depth estimation.

Figure 4: light coding 2D depth images

Figure 3: PrimeSensor's objective


2.3 The Near mode

The near mode is the actual real improvement over the original Kinect. It comes from demands of the

developers to Microsoft. Pc-based applications often require Kinect to focus on closer range than

Xbox applications. Microsoft provides the Kinect for Windows with the ability to see objects from

400mm instead of 800mm previously. The DepthRange has to be set in the application, the near mode

is enabled or not.

As for the kinect SDK 1.00, the near mode can only detect 2 objects. The skeleton tracking isn't

available in near mode. Nevertheless, the skeleton tracking is supposed to be available later for the 1.5

version of the SDK with the possibility to recognize the upper side of the body (10 joints) to use it at a

desk.

Two additional features for the depth range are also available. The MinDepth and MaxDepth describe

the kinect depth range boundaries.

Figure 5: Near mode vs Default mode

2.4 Kinect SDK

From Microsoft Research labs, the Kinect development kit for Windows 7 allows developers to create

their own interfaces and applications for windows. It was designed to be used with C++ and C#. The

paper only focuses on C# with Microsoft Visual Studio. Download and install the latest SDK from

www.microsoft.com is all what's needed.

"Kinect SDK offers raw sensor stream to access to low-level streams from the depth sensor, color

camera sensor, and four-element microphone array. Skeletal tracking: The capability to track the

skeleton image of one or two people moving within the Kinect field of view for gesture-driven

applications. Advanced audio capabilities: Audio processing capabilities include sophisticated acoustic

noise suppression and echo cancellation, beam formation to identify the current sound source, and

integration with the Windows speech recognition API"1.

To create an application with Visual Studio using Kinect, just add a reference to Microsoft.Kinect in

.Net and using Microsoft.Kinect;.

For samples and examples of what can be done with the sdk, there is the "Kinect SDK sample

Browser".

1 http://en.wikipedia.org/wiki/Kinect


2.5 Choice of the library

The goal of the project is to design and develop hand gestures, not actually implements a hands and

fingers detection and representation model. So, one of the first task was to find a library or an open

project providing information on hands and fingers tracking. As the Microsoft Kinect SDK had just

been released for a month or so at the beginning of the project the choice between libraries was really

small. Lots of existing projects provides hand tracking. Unfortunately, most of them use openNI or

NITE on Linux instead of the kinect SDK and secondly they use the skeleton tracking engine to follow

the hand's gestures. That means one point or joint in space and no finger tracking. This makes it

impossible to use for hand postures using fingers. Open sources like "Kinect Paint", "kinect toolbox"

on codePlex.com were considered but were rejected as the only use the skeleton tracking. The "Kinect

SDK Dynamic Time Wrapping (DTW) Gesture Recognition" project2 offer the possibility to record

gestures in 2D and recognize them but only works with the skeleton joints. Then there was the

"Tracking the articulated motion of two strongly interacting hands" project from the University of

Crete "It proposes a method that relies on markerless visual observations to track the full articulation

of two hands that interact with each-other in a complex, unconstrained manner"3. The project was

really interesting and promising but a source code wasn't available except a demo4 of the project and

of course it was using openNI. Finally, a project stood out, the Candescent NUI project5. It provides

full hand recognition with finger points, palm, depth, volume, etc. The open project was using the

openNI library but had just been updated to the Kinect sdk. The candescent library offers everything

this project need to be developed so it was selected as the base for the recognition.

2.6 Candescent NUI

Candescent NUI is a set of libraries created by Stefan Stegmueller. It is designed for hand and fingers

tracking using Kinect depth data. It has been developed in C# with OpenNI and Microsoft Kinect

SDK. The creator allows developers to use the libraries as long the copyright remains in the project

[see appendix].

Candescent NUI provides lots of useful information for hand and fingers tracking. It starts by

detecting close objects, two at most. These objects are then treated to get hands features. Careful, if an

object is actually not a hand, like a head for example, the algorithms will slow down the application

because they won't be able to extract the features and might probably crash. If the objects are hands,

2 http://kinectdtw.codeplex.com 3 http://www.ics.forth.gr/~argyros/research/twohands.htm 4 http://cvrlcode.ics.forth.gr/handtracking/ 5 http://candescentnui.codeplex.com


features will be extracted. A convex hull algorithm gives the finger tips position (X, Y, Z), direction,

etc. Other features like the volume of the hand, the palm position, the number of fingers, each finger's

base position and id.

Figure 6: Candescent NUI


Chapter 3

Using Candescent NUI

3.1 Bases to start with Candescent NUI............................................. 26

3.2 Kinect field of view...................................................................... 26

3.3 Hands detection............................................................................ 27

3.4 Fingers detection........................................................................... 27

3.5 Wrong finger detection................................................................. 27

3.6 Finger count stabilizer.................................................................. 27

3.7 Close hands................................................................................... 28

3.8 Layers and feedbacks.................................................................... 28

This chapter is about how the Candescent library works. It gives insight of how

to start a project using it, the conditions to make the recognition work properly.

It also gives tips on what needs special attention. How the data can be displayed

onscreen with feedbacks.


3. Using Candescent NUI

3.1 Bases to start with Candescent NUI

To be able to use the Cansdescent library the project must add references of the library's dll. There are

four necessary dll needed: CCT.NUI.Core, CCT.NUI.HandTracking, CCT.NUI.KinectSDK and

CCT.NUI.Visual. They need to be added in the references of the project.

To use Candescent with the Kinect SDK instead of open NI as it also can, the data source must be

configured properly.

To use the SDK : IDataSourceFactory dataSourceFactory = new SDKDataSourceFactory();

Then create and start the thread of the hand data source.

Create a hand data source var handDataSource = new HandDataSource(dataSourceFactory.CreateShapeDataSource()); handDataSource.Start();

3.2 Kinect field of view

Kinect for Windows detects objects at 400mm in near mode. With Candescent NUI hands have to be

at least at 500mm (minimum depth) from the camera to be detected correctly. The maximum distance

is 800mm, further than this the hands are too far to be detected and tracked [FIGURE 7]. The perfect

distance for the tracking is around 650mm. The user has around 600 to 850mm to move his hands

horizontally and around 400mm to 500mm vertically. The minimum and maximum depth distance can

be set in the ClusterDataSourceSettings Class. I've tried to change those values to get more volume

but it resulted in a worst situation. A smaller minimum depth distance under 500mm doesn't improve

the tracking and a larger maximum depth distance makes the detection disastrous. It becomes hard to

have two hands detected at the same time because it now takes more attention to further object like the

head and shoulders. 500mm to 800mm is the best range distance for hands tacking without having too

much trouble and having to move the chair back and getting far from the screen.

Figure 7: Candescent NUI area of detection


3.3 Hands detection

Candescent NUI can detect up to 2 hands simultaneously. There are stored in a list of "HandData"

"dataSource.CurrentValue". If two hands are detected, Hand[0] always will be the hand on the right

side and Hand[1] the one on the left side. Each hand has an id, a location, a volume, a palm, fingers, a

contour shape and a convex hull as main data.

3.4 Fingers detection

The way the fingers are detected is special. There's no notion of thumb, little finger or middle finger.

The fingers are numbered from 0 to 4 for each hand. Actually a hand can have more than 5 fingers

with this library as the fingers are summits of the convex hull algorithm. The first finger is always the

highest one and then the other fingers are ordered clockwise. That means if the hand rotates, the first

finger and the others change position [FIGURE 8], there's no lock mode on fingers. The fingers data

are in a List "FingerPoints". There is also FingerCount which return the exact number of fingers at the

moment.

Figure 8: Fingers' order: 1: Red, 2: Blue, 3: Green, 4: Yellow, 5: Pink

3.5 Wrong fingers detection

Like written before, the fingers are defined by the summits of the convex hull algorithm. Summits are

usually above the palm of the hand. But sometimes those summits are under it, at the wrist level which

makes false finger detections. To prevent that the "detectFingers" method take into account only

fingers detected above the palm of the hand. This takes away the possibility of having the thumb

beneath the palm or having the hand downward but those options are rarely used.

3.6 Finger count Stabilizer

As the finger detection isn't perfect, sometimes some fingers disappear for one frame. There are

several reasons for that like the hands are too close or too far from the camera, the fingers are too close


from each other to differentiate them, the fingers are too thin or other kind of perturbations. This

problem can cause lots of trouble. For example, if a gesture needs a precise number of fingers to stay

active, it won't be stable and it can make the experience painful. This is why it needs to minimize the

perturbations with a stabilizer of finger. It is quite simple, It need an circular buffer with past finger

counts and extract the most common value, not an average, with the method "mostCommonValue"

using a Dictionary. The technique is really effective but it comes with its burden. Indeed, the finger

detection become less reactive as a post-treatment of the most common value is done. The bigger the

array is, the more stable it becomes but the less reactive. Too small, would make the method useless.

So, it needs an array size of about 20-25 which is enough. Then, there's a trick to regain some

reactivity in certain situations. Just keep the "real" current number of fingers in another attributes and

use it carefully when it is possible, otherwise it will demolish the stabilizer work.

3.7 Close hands

Candescent NUI is able to manage two hands at the same time. But those hands can't be too close from

one another otherwise they will be considered as one giant hand [FIGURE 9]. This is why,

applications using this library need to be careful and avoid at maximum hand crossing or getting them

near each other.

Figure 9: Hands getting too close from each other

3.8 Layers and feedback

The layers are the container in which you can draw with the System.Drawing.Graphics of C#. There's

no need to use one but they offer the possibility to display useful feedback at least in the development

process. The feedbacks goes from the hand contour, finger positions, palm position or any unwanted

detected form contour like the head, shoulders or arms. The object to add on the form is VideoControl

from the CCT.NUI.VISUAL.VideoControl class. The capture resolution is 600x380. Of course the

VideoControl size can be larger but the feedbacks will always be within the 600x380 first pixels. To

upscale or translate the feedbacks a pre-transformation needs to be done.

g.TranslateTransform(translationValueX, translationValueY); g.ScaleTransform(scaleValueX, scaleValueY);


The layers are inheritances of the abstract class LayerBase which must implements the ILayer

interface. The interface allows switching quickly from a layer to another.

private ILayer layer; this.layer = new Layer1 (dataSource); this.layer = new Layer2 (dataSource);

Layers have to implements the Paint method. This method draws every element in the VideoControl

for each frame. The first painted elements are in the back and the lasts in the front in case of

overlapping. So, the order of drawing elements must be quickly consider.

The hand feedback can display lots of information. This information isn't always useful and can take

too much place on the screen. So it is good to disable them when they are not needed.

this.ShowConvexHull = false; this.ShowContour = true; this.ShowFingerDepth = false; //DrawFingerPoints(hand, g); //this.DrawCenter(hand, g);


Chapter 4

Gestures Design

4.1 Context.......................................................................................... 32

4.2 Selections & Release..................................................................... 32

4.3 Rotations........................................................................................ 33

4.4 Zooms............................................................................................ 34

This chapter is about design. It is a quick summary of all the gestures designed

during the projects. Each one is described briefly. For more specified

information, chapter 5 is the place.


4. Gestures Design

4.1 Context

This section details each designed gesture during the project. They are distributed in three tables

corresponding to the command they are designed for. A design is presented in three parts. First there is

its name which tries to describe the gesture itself as much as it can. It comes along with a graphical

sketch showing the mechanics of the movement. Second part is a text description on how is executed

the gesture. The third part is a list of advantages and disadvantages of using and developing the

gesture. The order the gestures are presented is to show the evolution in the design.

Selections & Release Thumb click Description Pros & Cons

The thumb of the left hand must do a little "click" by

desappearing and reappearing.

+ simple to use.

+ easy to detect by the camera. - slow

- identify the thumb

- fatiguing.

Finger pinch

Description Pros & Cons

The thumb and the index must

touch each other to activate the

selection and hold the position.

+ natural, easy and not tiring

- the hand must a little on the side to clearly detect the fingers.

Hand grab


Close the hand like it grabs the object and then release it by

opening the hand again

+ easy to use, very easy to detect. + accurate, not tiring on the short term.

- can be tiring on the long term.

Push down repeated


Horizontal hand posture.

Sudden vertical downward

movement directly followed by the counter sudden upward

movement.

+ light and natural movement

- hard to detect and implement (swipes) - hard to single out from other moves

- horizontal posture is hard to detect

Push down & hold


Horizontal hand posture.

Sudden vertical downward

movement and hold position to stay active. Upward movement

to release the command.

+ light and natural movement + less movement than previously

- horizontal posture hard to detect

- hard to detect and implement (swipes) - hard to single out from other moves

Push down & hold 2


Vertical downward movement to an activation area and hold

position to stay active. Upward

movement out of the area to

release the command.

+ no posture needed, position prevails. + easy to use, accurate.

+ easier to implements

- less natural

- less flexible (fixed area for detection) Table 1: Selection gestures design


Zooms Vertical slide


Sliding vertically with two fingers. Upward to zoom in and

downward to zoom out.

+ easy to use and implement

- not very accurate - limited movement → repositioning

Finger spread


Take the distance between index

and thumb. If the distance

increases, it zooms in. If it decreases, it zooms out.

+ natural and intuitive

- very hard to use - not enough space for accuracy

- limited movement, hard repositioning

- release of command difficult

Continuous

vertical slide


The zoom is activated when a

minimum vertical distance from

the reference point is reach. The further , the faster. The zoom is

continuous until the finger is back

in the neutral zone.

+ easy to use

+ continuous gesture → no repositioning

+ variable speed + neutral zone

- learn to use the variable speed

Depth slide

alternative


Two activation areas, one in front to zoom-in and one in the back to

zoom out. Zooms are continuous,

no variable speed.


+ no initial point

+ continuous gesture → no repositioning - not flexible

- not natural (pushing & pulling)

Depth slide

continuous


Move forward to zoom in and

backward to zoom out. the distance from the initial position

sets the incremental speed.

+ easy to use + continuous gesture → no repositioning

+ variable speed


- hard to be accurate

Depth slide

progressive


Move forward to zoom in and

backward to zoom out. The

distance from the initial position

sets directly the zoom value.

+ feels natural

+ easy to use

- easy to be accurate - limited movement → repositioning

- less flexible due steps

Circles detection

with vertical

posture


Hand in vertical position.

Circlular movements are detected.

Clockwise to zoom in and counter-clockwise to zoom out.

+ natural gesture, easy posture

+ no initial point


- hard to detect circles (true-negative)

- vertical posture hard to detect

Rope technique

Step by step


Measure the angle between the reference point and the finger. A

zoom is done if the difference

between the actual angle and the last one is big enough.

+ Easy to use and feels natural

+ No need of circle detection

+ continuous gesture → no repositioning + accurate and immadiate results

- positioning around initial point Table 2: Zoom gestures design


Rotations Vertical slide


Sliding vertically with three fingers. Upward to rotate

clockwise and downward to rotate

counter-clockwise.


- not very accurate

- limited movement → repositioning

Horizontal slide


The zoom is activated when a

minimum vertical distance from

the reference point is reach. The

further , the faster. The zoom is continuous until the finger is back

in the neutral zone.

+ easy to use


+ variable speed

+ neutral zone


Circles detection


Circlular movements are detected.

Clockwise for a positive rotation and counter-clockwise for a

negative one.

+ natural gesture, no posture + no initial point

+continuous gesture → no repositioning

- hard to detect circles (true-negative) - conflicts with other movements

Circles detection

with palm posture


Open hand facing the camera.

Circlular movements are detected. Clockwise for a positive rotation

and counter-clockwise for a

negative one.

+ natural gesture, easy posture

+ no initial point + no more conflicts due to the posture


- hard to detect circles (true-negative)

Rope technique

Absolute angle


Measure the angle between the

reference point and the finger. The object rotates directly to the

absolute angle value.

+ No need of circle detection


+ objects follow the finger + accurate and immadiate results

- positioning around initial point

- very hard to use without experience

Rope technique

Step by step


Measure the angle between the

reference point and the finger. A rotation is done if the difference

between the actual angle and the

last one is big enough.

+ Easy to use and feels natural + No need of circle detection


+ accurate and immadiate results

- positioning around initial point Table 3: Rotation gestures design


Chapter 5

Gesture Recognition Development

5.1 Iconic gesture 1............................................................................. 36

5.2 Iconic gesture 2............................................................................. 39

5.3 Iconic gesture 3............................................................................. 41

5.4 Iconic gesture 4............................................................................. 42

5.5 Technologic gesture 1................................................................... 44




The gestures are fully described with their pros, cons and insights. The gestures

are divided into groups. Each group describes some commands, one of each type

maximum. The order of presentation tries to follow the evolution of a gesture

like in chapter 4 when it is possible.


5. Gesture recognition Development

In order to create efficient natural user interfaces the idea is to design multiple gesture recognitions for

the three tested commands; rotation, zoom and selection. Once a gesture is operational, it can be tested

to reveal its pros and cons. Then it is adapted, improved or dropped. The gestures of each command

are put together in groups of three, one for each command, according to their similarities when

possible. Some groups are improvements of old ones. Not all groups have unique types of gesture

recognition, some are reused. For the final evaluation with real testers, a selection of two groups of the

most suited gestures for each task is chosen. The reason is that the evaluation takes quite some times

to go through and the testers can't be hold forever. There are two types of gestures. The Iconic ones,

supposed to be closer to natural human gestures, come more easily to a user's mind to accomplish a

task with a certain command. The second group is the technologic gestures. Those are closer to the

machine side. They are less natural but easier to process. Most of the time, gestures can be divided in

three parts. First is the activation. This is often a posture of the hand (vertical, horizontal, etc) or a

definite number of fingers. It tells the machine which command has to be process. The second part is

the execution, which is some kind of movement. Some movements have to keep the activation posture

to stay active all along, some others don't have to. It usually depends on the nature of the gesture if it

can be confounded with other gestures or not. The third part is the release, usually used to avoid the

activation of another gesture during execution. It is also commonly a posture or definite number of

fingers.

5.1 Iconic gestures 1

5.1.1 Description

The idea is to find natural gestures that would come intuitively to mind and see if those gestures can

be efficient in practice or not. The tactile screen gestures were used as inspiration. For the zoom, two

fingers, the thumb and the index finger, approach or distance themselves. The bigger the distance is,

the bigger the zoom is and vice versa. For the rotation: a circular gesture. It detects if a circle is created

by the user's hand movement. If it is a fixed angle rotation on the object is done. The Selection is done

by small vertical swipe, a quick movement of the hand, in the horizontal position, downward, like

pushing a button. The gesture has to be done once to select and a second time to unselect.

Figure 10: Zoom, Rotation & Selection of Iconic 1


5.1.2 Technical and Operational Advantages & disadvantages

Zoom: The implementation is easy. It just computes the distance between the thumb and the index

finger. The difficulty is to identify the actual thumb. The "findThumb" method takes of care

of it. Or it just can take account of the two only fingers detected. Once the two fingers

positions are known, the distance between them is the reference. If it growths bigger, it

zooms in and if it reduces, it zooms out. The problem is that it is a finite movement. When

the fingers touch each other or are spread to their maximum, it's over. To zoom in or out

more, the fingers have to be reinitialized. That's where it gets tricky. The user just can't

detach its fingers or it will zoom in and will lose all it has done yet. On tactile screen it is

easy to do, it just needs to stop touching the screen and restart. On camera, the user has to

release the gesture, either by getting off camera or by having a special posture that can't be

mistaken for a zoom in or zoom out gesture. In practice, the fact of releasing the gesture is

really painful and frustrating. Otherwise, it is hard to be precise because the sensitivity is

high due to the short maximum distance between fingers. The sensitivity can be reduced by

slowing down the speed of zoom but then it will take more tries to reach the desired size.

Rotation: To do a rotation the left hand has to "draw" a full circle. A full circle detected rotates an

object of a fixed value of degrees each time. Points of the hand's actual and past positions

are required to recognize a path. The points are stored in a circular buffer. To detect a

circle, each point is taken into account. It computes the centroid, the center of mass which

is the center of the circle. Then it computes the average distance between the centroid and

each point, to create the average radius of a perfect circle. Finally, it checks that more than

80% of the points are near the radius (+/- 10%) [FIGURE 11]. If it does, a rotation of a

fixed value of degrees is performed. To know if the rotation has to be done clockwise or

counter-clockwise, the computation of the centroid comes in handy. To find the centroid,

the "findCentroid" method computes a polygon area of the points. If that area is positive

then the rotation has to be clockwise and vice versa. The Problems with this method are

multiple. First it takes a lot of points to detect a full circle second users aren't always

making circles, so the path of the points unpredictable. The algorithm has to find a full

circle in a cloud of points. But the more points there are, the harder it gets to detect a full

circle because there are easily more than 20% of them out of the perfect circle range

[FIGURE 11]. So the user has to do at least two full circles to align all points in the

theoretical circle. It also takes lots of full detected circle to do a big rotation as every

detection rotates one little step only. Finally, it's takes longer to detect full circles if the

path is longer. In practice, it is pretty hard to draw a correct circle in the air. Most of the

time, it looks more like ellipses which make it difficult to detect. It makes this technique


hard to use. Another issue has to be considered. The first full circle is hard to get but then

each new points located in the circle makes a full circle along as 80% of the points are in

the circle. So, when the gesture is over the object will continue to rotate until at least 20%

of the points are out of the circle which making it hard to be accurate, hence the need of a

smaller buffer to reduce this effect.

Figure 11: Full circle detections

Selection: Using swipe gesture is more complicated to implements than it would appear. First it needs

a circular buffer of past positions. Then it needs to find stability in the movement to be sure

that is not interfering with another gesture. The stability is relative as the hand always

shakes a little so it needs to be flexible. If the signal is stable for awhile, the swipes can be

identified. To detect a vertical swipe it checks in the points' history an important vertical

difference between the points. The results are mixed. It works but not always when it

should. There are several problems. First the stability of movement is hard to manage, if it

is too flexible the system is always stable and will detect too much swipes, in opposition is

it's not flexible it won't try to detect any swipes. Then swipes themselves, there are hard to

detect. It can try to detect the slope of the points to deduce the vertical speeds and

accelerations. It's hard to make it right. In practice the selection is unsatisfying due to is

randomness.

5.1.3 Quick Summary

Zoom

Activation:

2 fingers

Execution:

Spread the fingers

Release:

anything but 2 fingers

Rotation

Activation:

none

Execution:

Make circles

Release:

Stop making circles

Selection


Select:

Swipe down the

hand

Unselect:

Repeat movement

Swipe down the hand

Notes:

The selection stays on until a second swipe

down is detected.

Table 4: Iconic gestures 1 summary


5.2.1 Description

As the zoom wasn't a success in iconic gestures 1, this version tries to improve it. Gestures with

limited movements, like spreading the fingers or stretching out the arm, always bring trouble when it

comes to getting back to the initial position. Therefore this time, it uses a continuous gesture like the

previous rotation with a circular movement. It is still based on the circle detection but an improved

version. As the movements are similar, the postures of the hand differentiate the commands. A vertical

hand on the side (slim) for the zoom, a open hand (large) with palm facing the camera for the rotation

and a horizontal hand (palm facing the ground) for the selection. The Selection is an alternate version

of the previous one. Instead of having to repeat the gesture twice: once for selecting and once again for

unselecting. This iteration breaks down the gesture in half. A quick small swipe down, still with the

hand in a horizontal posture, and stay down activates the selection. The selection stays active as long

as the hand holds in its down position. To release the selection, the hand has to quit its down position

by moving up.



Posture recognition: Candescent NUI provides useful data on the volume of the hand. Indeed the

width and the height are easy to get. Then a few easy tests based on thresholds establish the

hand posture. The all point of this technique is to pick the right thresholds. That's where it gets

delicate. Overall the method is pretty accurate, but there are a few flaws. First the vertical

posture based on a great height and a thin width, if the thumb gets too much on side, the width

increases too. If it crosses the width threshold, it is wrongly considered as an open flat hand.

Also depending on the depth position of the hand, the values are way smaller when the hand is

far and bigger when it is close from the camera. For the horizontal posture, the problem stands

in the wrist. It is easy to check if the height is small because it is way smaller than the other


postures. The problem is sometimes the hand is so horizontal it is not detected anymore and

the wrist is taken for the hand [FIGURE 13] and return false results.

Figure 13: horizontal hand not detected

Circle Detection: As the first try to detect circle wasn't a success, the second attempt is more

permissive. As the user is always moving his hand in the same area an average centroid is

calculated. Like before, it takes all the past position in the circular buffer to compute the actual

centroid and put it in an array of a few past centroids which will determine the current average

centroid. This method stabilizes the centroid point and avoids radical changes from bad point's

detections. This also helps reducing the number of points needed for the circle detection as the

centroid is less volatile and thereby takes less time to compute. Full circle detections were

demanding and exhausting. To make the detection faster, only a part of a circle is needed. By

reducing the number of points, the path of the past points is closer to the actual activity of the

hand. When a circular movement starts it is almost instantly recognized. The problem is

stopping the rotation after the actual movement. It is still here but it has been reduced by the

fact the number of points is smaller. In practice the detection of circles is easy and fast, but it

sometimes, rarely but still, erroneously detects circles where it shouldn't. To reduce this issue,

increase the percentage of points needed in the circle, but it will make the detection stiffer.

Figure 14: Circle movement detected

Selection: A second version of the selection with a movement downward. This time, the user has to

do a small swipe down, hold it down and finally swipe up to release. This brings the same

problems as before and adds more. It needs to check the stability of the hand for the beginning

of the gesture and once again at the end to ensure the movement is finished, so can't be

mistaken for another movement and as a base for the swipe up to release the selection. As the

hand movement is hard to be stable, the whole technique is just unstable in the end.


5.2.2 Quick Summary

Zoom

Activation:

Vertical hand

on the side

Execution:

Make circles

Release:

Stop making circles

Rotation

Activation:

Full hand:

palm facing

camera

Execution:

Make circles

Release:

Stop making circles

Selection

Select:

Swipe down

and hold down

position

Unselect:

Move the hand back

up

Notes:

The selection stays on until the down

position is quit.



5.3.1 Description

The two last selection gestures using swipes weren't successful. This version keeps the idea of pushing

down the hand to do a selection and pushing up to release it. But this time, instead of using swipe

movements, it uses a fixed limit. To do the selection the palm of the hand just need to go under the

limit. It stays selected as long as it stays in the zone and doesn't cross the limit again.

Figure 15: Selection for Iconic gestures 3


Selection: The implementation is easier compared to the other version. It requires a threshold to

delimit the area of activation. The threshold is pushed a little bit up when crossed to avoid

instability if a user stops just after the limits the risk crossing back involuntary due to the

shakiness of the hand. Once the threshold is crossed again in the other way, it gets back to its

normal height. The threshold checks the hand current position (the green dot) as indicator. In

practice it works fine, it less demanding than the previous version and it is more robust and


reliable. On the downside the system has to be careful that the selection can't be activated

while another command is on if the hand position gets in the activation area. To avoid that,

there are some locks along with gesture recognitions.

5.3.2 Quick Summary

Selection

Select:

move down the

hand in the

blue zone

Unselect:

Move up the hand

from the blue zone

Notes:

The selection stays on until the down

position has quit the zone.


5.4.1 Description

The idea of making circles as it is a continuous movement is kept again. However, this iteration rebuilt

from the ground up the method of detection. This time no centroid is calculated. A reference point is

fixed at the activation of the commands zoom or rotation. Once the reference point is fixed, the user

just needs to move his hand around it. Like a rope attached to a post it will turn around automatically,

hence the name "rope technique". According to the hand's position and the reference point, it

computes the actual angle. The user can gain precision as he moves away from the reference point. On

the other side, with a short rope (distance between the reference point and the finger) it gets harder to

be precise. So a minimum radius is required to work. For the zoom, activation is done by showing two

fingers only to the camera. Then, the user needs to turn around the reference point clockwise to zoom

in and counter-clockwise to zoom out. The Rotation is done naturally. One finger to activate it and

then, like the zoom, turn around the reference point. It rotates the object of the actual absolute angle.

That means the object follows directly the hand position. This gives complete and direct control of the

angle of the objects. The selection is done by closing the hand and hold. It represents like grabbing

something and holding it to then releasing it by dropping it.




The rope technique: First it needs a reference position. A fixed point in the middle of the detection

area has been consider and tested. Unfortunately, it causes trouble as it demands more efforts

and larger movements to execute the command. Detection wise the problems come when the

user approaches the limits of the detection area. So a mobile reference point is used. This

position is created once every time the command is activated and removed at the end when the

command is released. To know exactly the angle between the reference and the actual position

a simple trigonometry formula comes in handy. First get the distance between the two points

to determine the circle radius. Then extract the position of the point at zero degree, which is

the reference point plus the radius on the x-axis. Now, compute the distance between the new

point and the actual point (a). Finally compute the angle with the law of cosines [FIGURE 17].

Transform the angle from radian to degrees and replace negative angles by their positive

equivalent to have angles from 0 to 359. This technique is simple, quick and accurate.

Figure 17: rope technique angle example

Zoom: The zoom uses the rope technique. It gets the angle but doesn't use it directly. It actually

compares it with the past used angle. If the difference between them is larger than the pre-

determined margin, the zoom is carried on. If the difference is positive it zooms in and vice

versa. The margin, also called "zoomStep", provides smoothness and accuracy. Without it the

zoom would have been too sensitive and very difficult to control. The past angle must only be

stored when the angle has crossed the margin limits. Large zoomSteps will bring accuracy and

avoid shakiness problems but it will be slower to operate. In practice, once activated, the zoom

is easy to use and feels natural with good precision. But learning how to use it can be hard due

to the hand placement which is pretty demanding.

Rotation: it takes simply the angle value from the rope technique and transmits it directly to the object

to rotate it. Actually it works very well but it's too precise. Any small movement is perceive

and changes the angle, making it hard to reach with exactitude the target angle. To prevent that

issue, the idea is to reduce the accuracy of the method. By adding an "angleStep", which is

nothing more than a denominator, the precision is delimited. For example with an angleStep of

10, the output angle can only be a multiple of 10 and give 36 degree of freedom to the object.

So it gives more room to move between steps and avoid the shakiness syndrome. On the other

side, it loses all the angles in between. In practice, the rotation requires a little adaptation


because result of the movement is immediate and the user has to learn to use it properly. It

takes quite some time to acquire the necessary skills for some users. Once learned, it is the

quickest way to rotate an object. Just point the angle wanted and it instantly follows.

Selection: The idea of grabbing and holding an object to move it feels natural. On the technical side, it

is quite simple to implements, if the hand is present hand has no fingers (fist), it assumes the

hand is closed and it activates the selection until fingers appears. It also is easy to detect which

makes it reliable and robust. In practice, it performs very well. The only downside is the

physical efforts it demands on the long term. It takes not inconsiderable efforts to make a fist.

However, if the user has often to repeat the gesture over and over again, his hand will be

exhausted. On the short term, it is super efficient.

5.4.3 Quick Summary

Zoom

Activation:

2 fingers

Execution:

Turn around the

reference point

Release:

Display more than 2 fingers

Rotation

Activation:

1 finger

Execution:

Turn around the

reference point

Release:

Display more than 2 fingers

Selection

Select:

Close the hand

Unselect:

Open the hand

Notes:

The selection stays on until the hand opens.


5.5 Technologic gestures 1

5.5.1 Description

The commands are controlled by sliding the hand vertically. It takes the vertical position of the higher

finger and compares it with its last measurement. The vertical difference increase or decrease the value


for the designated task. The number of fingers defines the command. The selection is activated by a

"click" of the thumb. By clicking, it means that the thumb must be visible, then disappear for a few

milliseconds and then reappear [FIGURE 18].

Figure 18: Thumb click operations


Sliding: Technically it is simple to implement. All it needs is the current actual vertical value and

compare it to its past value. A small margin or threshold reduces the sensitivity during

operations. The problem with this technique is when the hand reaches the top of the

camera's view. It can't go any further so it has to reposition itself down and restart the

sliding movement again. The problem is that unlike on tactile screens, where the user can

stop the recognition by releasing the touch screen, there's no stopping the recognition

without being off the camera's view. So, the user has to release the activation posture,

reposition correctly, reactivate the command with the activation posture and repeat all

those actions until it reaches its goal. This is pretty demanding and exhausting.

Thumb click: First it detects the thumb on the left hand with findThumb. Depending if the thumb is

present or not the state changes. There are four states to operate correctly [FIGURE 19].

The initial state avoids a false click at start. A timer starts every time the thumb disappears;

if the thumb reappears within 800 milliseconds the Selection is activated. The delay is long

enough to allow smoother thumb movement and to be less exhausting over time. There also

is a minimum time of 250 milliseconds for thumb to reappear in case the thumb is lost

unexpectedly for a few milliseconds. Even though it is simple to detect with a very good

accuracy and it is easy to do for the users, the thumb click is not really appreciate by users.

They find the gesture fatiguing and lame surprisingly.

Figure 19: Thumb click state diagram


5.5.3 Quick Summary

Zoom

Activation:

2 fingers

Execution:

vertical slide

Release:


Rotation

Activation:

3 fingers

Execution:

vertical slide

Release:


Selection

Select:

Thumb click

Unselect:

Thumb click

Notes:

The selection stays on until a second thumb

click is detected.

Table 7: Technologic gestures 1 summary


5.6.1 Description

This is a zoom improvement of technologic 1. The idea is to use depth for the zoom. There's no need

of activation, the hand just has to reach a determinate area to do a zoom in or a zoom out. The "zoom

in" is located in the front so the user has to stretch his arm. The "zoom out" is located on the opposite

side in the back and the user has to retract his arm to reach it. In the areas, the zooms increase or

decrease continually at the same speed. To stop zooming in or out, the user just has to leave the area.

Figure 20: Zoom technologic 2


This method is easy to implement. The movement is continuous, so no need to reinitialize the

command to finish it. Just allocate the boundaries to reach to execute the zooms. The closer they are,

the less it fatigues the arm but the more it is activated accidentally. Then just compare the actual Z

position of the hand with the boundaries. Like written before, this technique is really exhausting for

the arm.


5.6.3 Quick Summary

Zoom

Activation:

none

Execution:

Reach front or back

limits by sliding on Z-

axis

Release:

none



5.7.1 Description

In order to still improve the zoom previous techniques, a progressive zoom comes along based on

depth. This time instead of having area to reach to activate the zoom in or out, the zoom actually

increases or decreases progressively accordingly to the initial position of the hand at the activation

[FIGURE 21]. This means the more the user stretches out is arm, the more it will zoom in and

oppositely it will zoom out.

Figure 21: Progressive zoom from technologic 3


Technically it is more complex to implement. First it needs an activation posture to set off the initial

position. It just can't be hardcoded otherwise the zoom would increase or decrease without control

directly at activation. It also just can't take the difference between the initial position and the actual

position and zoom accordingly because the system would be too sensitive. So it needs steps to

attenuate the hand imprecision and shakes [FIGURE 22]. The step size and number is configurable

along with the zoom speed. Bigger steps give more stability in the movement, but the arm has to

stretch out more and might end up restarting the full gesture over multiple times to reach goal. Another

problem is if the user initiates the command near the limits of detection. It won't work as well as if it is

made in the middle of safe detection area on the z-axis as it should. The steps are an array of Booleans;

each time a step is crossed, it becomes true. If the system takes absolute values [FIGURE 23], it just

needs to return the calculate value according to the zoom speed. If the system takes continuous values

[FIGURE 23], which means if all conditions are met, it send over and over again the value to increase

little by little the object, then it needs a second array of the past situation to compare it with the actual

situation and only send once the value when something differs. There is no real difference in practice

between the two techniques. Like its predecessors, this technique is physically exhausting and it

demands a lot of concentration. Otherwise it feels natural and it is pretty accurate.


Figure 22: Progressive zoom with steps

Figure 23: Progressive zoom examples

5.7.3 Quick Summary

Zoom

Activation:

1 fingers

Execution:

Slide on Z-axis

Release:




5.8.1 Description

Keeping the idea of sliding, this iteration brings vertical and horizontal sliding in one single gesture

from a reference point. It looks like a cross hence is name the "cross technique". The zoom and

rotation are done by continuous movements. It is activate with one finger and it creates a reference

position at its location. To zoom in the finger has to slide up and slide down to zoom out. The rotation

works the same way but horizontally. Slide left to rotate clockwise and slide right for counter-

clockwise. A neutral zone is delimited by a red square around the reference point. The finger has to

cross these limits to activate one of the commands. The commands speed is variable. The further the

finger is from the limit, the faster the command will be executed. To release the command, just open

the hand wide-open. For the selection: a pinching gesture. It's supposed to feel natural like picking up

an object, move it and then drop it. The hand posture has to be at least a little bit on the side for the

camera to detect correctly the pinching. If the hand is facing the camera directly, the fingers won't be

correctly detected and will create instability.

Figure 24: Zoom, Rotation & Selection of Technologic 4



Zoom and rotation: The neutral zone gives the system some stability. It needs to be as small as

possible to reduce the distance to execute a command, but it also needs to be big enough to

avoid shakiness perturbations and mixing up the commands. Only one command can be

executed at the same time. To change the command the finger has to get back in the neutral

zone. Between vertical and horizontal commands, the chosen command is always the further

distance (vertically or horizontally) from the reference point. In practice, the cross technique is

easy to learn but hard to master due to its variable speed system. The variable speed can lost

accuracy when it is not controlled properly. With skills, the variable speed gains time.

Selection: Pinching the fingers is a very easy and natural gesture to do. Implement it is different. First

it has to detect the thumb with "findThumb". Pinching requires the two fingers to touch each

other. Unfortunately with Candescent NUI, if two fingers touch each other they are consider as

one finger. So to implement the pinching, it requires adding a threshold as small as possible.

When the distance between the index finger and the thumb has crossed the threshold, the

pinching is activated. It is deactivated when the distance is bigger than the threshold which

allows the fingers to finish the full gesture and touch each other without altering the command.

In practice, the pinching requires some skills. The movement has to be done gently, not too

fast, to let the application measure the distance. The hand has to be a little on the side so the

camera detect correctly the fingers. But once the learned, the gesture is effective and not too

exhausting on the long term.

5.8.3 Quick Summary

Zoom

Activation:

1 fingers

Execution:

vertical slide

Release:

More than 3 fingers

Rotation

Activation:

1 fingers

Execution:

horizontal slide

Release:

More than 3 fingers

Selection

Select:

Pinch fingers

Unselect:

release fingers

Notes:



Chapter 6

The selected gestures for the evaluation

6.1 The final choice............................................................................. 52

This chapter gives the reason why some gestures were picked for the final

evaluation and others were dropped.


6. The selected gestures for the evaluation

6.1 The final choice

For the evaluation, all designed and implemented gestures couldn't be tested with users. It would have

taken too much time. The evaluation needs to be short to avoid weariness and to have users willing to

spare some time to do it. The evaluation tests two sets of gestures for the three commands. The first set

is a selection of iconic gestures. They are supposed to be intuitive and natural. The second set tries to

use simple gestures, close to each other to be as effective as it could.

6.2 The Iconic choice

The selection was quickly chosen. The gesture of hand grabbing was obvious. It's simple and it

worked pretty well during development compared to the other selections. For the zoom and rotation,

continuous gestures have been favored over limited gestures. The gestures using depth for the zoom

have been put aside due to the difficulty to be accurate and the fatigue it causes. The choice was to use

the step by step rope technique. It's simple to understand and to operate. For the rotation, the rope

technique has also been selected but in a first time using the absolute angle. In previous tests, this

technique showed that it is hard for users to understand how it works and use it correctly without a lot

of practice. So, the technique was dropped and replaced by the step by step rope technique which is

handier. The two gestures activations are differentiated by the number of fingers; one finger for

rotation and two fingers for the zoom.

Figure 25: Chosen gestures for Iconic

6.3 The technologic choice

The technologic gestures try to offer very different command from the iconic gestures. Sliding

gestures for zoom and rotation showed good results during development. Continuous gestures was also

opted to reduce the users' movements and to avoid constraining repositioning. Using a vertical slide

with a horizontal slide reduces the number of posture (1 finger or 2 fingers, etc) needed to activate the

commands and so it reduces the complexity for the user. For the selection, a simple move in a

predefined area will check if the users can easily put their hand in the right position to execute a

command without any trouble.

Figure 26: Chosen gestures for Technologic


Chapter 7

The Test application

7.1 The description of the application................................................. 54

7.2 The levels...................................................................................... 55

7.3 Editable levels............................................................................... 58

7.4 The general interface.................................................................... 59

7.5 The layers...................................................................................... 60

7.6 The objects.................................................................................... 61

7.7 The targets..................................................................................... 61

7.8 The feedbacks............................................................................... 62

7.9 Animations.................................................................................... 65

7.10 The measures................................................................................ 66

7.11 The logs........................................................................................ 66

7.12 Remaining bugs and issues........................................................... 67

7.13 The class diagram......................................................................... 68

The application is deeply described in this chapter. It brings all information on

what the application does, what is possible to do with it and how it works. All

features are treated from the interface to the log files.

Final Project Thesis 54 02.10.2012

7. The test application

7.1 The description of the application

The application focuses on the commands. There are three distinct commands to operate on objects;

select, rotate and resize. A fourth command can also be consider; move. Moving an object requires the

combination of the select command to activate the object in "move mode" and the pointer, controlled

with the right hand, to move it anywhere like a drag and drop. The application tests these four

commands independently in five levels (activities) of exercises. The last level is a combination of all

commands. There also is a training level to help to user getting hands-on with the commands before

the evaluation. The users go from one level to the next in the same order [FIGURE 28]. They have to

do the circuit twice; once for each group of gestures. For each level, a log file is created at the end with

the history of the commands, the time spent on the tasks and the distance between the object and its

target after an operation. Each task is evaluated alone, to observe the learning curve. It also evaluates

the whole level with general statistics of the commands and overall time spent. The levels uses xml

files to create lists of objects for the tasks. These xml files also allow flexibility by adding and

removing objects. Objects are editable; types, proprieties, options and targets proprieties. Before the

evaluation starts, the application displays a window for the settings [FIGURE 27]. Mouse and

keyboard are used for this window. It demands the name of the tester which must be unique.

Verification is done to insure it when the button starts is pressed. Then the tester can watch three

tutorial videos explaining all the commands, mechanics and important information needed during the

evaluation. The tester can select which activities he wants to do but not the order. This is useful if the

application was to crash or something happened. Of course to restart the evaluation with the same

name, all previously done activities have to be unchecked. The user can choose between iconic and

technologic gestures with a combo list. Two extra buttons allow setting the kinect vertical angle. The

angle can be adjusted afterward during the evaluation. After clicking on the start button the evaluation

begins and only kinect and hands-free gesture are used.

Figure 27: Setup windows form


Figure 28: Progression of the level path

7.2 The Levels

The main idea for the evaluation is to test every command separated from one another. In each

dedicated levels, users have to perform tasks. These tasks are designed to test specifically the

associated command and no other. For each level, the same task is repeated several times to evaluate

the learning curve. The number of tasks must be high enough to observe if a real evolution arise, but it

shouldn't be too high to avoid weariness. Some little differences in tasks may occur like moving

objects in different directions, rotate objects to different angles, etc. Small changes avoid routines and

check if the command adapts itself well in other situations. So each task has different assignments. To

provide results corresponding to the dedicated command only, the other commands have no influence

on the objects and the tasks in general. They are accounted for the statistics but they don't jeopardize

the test. A final level regroups all commands to evaluate a full situation. This time all commands are

active and tested. For each level, a "skip" button appears during the activity if the user takes too much

time to conclude it. Then the user has the choice to finish the level or skip it and move on to the next

level. This avoids getting stuck indefinitely. All statistics are saved even if the level isn't totally done.

Before a level begins, a description of the goal of the activity is presented. It comes with a non-

interactive animated demonstration and the object color legends and a remainder of the commands.

Figure 29: Panel of the 6 activities


7.2.1 Training

Training allows the user to test the different commands on multiple objects. There is no goal to

achieve in this level, just get use to the commands and pointer. The training level has a minimum

duration of ninety seconds. Of course the user can spend more time exercising if he wants to. At ninety

seconds, a button in the right corner appears. The user just has to point at it to exit the level and go to

the next.

7.2.2 Selection

The selection level provides twelve tasks. Each task consists of getting the pointer over the object,

select it and unselect it still pointing at the object. The twelve objects appears one after the other. They

are spread equally over a circle [FIGURE 30]. Every object is at equal distance, 400 pixels, from the

next, they also appear in an order providing the same distance every time. This technique comes from

the ISO 9241-9 document, chapter Multi-directional pointing task. In the application code, it is easy to

change the number objects for the activity. By default, there 12 objects, but it can be modifying

without having the trouble of computing the object positions and order. Everything is done

automatically, as long as the number of object is even. The "createPiecesInCircle" method needs

the center point of the circle, the number of piece and the radius of the circle. It will return a list of

objects in the right order and in the right place. The order of appearance is important. It gives the

opportunity to test the movements in every direction. The object can only be selected, no rotation, no

drag and drop, no resizing are possible. For each task, the log file collects the number of selections and

the distance between the pointer and the object to evaluate the accuracy. It also collects the time to

achieve each task and the number of possible rotation and zoom activations which have no effect but

are still counted for statistics.

Figure 30: Appearance order on circle

7.2.3 Move

The move level tests the drag and drop [FIGURE 31]. It demands the users in each task, 12 in total, to

select an object, hold it with the left hand, move it in the blue area with the right hand and release it.

Like the selection level, the objects are spread around a circle for the same previous reasons and with

the same options to change the parameters. The target area is located accordingly to the object on the

opposite side of the circle. The distance to cover is always the same. The log file recorded the distance


between the object and the target zone at every release of the object. The object can only be selected

and moved in this level.

Figure 31: move level demonstration

7.2.4 Rotation

This level focuses on rotations only [FIGURE 32]. The user still has to point the object to activate it

and do the rotations. Each rotation is different and the user has the right to do them clockwise or

counter-clockwise which can take more time depending on the choice. There are eight tasks in this

level. Objects can only rotate if other commands are deactivated. The log file collects the number of

rotations for each tasks and the angle difference between the object and the target.

Figure 32: rotation level demonstration

7.2.5 Resizing

The resizing level evaluates the zooms [FIGURE 33]. It works exactly like the rotation level. It counts

all the commands but only the zooms have effect on the objects. There are eight tasks to accomplish.

The tasks go from little zooms in and out to test accuracy and large one for speed. After each release

of a zoom command, the difference between the actual size of the object and its target is recorded.

Figure 33: Resizing level demonstration

7.2.6 Final

The last level regroups all the commands. In each task requires moving, rotating and resizing one

object [FIGURE 34]. As the level is demanding more efforts and takes more time to execute, three

tasks need to be accomplish to finish the activity. The three commands can be done in any order. For

each task, all behaviors are recorded; time, selection distance to the target, rotations and zooms as well

as the global statistics for the whole activity.


Figure 34: Final level demonstration

7.3 Editable levels

To give more flexibility to the application, a system of editable tasks has been implemented. It allows

adding, removing and changing tasks before running the application. Instead of hard-coding the tasks

in the application itself, the idea of reading them from xml files made more sense. Every level has its

own xml file containing all the objects, which can become tasks depending on the level nature. Each

object or piece as it is in the xml file has several properties. It has a type, an id, a goal or not, initial

properties like its position, size, orientation, and scale. If it has a goal, it needs information about the

target (final position) which is also the locations, size and orientation. Finally, the piece has a few

options. It can be chosen if the piece is active or not, which means that a non active piece won't be

displayed until it supposed to. It is useful to go from one task to another. The other options focus on

permissions like if it can be moved, selected, rotated, resized or even highlighted. Each file can have

as many pieces as wanted. The structure of the file isn't very flexible; all the tags need to be correctly

written.

For levels designed in circle like selection and move, only one piece needs to be described. Indeed, in

those levels each piece is the same. They are all spread around the circle. So, the application takes the

properties of the first piece and duplicates as many times as it needs. It automatically changes the id of

each piece so they appear in the right order during execution. The id number doesn't follow the same

order as the creation of the pieces. The pieces are created in order, one after the other following the

circle. The ids follow a different order to move the pointer in all direction as explained before. From

0° to 180° the ids of the pieces take odd numbers (if started from 1) and even numbers from 181° to

360°. It also deactivates all the pieces except the first one.


Figure 35: xml piece example

7.4 The general interface

The Candescent library gives all necessary data source from Kinect needed to do the recognition. The

data needs to be centralized. That's the job of the HandTracker class. It updates the data when new one

is available. Then it updates its attributes like the pointer and the commands with the recognizer. All

the attributes can then be recovered with get methods from any class with a HandTracker object. This

way hand tracking can be use in any situation and can adapt to other application than just layers from

the candescent library. The recognizer is an interface. It allows choosing which class of gesture the

application wants to use, the CrossGestures for the technologic gestures or the RopeGestures for the

iconic ones. It can easily adopt new gestures class as long as it fulfills the interface contract. This

application uses the graphic layers from the candescent library.

7.4.1 The pointer

The pointer is controlled by the right hand. The HandTracker class returns only a point in two

dimensions but more is done in the pointing class. First it takes new values of the pointer every 20

milliseconds to have a smooth movement. The actual point taken into account is the position of the

first finger which is the highest one on screen. First it checks the finger is actually above the hand's

palm in case if false fingers are detected. Then it checks the hand position as it mustn't be too much on

the left in the left hand area. If the actual point fails in one of these conditions the last pointer position

is taken instead. Otherwise, the point is processed. The new point isn't directly stored. It goes into a

smoother to increase the steadiness of the pointer. First the application used the Kalman smoothing


process. The results were not satisfying. The pointer movements were smooth but the delay between

the real movement from the user and the actual movement of the pointer was enormous. The smoother

was dropped. Instead the application uses a simple homemade smoother. It takes the six last positions

of the pointer and computes the average. It isn't perfect but it smoothes the pointer enough to be

accurate and it is instantaneous. If any problem occurs, the pointer takes always its last position and if

the latter doesn't exist the position (-1;-1) is taken. This position is outside of the screen so it can't be

wrong. Finally the current position is stored waiting for the HandTracker to get it.

7.4.2 The commands

The commands can be managed differently depending on the nature of the gestures. But in any case

the commands needs to fulfill its contract and returns its values like if the selection is active or the

rotation with its angle value or the zoom also with its value and the list of points. The list of points is

like the pointer. It gathers the left hand's highest finger position. It returns the last 25 positions which

allow displaying on screen the green trail to see more easily the flow of movement.

7.5 The layers

The layers are the bases for the levels. Each layer regroups its level structure, rules and operations. It

loads and creates its objects. It recovers the gestures recognition. It manages the objects' properties. It

paints the objects and the feedbacks. It measures the user's actions and writes them in log files at the

end. Most of the procedures are generic to all layers, this why a main layer generalizes the other layers

[FIGURE 36].

Figure 36: Layers classes structure

The layers have an identical structure for the levels. First it has a presentation with a title, a descriptive

text of the activity, an animated demonstration, legends and a button to start the test. Then between the

presentation and the test a countdown starts for 5 seconds. Then the activity begins. Finally when the

activity is done an end screen congratulating the user with a "well done" is displayed [FIGURE 37].

Presentation Countdown Test End screen

Figure 37: layers' levels structure


7.6 The objects

The objects in the application are called pieces. Each piece is a composition of simple base geometric

forms [FIGURE 38]. There is about half a dozen different types of pieces which is plenty enough for

the application. All types of pieces are heritance of the root class "Piece". The latter contains all the

necessary attributes and methods to select, move, rotate, zoom and display the whole composition. The

pieces have status. They can be active (gray), hover (blue) with the cursor over it for rotations and

zooms and finally selected (orange) to be moved. Let's talk about the basic pieces. For instance the

class "CenterRectangle" creates a rectangle with its main location in the middle instead of the corner

left. This rectangle has four corners. These corners will define if the cursor is hover the rectangle or

not with the dot product. It comes a little more difficult when the rectangle changes its size or rotations

around its center. All the corners have to be recalculated with trigonometric formulas. It comes really

tricky when the whole piece is a composition of base pieces. Then the center of rotation is only valid

for the central main piece. The other remote attached pieces have the same center of rotation which is

not their own previous one. It needs to compute the new locations of the remote pieces' centers for

every resizing and rotation, [FIGURE 38]. Otherwise the whole piece would be deformed. The pieces

also inform each other if the cursor is over them so it highlights the whole piece instead of just a part

of it.

Figure 38: Demonstration zoom and rotation on a composite object

7.7 The targets

In several levels of the applications, the objects have objectives to reach. These objectives need to be

clear and shown to the users. For tasks requiring shape, orientation or location changes, the objects get

along with target objects. Targets and objects have the same properties except targets can't be altered.

They just bring a visual support to the user. They are represented in the application in red wired style

to be clearly identified. The objects and targets are separated in two lists of "Piece" objects.


Figure 39: Target example

7.8 The feedbacks

The feedbacks are a major part of an application. The interface must bring to the user information of

his actions. The feedbacks have to be as clear and simple as possible, so the user understands directly

the impact of his actions without losing focus on what he's doing. The feedbacks mustn't take too

much attention from the user by overcrowding the space on the screen with too much information.

To help the user finding his mark with camera, at least at the beginning, a visualization of the hands is

really important. The candescent library provides contour visuals methods of what it recognizes. The

application uses this feature to display the hands on screen. First the two hands were displayed. But

after preliminary tests it results that the right hand visual feedback was more confusing the users than

helping them. It comes from the fact that the right hand controls the pointer which is represented by a

red aim. The right hand visual isn't attached to the pointer and is more static in comparison has it stays

in the right corner of the screen. In practice, it turns out users were looking more at the hand feedback

instead of the aim which led to pointing mistakes. So it was decided to remove the right hand feedback

and keep the red aim only which results more positively than with the two feedbacks. Only the left

hand contour feedback is displayed. Unfortunately, it isn't enough for the user to know exactly which

part of the hand is taken into account. So the application adds a green dot showing the real point of

capture, the highest finger. As the hand is moving, the green dot follows it and leaves a green trail of

smaller green dots [FIGURE 40] providing a feeling of movement which can be useful when making

circles.

Figure 40: Left hand feedbacks


It is hard for the user to know if he is too close or too far of the camera. A feedback is needed to

clarify the situation [FIGURE 41]. The feedback is divided in two parts. First the hand contour color

changes if the hand gets too close to a limit of detection. As there are two extremities, two colors are

used; orange if the hand is too close of the camera and red if it is too far. Using two different colors

allows the user to more quickly recognize the problem and fix it. Of course the user needs to know the

meaning of the colors. This is where part 2 takes place. With the hand color changing an additional

feedback appears. The feedback is an arrow showing the direction the hand should go on the z-axis.

The arrow points toward the user if he must move back his hand or it points toward the screen if he

must get closer. In addition a text is displayed above the arrow to clarify the situation. So at first the

user will check at the arrow and text to learn the color meaning but then he'll know exactly which

color stands for and won't have to look at the arrow and be more effective. If the user hand gets

beyond the detection areas, a "no hand" message blinks for the missing hand. Blinking messages

drives the user's attention more quickly.

Figure 41: detection limits feedbacks

The screen itself needs feedback to guide the user [FIGURE 42]. To avoid the user to move his hands

everywhere a detection area frame is always on screen. This frame avoids having the two hands near

each other which would results in bad detections. It also shows where the best zone for detection is.

Figure 42: Feedback of the frame for the detection area

Some text describes the frame purpose. Two little red areas in the top and bottom left corner move the

camera if the left hand points at them. Every time the user succeeds in a task a good message appears

for a few milliseconds in the top middle screen. Also to letting know where the user is in an activity,

information is displayed on the top right of the screen [FIGURE 43]. The information gives the name

of the activity, the number of the task and how many tasks there is. The application has buttons to

validate actions from the user like starting the new activity, skipping one if it takes too long to finish.


Those buttons are activated with pointer, not the mouse. A feedback of the button pressed is turned on

when the pointer is over it.

Figure 43: Text & button feedback

The feedbacks for the technologic gestures need to be simple to understand them immediately. The

gestures use horizontal and vertical movements. The feedback is a cross made out of two arrows going

both ways [FIGURE 44]. Some text describes its proper action at each extremity to remind the user.

There's a red square showing the neutral zone in which the hand can move without making any change

on objects. For the selection a simple blue line showing the area of selection is sufficient. When an

action is ongoing, a corresponding text appears at the bottom left of the detection frame.

Figure 44: Technologic feedbacks

The feedbacks for the iconic gestures are simple. The rotation and the zoom use the same technique.

The feedbacks are the same, but the user needs to differentiate them. So the application uses different

colors; purple for the rotation and orange for the zoom. The line between the center point and the

finger is getting larger with the distance. It makes it more perceptible for the user. The zoom has a

little "+" or "-" near the center point to show if it zooms in or out. The selection needs no feedbacks as

closing the hand is obvious and the contour of the hand makes it clear. Nevertheless, like the

technologic gestures, a text of the command is displayed at the bottom left of the frame.

Figure 45: Iconic feedbacks

The objects use different colors to make a distinction between each status [FIGURE 46]. They are four

status; normal, active, selected and success. There is a fifth color for the target objects. Each color is

dedicated to a status. Gray for inactive objects waiting to be awakens. Blue when the pointer is over

the object making it ready to be rotated, resized or selected. Orange when the object is selected and


ready to move. Objects turn green when they have reached their targets. The targets are white with red

borders. To simplify the object placements over the targets, the middle of each object has been marked

with a dot [FIGURE 47]. When objects have different orientation it might be difficult to know where

the object must go. With the middle dot no problem anymore, the user know exactly where it should

go.

Figure 46: Object's status color feedback

Figure 47: Object's middle dot feedback

7.9 The animations

Each activity is different. The user needs to know what will be his next task. Before the activity starts

a little descriptive page tell the user what is next. The description is a brief text. The description can be

as clear has it could a demonstration is more effective. So the application has a system to do some

animations. Animations can't be interacted with. They can have objects and pointers, text or any

graphic. The animation speed is a new frame every 40 milliseconds but it is adjustable. Objects' status

can change by selecting the options with the "setAllOptions" method. There's three possible ways to

do an animation. The first one is to use "Demostate" as a time reference. Each 40 milliseconds (as

default) Demostate is incremented. Move, rotate, resize or change the objects status incrementally for

definite time (demostate values). Second technique is to do an action until it reaches its goal. Like for

examples rotate an object of 45° with steps of 3° each iteration or moving an object to the right. It's

quicker to implement but less flexible. Last possible technique, is the whole manual method. This time

each object follows an array of animation. At each demostate increments, the object checks in the next

tile of the animation array to get its new attributes. The position, the size, the angle and the status of

the object must be indicated each time. It can take more time to do, but there's more flexibility and

precision in the animation.


7.10 The measures

The evaluation application needs measurements; otherwise it wouldn't be an evaluation application.

The data are recorded all along the evaluation. Each level has its own data recorded, hence it has its

own data measurement system also to be fit its needs. The log file of an activity is written at the end of

it. So, the data can't be saved in the file directly. To keep track on each measurement of each task, a

list of measurement is created. This list is composed of structures themselves composed of attributes.

Each task has its attributes stored in one structure added to the list. At the end the list is read to write

the log file.

Depending on the nature of the task, the type of gestures and the measurement needed, the monitoring

can differ. For example; the selection level needs to know the distance between the pointer and the

object. For this the measure must be taken on the activation of the selection command and not the

release. On the other hand, the move level wants to know the distance between the object and its target

after displacements. So the measure has to be taken on the release of the selection command to know

the final position. The iconic gestures have distinct commands that allow the measures to be taken at

the release of the gestures as they can't be confounded. The technologic gestures are based on a cross

system that provides the rotation and the zoom within the same activation. Here to count the rotations

and the zooms the neutral zone is used. Every time the left hand goes back in the neutral zone, it

measures the current command progression. The hitch is if for example a rotation goes too far and the

user wants to go back to rectify, he has to go to the other side of the cross by going through the neutral

zone and it will be counted as a finished movement. This doesn't happen with iconic gestures. A

solution would be to count one movement maximum per command if activated and check their final

values at release.

7.11 The logs

At the end of each activity level, a log file is created. This log file gathers all data acquired from the

measurements. To clarify files contains general information indentifying its nature. First it has the

application name on top. Then it has the user name, the date the file was created and the type of

gestures that was used during the test. The second part of the file reports the user's action for each

individual task specifically according to the activity. That means for the rotation activity for example,

only rotation are taken into account and reported into the log file. Other actions don't need to be

recorded has they are deactivated. Of course, the log file of the final level regrouping all the

commands contains every measurement. Nevertheless, at the end of the file, after all tasks have been

written with their times, number of tries and their specific measurements, general information about

the behaviors during the whole activity are written. It gives information like the average time for the


task, the total time of the activity, the number of time each commands have been activated and the

positions of the pointer during the test.

The files are located in the log directory of the application. They are separated in sub-directories, by

modality and then by activity. This ways it is quicker to regroup user evaluations by activity. In the

latest version of the application the filename is the user name + the name of the activity so it is easier

to regroup files of the same user.

7.12 About the applications

The system works well. It doesn't seem to crash for no reason. A lot of work was done to have it run

smoothly. Special care in the application has been done to give direct flexibility with editable levels,

camera adjustment and the choice of levels and gestures mode. Indirectly in the source code, it is easy

to add new kind of objects and new kind of gestures, change the settings of the rotation and move

levels or even activate the lock mode.

The lock mode is a system that once activated locks an object in active mode as long as the user

doesn't release its current commands. Without the lock mode, the pointer always has to be on the

object so the commands take effects. This can help is few occasion like the rotation where the pointer

can easily go out of the object and stop the rotation. It let the user focus on the commands and less on

the pointer. It has its flaws too. First the release of a command is more important to unlock the object

which can and had cause troubles as commands can still modify the object. Without the lock system,

the user can just point out of the object which will make it inactive and no errors can be made. It also

avoids trying to modify an object with no success when another is already activated.

Objects can overlap each other without any problem. This feature has to be implemented. Otherwise

without prior treatment of object's status and checks the first created object would always be selected

when objects are overlapping.

The application has been tested on three different machines with the same results. Nevertheless, it

remains a major issue. It doesn't come from the application directly but from the Candescent library. It

seems that the algorithm to detect the palm center of the hand gets null exceptions in certain

circumstances. Those circumstances couldn't be specifically clarified. It looks like it has more risk to

crash when hands are close to each other and when an outside objects like heads, breast, belly,

someone else or armrest of chairs are detect by the Kinect sensor. This bug might be fixed in the future

in new versions of the library. Otherwise the system is pretty robust.


The tutorial videos were made with the freeware Cam Studio 2.6 and the editing was done with

Windows Movie Maker 2.6.

7.13 The Class diagram

The class diagram shows the structure of the system. It gives an overview of the connection between

the classes. This diagram would be very helpful for someone who wants to get into the application and

understand its functioning to change it rapidly.

Figure 48: Application class diagram


Chapter 8

Evaluation

8.1 Condition of the evaluation........................................................... 70

8.2 Pre-evaluation................................................................................ 71

8.3 Range of testers............................................................................. 72

8.4 The questionnaires......................................................................... 72

8.5 Results........................................................................................... 73

8.6 Analysis......................................................................................... 89

Here is the evaluation chapter. First it describes the conditions in which the

evaluation took place and the kind of testers was evaluated. The important

results of the evaluation are described with graphs and commentaries. Finally,

the chapter ends with an analysis of the results to point out important

observation.


8. The Evaluation

8.1 Conditions of the evaluation

The goal of the evaluation is to see if a system of gesture recognition is better over the other one. The

evaluation has been done providing the same conditions to every tester. The same chair with armrests

at the same position was used. The lighting in the room was the same to provide an equal quality of

recognition due to the camera requirements. A dark room or an over lighted one decreases the

performances. The first part of the evaluation is to show and teach the users all the commands, the

interface of the application and some tips to perform better. This is done with three videos. The first

one shows the environment and the bases to not feel lost. The second and third videos, one for the

technologic gestures and one for the iconic ones, demonstrate the commands. Each video is a

demonstration of all the commands available and how to perform them. The videos are watch just

before the evaluation is taken. The second phase of the evaluation is the training. This part helps the

user gets to put in practice everything the videos showed. The user use this part to get his marks with

the recognition has a whole, the body placement, the arms, the hands and the head, which can be new

and disturbing for some people. The camera's vertical angle is adjusted to the user preferences to give

him the most adapted comfort. Once set, the user can get used to the pointing and commands. He can

train as long as he wants and do as much manipulations as he thinks he needs to be comfortable with

the environment. The training phase has a minimum time of 90 seconds. This minimum time, is set to

encourage the users to try the commands before the real tests. When the user is ready he just points at

the "next" button to begin the evaluation.

Each tester has to go through the tests twice, once for each type of commands. The Activities are

always done in the same order. First each command is tested separately from the others and in the final

level all commands are tested altogether. There are five activities; Selections, Moves, Rotations,

Resizing and Final (a compilation of the four others). Each activity is composed of repetitive tasks.

One evaluation has total of 43 tasks among the 5 evaluated activities.

Each test has a presentation page. These pages explain everything there is to know to accomplish the

tests. They have a text description and an animated demonstration of the tasks. They also remind the

users, if necessary, all the commands and the color significations of the objects. Indirectly, those pages

provide a pause between each test, so the users can rest their arms. To avoid a brutal start of an activity

a countdown of 5 seconds is set between the activity and the presentation page.

Within subjects is used for the evaluation [TABLE 11]. All subjects are assigned to all cases. To avoid

bias and the learning effect, as each test is performed twice, the testers are divided in two groups. The


first group does the technologic gestures to start and finish with the iconic gestures. The second group

does the opposite, it starts with the iconic gestures and it finishes with the technologic gestures.

# of users First experiment Second experiment

Group 1 5 Technologic gestures Iconic gestures

Group 2 5 Iconic gestures Technologic gestures Table 11: within subjects experiment progress

The evaluations were done at the same place in the same conditions. The user does his tasks alone. An

observer watches the operations from a distance. The observer can't be too close of the tester otherwise

the camera sees the observer and recognition fails. The observer only interferes if the tester is lost or if

there is technical problems and he needs help to go on by giving him small oral tips.

At the end of an evaluation of a type of gesture the user must fill a questionnaire about is perception of

the tests, the commands and his physiological fatigue. As the questionnaire was written in English and

some of the testers don't speak the language, the observer translates the questions for them and fills the

papers up.

The whole evaluation takes around 35-40 minutes to complete. Around 18 minutes of actual activities

to which it must be added 5 minutes of tutorial videos, a minimum training of 3 minutes (twice 90

seconds), time for the presentations of activities and the fulfillment of the questionnaires.

8.2 Pre-evaluation

During the development, pre-evaluations were carried out to see from an external point of view what

was done right, what was missing and more importantly what was done wrong. The evaluators were

given the task of going through the whole applications and give their impressions. First thing to come

out was the tasks were too hard with the commands. That made the evaluation too long and exhausting

to go through. One problem was the pinching for the selection. It was a bit too sensitive to be executed

properly and easily. The other command giving troubles to testers was the rotation's rope technique

absolute angle. It was hard for the tester to understand how the command operates. These two

commands were changed. Then each evaluation went from 49 tasks to 43 tasks by removing 2 rotation

tasks, 2 resizing tasks and specially by removing 2 final tasks which were really hard to execute and

took a lot of time. By doing so, it reduced the overall time by around 30%. The complexity of the

rotation with the rope technique was reduced. Instead of having the direct absolute value of the angle,

the value is always the closest value to a multiple of ten. Another issue was the confusion with the

pointer (red aim) and the right hand representation feedback. Then the detection area frame, the

camera adjustment, the object center point, etc came from observation and feedbacks of these pre-

evaluation tests.


8.3 Range of testers

Some pre-test were done with people that were technophobic which ends up with bad results. So, it

was decided to use testers used to today's motion recognition technologies or at least not afraid of

using them. All testers have at least used once a technology like Kinect, Wii motion controller or some

similar devices. They were all aged between 24 and 35 years old and were all male. It is not that

women were excluded from the evaluation. Women were tested during pre-evaluation. It just

happened none were found for the evaluation. The testers came from very different kind of fields.

Three were ITs, other were working in other domains not related to technologies.

8.4 The questionnaire

At the end at of each evaluation the testers had to fill a little questionnaire. The questions have been

inspired by the table C.1 Independent rating scale from the ISO 9241-9 document. Some questions

were kept unchanged or modified, others were removed and new ones were added. The look and

structure was also inspired by the C.1 table but also by the C.2 table. Finally a third original and final

design was chosen. The questionnaire is qualitative. Each question is rated from 1 to 7. The user

expresses his sensation and perception of the system by giving grades. The questionnaire is divided in

two parts. First part is focusing on the commands. The questions try to cover all aspects from the

general comfort, the precision, the ease of the use to the feedback quality. The second part is all about

the fatigue. It tries to figure out which part of the body is exhausted and the overall fatigue.

Figure 49: Questionnaires for evaluation


8.5 Results

The data are now recorded in the log files. They need to be extracted and rearrange into tables to be

transformed into information. To analyze the data, Excel and R are used. Excel to have a clearer view

of the data in tables and to have quickly nice simple graphics and results. R is used create more

sophisticate graphs harder or impossible to do with Excel like the box plots, density curves and

distributions.

Most of the information below is based on time to try to figure out if a technique is better than the

other. The structure of the results is almost the same for each activity. First, it compares the average

time of the tasks and the activity to see which performed best. Then to check the condition for the

statistical test, it needs some numerical results like the standard deviations, variances, quartiles, etc.

Along come some graphs like the data repartition, the densities of the data and the distribution curves

to have a better view of the situations.

Statistical tests, like the student test, are done. The two-sample t-test is one of the most commonly

used hypothesis tests. It is applied to compare whether the average difference of means between two

groups is really significant or if it is due instead to random chance. It states the null hypothesis that the

means of two normally distributed populations are equivalent. The t-tests have been conducted with R

and Excel. An internet site6 also gives the opportunity to do the test simply and provides complete

results with analysis. When comparing the three methods, little differences can be noticed, not on the

t-test but on the calculation of the quartiles. These little differences don't make a difference on the final

decision whether or not reject the null hypothesis. Paired t-tests were used as it is one group of units

that has been tested twice. The internet provides useful information like this little questionnaire7

(Appendix Correction Welch). It guides people to choose the right statistical test according to their

data.

8.5.1 Selection

Times overlook by tasks

Let's take a look at the learning curve for the selection. On the next four graphics [GRAPH 1 & 2], we

can see a clear difference of time between the 3 to 4 first tasks and the next one. With the two type of

gestures, it takes around three times adopt the command. Then the curve stabilizes with less variation.

The iconic selection is better for more than 80% of the time. We can see a pick for the iconic curve for

the fourth tasks. It comes from two testers who took around 11 seconds instead of the usual average of

6 http://www.physics.csbsju.edu/stats/ 7 http://biol09.biol.umontreal.ca/BIO2041e/Correction_Welch.pdf


3 to 5 seconds for this particular task which brings the mean way up. Otherwise the iconic smooth

curve shows predisposition to have better performances than the technologic one.

Graph 1: Average time for each task from Technologic and Iconic side by side

Graph 2: Selection: average time's comparison by task

Times overlook by testers

Let's take the average time of the selection activity of each the testers and put them against each other

[GRAPH 3]. We can notice that are close in most cases between technologic and iconic. It is

interesting to remark the crossing of lines between tester 5 and 6.As the tester 1-5 starts with

technologic gestures and testers 6-10 begin with the iconic ones, it looks like the second walkthrough

for the selection activity makes better results, except for tester 10.

Graph 3: Selection: average time's comparison by tester


Here's a quick summary of the data. The average time for the iconic selection is around 400

milliseconds better than its counterpart. The standard deviation is also smaller with iconic which can

be consider as a more concentrated distribution around the mean. It tells us that 65% of the testers

succeed their tasks between 3.35 seconds and 5.18 seconds against 3.3 and 5.94 seconds for

technologic.

Technologic Data Set Summary:

10 data points Mean = 4.621 Low = 2.95 High = 6.99

First Quartile = 3.99 Median = 4.425 Third Quartile = 4.69

Standard Deviation = 1.32 Variance = 1.742

95% confidence interval for actual Mean: 3.68 thru 5.562

Iconic Data Set Summary:





Histogram:

The histogram [GRAPH 4] shows us the repartition of the data. The two histograms overlap each other

to notice differences. The data are close to each other. The technologic repartition is more spread

which results in a flatter distribution curve. The iconic distribution curve is narrower as the data are

more concentrated.

Graph 4: Histogram + Distribution curves + densities


Means comparison:

This box plot graph [GRAPH 5] shows us quickly that the data are really close to each other. Even

though, there is a small advantage for iconic, as the mean is lower and the data are more concentrated

around it with less far outliners than technologic.

Graph 5: Box plot Selection Technologic and Iconic

Statistical T-Test:

The t-test uses paired data as parameters. It resulted with a p-value of 0.48. The null hypothesis is

rejected. There are not enough statistical evidences to conclude the data comes from the same

population and that there is a significant difference between the two selection gestures [TABLE 12].

Table 12: Selection: times t-test table

Tries and errors

By looking at the graphic [GRAPH 6] and table [TABLE 13] the iconic number of try required to

achieve a task is inferior to the technologic one but the difference isn't significant enough to make a

difference with the t-test.

Graph 6: Selection: number of tries by testers

Table 13: Selection: tries t-test table


Effective index of difficulty

The Fitts's law is used to measure the difficulty and the performance of the users during the activity.

"This is the measure, in bits, of the user precision achieved in accomplishing a task."8 The Shannon's

formula is used for the effective index of difficulty (IDe) [FIGURE 50]. is the distance to the center

of the target, in this case 400 pixels. is the effective size of the target. The effective size is

calculated according to the standard deviation of the selections distances to the center of the target

multiplied by a constant of 4.133. This adjusts the target width assuming 96% of the users hit the

target within the range. The results return average throughputs of 0.87 bps for technologic and 0.93

bps for iconic. The difference is small and insignificant. These are low results considering the indexes

of difficulty [TABLE 14] were inferior to 4 which keep the tasks precision category to low (between 1

and 4). One reason is the time it takes to activate the selection commands and more important the time

to release them.

Figure 50: Index of performance formulas

Table 14: indexes of difficulty with average throughputs

8.5.2 Move


Like the selection activity, moving objects is quick to learn. We can see on the graphics [GRAPH 7 &

8] the average time per task drops rapidly after the first two one. Then the times are pretty similar with

an average time of 5.47 seconds for technologic and 4.85 for iconic. To check if the difference of time

is significant, a T-test has been done. The outcome gave a p-value of 0.059 which is very close to the

5% limits. A little too much to accept the null hypothesis, but close enough to not denied the

performance with iconic gesture is better.

8 ISO 9241-9, B.5.1.3 Effective index of difficulty


Graph 7: Average time for each task from Technologic and Iconic side by side

Graph 8: Move average time's comparison by task


If we look at the data by testers, the graphics [GRAPH 9] show us that 70% of the testers performed

better with iconic. The average time also gives a clear advantage of using iconic over technologic with

mean of 4.87 seconds against 5.48 seconds respectively.

Graph 9: Move average time's comparison by tester












Histogram:

The data repartitions [GRAPH 10] overlap well over each other. Even the densities look alike.

However the technologic distribution is flatter and lower than the iconic ones which reveal a large

difference of the standard deviations.


Means comparison:

The box plot [GRAPH 11] shows us the medians are practically equal, but the iconic data are more

concentrate then the technologic ones. One outliner was detected for each type.

Graph 11: Box plot Move Technologic and Iconic

Statistical Test:

The data is consistent with a log normal distribution: P= 0.37 where the log normal distribution has

geometric mean= 5.491 and multiplicative standard deviation = 1.342. The iconic data is consistent


with a log normal distribution: P= 0.61 where the log normal distribution has geometric mean= 4.892

and multiplicative standard deviation = 1.231.

The T-test returns a p-value of 0.27 [TABLE 15] which too high to accept the null hypothesis. There is

not a significant difference between the two means.

Table 15: Move: times t-test table

Tries and errors

By looking at the graphic [GRAPH 12] and the table [TABLE 16], the iconic number of try required

to achieve a task is inferior to the technologic one but the difference isn't significant enough to make a

difference with the t-test.

Graph 12: Move: number of tries comparison

Table 16: Move: tries t-test table

8.5.3 Rotation


Iconic rotation has clear advantage with the first tasks but the difference is reduced over time to get

practically equal [GRAPH 13].

Graph 13: Rotation: average time's comparison by task


The iconic rotation is a clear winner when it comes to compare times by testers with 90% of better

performances [GRAPH 14]. The iconic average time also is way under the technologic one with a


difference of 3.5 seconds. 75% of the testers finished their tasks under 17.5 seconds against 20

seconds with technologic.

Graph 14: Rotation average time's comparison by tester











Histogram:

The technologic results show on the histogram [GRAPH 15] the data are more spread than the iconic

ones. The curve distributions verify the data are more concentrated for the iconic ones with a smaller

standard deviation.



Means comparison:

We can see a clear difference between technologic and iconic with the box plot graphic [GRAPH 16].

Even with two outliners the technologic rotation can't compete.

Graph 16: Box plot Rotation Technologic and Iconic

Statistical Test:

The technologic data is consistent with a normal distribution: P= 0.33 where the normal distribution

has mean = 18.12 and standard deviation= 7.095

The iconic data is consistent with a normal distribution: P= 0.52 where the normal distribution has

mean= 14.80 and standard deviation = 4.878.

The t-test returns a p-value of 0.005 which is very confident to accept the null hypothesis. There is a

significant difference between the two means. The iconic rotation performs statistically better [TABLE

17].

Table 17: Rotation: times t-test table

Tries and errors

By looking at the average number tries by tasks, we can see the learning curve [GRAPH 17]. The

techno curve shows the reduction of errors over time. The testers seem to gain confidence. The iconic

curve is more stable as the command counts less activations with the rope technique than with the

cross system. We can't assume much from this curve.

Graph 17: Rotation: number of tries comparison


8.5.4 Resizing


What is clear on these graphics [GRAPH 18] is that once again iconic performed best. It seems that

the difficulty of the tasks was equal for the two types of zooms as the curves follow the same kind of

path.

Graph 18: Resizing: average time's comparison by task


Not as obvious as the rotation results, the iconic zoom still performed best for 80% of the testers

[GRAPH 19]. The average time marks a clear disparity between the two with 2.3 seconds of difference

in favor of iconic with 12.3 seconds and 14.6 respectively.

Graph 19: Resizing: average time's comparison by tester












Histogram:

Like with the rotation results, the iconic repartition of the data is more compact and less spread

[GRAPH 20].


Means comparison:

Obviously the iconic results are way better than the technologic ones on this box plot graphic

[GRAPH 21]. However, it is important to notice R considered two good results for technologic as

outliners which increased a little the difference between the two gestures.

Graph 21: Box plot Resizing Technologic and Iconic

Statistical Test:

The technologic data is consistent with a normal distribution: P= 0.55 where the normal distribution

has mean= 14.70 and standard deviation = 3.648.


The iconic data also is consistent with a normal distribution: P= 0.94 where the normal distribution has

mean= 12.34 and standard deviation = 2.309.

The probability of this result, assuming the null hypothesis, is 0.030 [TABLE 18]. The two samples

come from the same population with a probability of 97%. The null hypothesis is accepted and there is

a difference between the two means. Iconic zoom performed better with this panel of testers.

Table 18: Resizing: times t-test table

Tries and errors

The learning curve is less convincing as the previous rotation tries learning curve [GRAPH 22]. Still it

shows some improvements when we compare the first 4 tasks and the last 4 tasks.

Graph 22: Resizing: number of tries comparison

8.5.5 Final


The time taken for each task varies a lot between testers. However the second task has slightly better

results overall compared to task 1 and 3 with the two types of gestures [GRAPH 23].

Graph 23: All times for each task from Technologic and Iconic side by side


If we look at the data by testers [GRAPH 24], we can't see a clear trend for any gestures. Even the

number of best scores is 50-50. There isn't any sign that the second walkthrough was better than the


first one. Even the volumes are identical. The mean are practically equals with 1 seconds of difference

in favor to technologic at 54.3 seconds.

Graph 24: Final: average time's comparison by tester











Histogram:

The histogram [GRAPH 25] shows similar repartitions, densities and more important identiqual

distribution curves. The box plot shows differences but it seems to be duw to the considered outliner

of iconic.



Statistical Test:

The t-test finally shows that the two groups are identical up to 80% and there is no difference between

them [TABLE 19].

Table 19: Final: times t-test table

Tries and errors

Testers take around 9 commands with iconic gestures and 15 with technologic gestures to achieve a

task requiring a movement, a rotation and a zoom [GRAPH 26]. A really optimistic goal would be to

achieve a task with three commands. That is almost impossible, because it would meant no mistakes at

all. The difference is significant but as the command counting systems of the two gestures types differ

from one another, they can't really be compared. Nonetheless, it can be noticed on the table [TABLE

20] that it takes a lot more rotations than the other command with technologic gestures.

Graph 26 Final: number of tries comparison

Table 20: Final: tries and errors t-test table

8.5.6 Summaries

Here is an overview of all activities. It brings quickly information to compare the statistics against

each other. The first summary [TABLE 21] regroups the stats of the time performances. The second

summary [TABLE 22] gathers the number of errors made by the users during the evaluations. It is

important to note that errors are counted as the total number of commands done during the activity

subtract to the minimum commands needed to achieve it. Nevertheless, the results of error rates must

be taken with great caution. Indeed, as mentioned previously the way the two type of gestures count

the number of commands differs from one another, specifically the rotation and zoom commands. This

brings huge differences due to this nature and must be considered.


Performances summary

Table 21: Activities summaries table

Errors summary

Table 22: Activities errors summaries table

8.5.7 Questionnaire

Here are the results of the questionnaires [TABLE 23]. The scale for each question goes from 1 to 7.

The data is divided in two sections. The total means of every question is compared. In addition, two

columns give the results from the groups first and second attempts. The idea is to check if a significant

difference exists between the first walkthrough and the second one. That is interesting to see if the

fatigue perception grows over time. Statistical t-tests are done to check if the differences are

significant or not.

First we can see that testers thought the techno gestures required less effort and were smoother than

the iconic ones. The difference is small and there is no evidence the perception of the testers proves

actual facts. But the iconic gestures are more accurate with a huge difference that is statistically

significant. The testers seem to have preferred all the commands from iconic, specially the selection

with a big significant difference. Otherwise the results are close from one another. Important to note

testers liked the technologic feedbacks better than the iconic ones. They found them more clear with

more useful information. Overall the results are average with no real huge differences between the two

modes, except the selection and the feedback quality. The difference between the first and second

attempt doesn't show anything. Testers didn't seem to give better grades any general on the first or

second walkthrough.


On the fatigue side, the techno gestures seem to be less fatiguing. This goes well in agreement with the

second question of the form about the effort required for operations. But overall the general fatigue

was practically equal. The real deal that stands out is the arms fatigue. Every tester complained about

it. It makes no differences between technologic or iconic. The testers' arms were exhausted way more

than the others. Finally if we compare the first attempt with the second, the fatigue from the first

walkthrough can be feel in the second one as the testers are almost always a little bit more tired.

Table 23: Questionnaires' results table

8.6 Analysis

This analysis is divided in several parts. First it will talk about general observations on comments and

behaviors of the testers during the evaluation. Then it will analyze the quantitative and qualitative

results of each command separately. And conclude with a synthesis of the overall performances.

First of all, users need practice to understand, remember and use correctly the commands. The

environment of the application is new and users have to find the marks. The style of gestures is

particular and not very common. The use of the right hand just for pointing and the left hand

exclusively to execute some commands can be perturbing and confusing at times. Users also have to

know where their hands must stand for a good recognition. All these little inconvenient are reduces

over time. The reason to have put a training level was to let users having their hands on all the

commands without being tested and to mainly not be totally lost in the first evaluated activities. And it

worked. Users found their marks, played with the objects and learn how the system is working. It is

also interesting to notice users don't do the gestures the same way. For example, the pointing: some

users point with just the index finger which results in a smooth and accurate movement. Others use the

whole hand with all the fingers outstretched for pointing. This way the pointer shakes more and it is

harder to be precise. Others have the palm of their hand facing the camera which works pretty well but

can be a little harder to be accurate.


The selection level is an easy activity with simple identical tasks. The distance between the targets is

always the same, the size is also equal, just the direction the pointer has to move changes. One

repeating mistake has been observed several times. Sometimes users forgot to release the object before

going to the next and lost a few seconds wondering why the task wasn't validate. They had indeed the

obligation to select the object and then release it with the pointer still on the object. The iconic

selection can only be used one way but the technologic doesn't. Usually users put down the hand by a

vertical movement of the whole arm. One user used his elbow instead to reduce the effort. He rotated

his front arm from vertical to almost horizontal until it reached the selection area like pushing a button.

It worked well. From a statistical point of view, none of the two selections do stands out from the

other. Even though, the iconic had a better average time, 10% lower than the technologic one, with

fewer outliners and a better median, the t-test doesn't reflect the advantage. 56% of the tasks were

performed faster with the iconic selection command and 60% of the users finished the activity quicker.

The Move activity also uses the selection command. We'll see if it shares the same conclusion as the

selection activity which was that there were no differences between the two commands. First, some

testers liked this activity. They found it fun and it was their favorite. A few little missteps happened

though. In iconic mode, some users tried to use their left hand to move the objects forgetting it was the

right hand. Despite the fact it was specified in the videos, some users tried to do very quick sudden

movements which result badly. They understood rapidly they had to go slower. Otherwise the activity

went smoothly. On the quantitative side, it is the same as with the selection. The overall statistics are

better with iconic by 10%. The standard deviation is 40% smaller with 0.89 versus 1.44. Even the

number of tries is smaller with iconic by 10%. But the t-test makes it clear: there is no difference

between the two methods just as the selection.

So on the overall performance of the first two activities that evaluate the selection commands, there is

no significant difference between the two methods. But by looking at the qualitative results, it is clear

that the iconic selection is an obvious winner.

The rotation activity is harder than the first two activities. It requires being more accurate. In this

activity as well as in the next one, resizing, the two commands differ by the type of gesture it requires.

The technologic rotation is controlled over a horizontal sliding movement and the iconic rotation

requires circular movement from the user. The technologic and iconic rotations approaches are very

different. Technologic uses automatic speed to rotate objects and iconic waits on the user movement to

increase or decrease the rotation. With technologic, users were a little disturbed by the variable speed


of the rotation. They found it hard to manage it correctly. The technique used to get the object in the

right orientation could be different between users. Some used the neutral zone as it was designed for to

stop the object's rotation. Others used the pointer as regulator by pointing out of the object to stop it.

Let's note that it wouldn't be possible to use this last technique if the lock mode was activated. For the

iconic rotation there is only one way to execute the command; make circles. Both techniques had one

same issue; the pointer going out of the object as it rotates. The user has to carefully aim near the

center of the object or readjust the pointer during the operation. This costs some concentration of the

user. The results are pretty simple. The iconic rotation performed better. All statistics point out the

superiority of the iconic gesture. The average time is better by 3.5 seconds, the median wins by 2

seconds, 90% of the users performed better with iconic and finally the t-test confirms the significant

difference of means by accepting the null hypothesis by less than 1%. Qualitatively users favored the

iconic gestures but the difference is very small, 4.2 against 4.3, and it isn't significant. The users liked

the simplicity and the feedback of the technologic rotation, except the variable speed which they think

is not a bad idea but requires more practice. They also liked the fact it doesn't require lots of effort to

make it work. On the iconic side they found more entertaining to use, easier and more natural than the

technologic rotation gesture. They found the feedback a little shallow they would have liked to see

more information.

The resizing activity is very close to the rotation activity. Even the gestures are similar to the rotation

ones. The technologic zoom still uses a continuous movement controlled over a vertical slide. Iconic

zoom still has to make circles around a center point but the activation posture is different from the

rotation activation posture. So the comments about rotation still stand as well. The main difference is

that this level deals with objects of different sizes. Users found it harder to point at smaller objects. It

takes more effort, more concentration to be more accurate and requires being more stable. This goes

for both modes. The two methods performed better than the two rotations on average, especially

technologic. This improvement may come from the learning of the previous similar activities.

However, the iconic zoom gesture is still ahead from the technologic zoom just like with rotation. The

differences between the two modes are not as obvious as before. Still, iconic is 80% of the time faster

than technologic. The means are closer with 12.3 seconds for iconic versus 14.6 for technologic but

iconic keeps a clear advantage. Finally the t-test verifies the null hypothesis and accepts it as there is a

significant difference between the two means. Iconic zoom performed better with this panel of testers.

The users still preferred the iconic gesture for the ease of use. But interestingly, users liked more the

zooms over the rotations by at least 0.5 points. The reason may be the users gained skills and had less

difficulty to pass the tasks and their perception made it better.


The final activity was easily the hardest activity of them all. It regroups all the commands at once.

Every command has an impact on the objects. The order the commands are done is up to the users.

They actually didn't all use the same order. Some started by moving the object to its target. Others

resized and rotated them before moving them. Even though the commands reminder was showed right

before the activity, most users had forgot partially how to do some commands. The difficulty was a

little high in their opinions. The precision required for the three commands combined was too high.

This activity was the most exhausting, physically and mentally as it demands more concentration to

manage the three commands. In both cases, users were happy to get over with it and that it is wasn't

any longer. Iconic gestures performed better in almost all previous activities with more or less

significance. It would be expected that the final activity follow the same path. Actually, it doesn't. The

technologic gestures catch up with the iconic gestures a give a very close score. It even has a slightly

better overall average time by 1 second. Except that the results a very similar. Even the distributions

curves almost overlap each other. Even the statistical gives an important p-value of 80% which means

the two methods are equal. If we try to find reason, some elements from the users' comments came

out. First, on the iconic side, users had to switch between commands. Some users got confused and

mixed with the rotation and zoom activations which require one or two fingers. On the other hand,

technologic rotation and zoom have the same activation and the feedback remind directly how to use

the two commands at the same time.

To summaries, iconic gestures have better or at least equal results compared to technologic when they

are used individually with no perturbation from the other commands. To select and move objects the

quantitative tests don't show any significant difference even though the overall are slightly better for

iconic. Difference is more qualitative. Users liked way more the iconic selection with an outstanding

6.5 out of 7 versus 4.4 for the technologic selection. Closing his hand like grabbing an object is

something that works pretty well. It is simple to use and to remember. Even technically it is simple to

detect with a really good accuracy. It hasn't a specified area for activation. Even though, it doesn't

mean the selection of technologic is bad. It also works pretty well and is simple to use and detect. It

just doesn't add up with the iconic selection. For the rotations and zooms, it is quite the contrary. The

user preferred the iconic gestures with a tiny insignificant advantage on the qualitative side. But the

quantitative results show a clear benefit to use iconic gestures to perform rotations and zooms. But

when it comes to use all the commands together the two methods seems to be equivalent. There is no

evidence one method is better than the other. Out of the ten testers, two of them had their best results

in all the five activities with the iconic gestures. With technologic, only one tester almost did the grand

slam with four out of five activities but add an overall success of near 70%. In total, when it comes to

count every tasks technologic loses with 44% of success versus 56% for iconic.


Chapter 9

Extras Applications

9.1 Gesture Factory Application......................................................... 94

9.2 Bing Map Application.................................................................. 94

Chapter 9 is about everything that is out of the final evaluation. It treats the two

other applications developed during the project that were not part of the final

evaluation.


9. Extra applications

9.1 Gesture Factory application

This is the first application done in this project. Its goal was to design, develop and test quickly

gestures and objects using kinect and the candescent library. The application has no goal of being

finished, polished or delivered to anyone. It's just to try and retry things over and over again. All the

twenty gestures described in this document are in the gesture factory. Some adjustments have been

made in the test application, which regroups only the used gestures for the evaluation, to improve the

quality, the simplicity and the robustness. The application is pretty simples. It has a screen in which a

layer can display information, buttons to switch quickly from a type of display, recognition or both to

another. This gives a quick way to compare designs.

9.2 Bing Map application

The Bing map application had for purpose to use the different commands in real situation. The

application proposes to browse the entire world using a Bing map applet in a .Net environment. The

NUI used in the test application is mapped to the mouse controls. The right hand controls the mouse

pointer and the commands recreate mouse's action like the click, double-click, wheel roll, etc. The user

can navigate within the map and perform zooms. The idea was to integrate this application at the end

of the usual tests. The user would have had to execute a scripted scenario like finding the Eiffel tower

in a minimum of time or/and browsing freely and give his impression. The application was dropped

early in the development due to the time required for the tests. Indeed, the previous evaluation takes

around 15-20 minutes twice, one for each modality, without counting additional stuff like videos and

questionnaires. Adding another test would have made the whole session too long. Nevertheless, the

application is running and the user can use kinect to navigate the map. Some commands don't quite

work perfectly like the detection of false double-clicks. It is also important to note that the commands

done with kinect replacing the mouse controls apply not only in the application but on the all operating

system. So the users have to be careful. The real mouse device can't be use at the same time as long as

the application is running. This may cause lots of problems. The application needs for development

time to be usable properly. If it is well calibrated, it also gives options to use kinect as a default full

input device replacing the mouse.

Figure 51: Gestures factory & Bing map application


Chapter 10

Conclusions

10.1 General comments over the development.................................... 96

10.2 Conclusion.................................................................................... 96

10.3 Future Work.................................................................................. 97

This is the final chapter. It brings the conclusion of the project and also some

insight over the development during the project. Finally, it ends up what could

be done and improved in the future with this project.


10. Conclusions

10.1 General comments over the development

The project took a long time to finish. Lots of phases were involved; learn the technologies, design the

multiple gestures, implement these gestures, test them, create a full application for tests, design

feedbacks, do pre-evaluations, redesigned the gestures, do the final evaluation, collect the data,

analyze them and finally write the thesis. Some phases were done quicker than others. But what took

most of the time was the implementation of the application test and tweaking the gestures to make

them robust and simple to use as possible. The hardest part was to manage the balance of difficulty,

the release of the commands and small variation in the recognition like losing a finger or the hand for

a few frames. If the release is too easy the command won't hold and if too stiff the command will

never be deactivated. For a more robust recognition tricks must be used, like using history of status

and timers to avoid involuntary sudden change of states. The point is to do all that without reducing

too much the reactivity of the system. Controlling the commands by voice recognition and gestures

was considered at the beginning of the project. The idea was drop pretty early as the voice recognition

system was a little bit slow to recognize words and because it would have meant another walkthrough

the five tests. The all evaluation would have been too long. The project paper tries to gather important

information but more data, graphs and information are available in the Excel and R files provided with

the project.

10.2 Conclusion

Natural user interfaces become more part of our everyday life devices. It is important to find good

designs for gestures that are well adapted to their purposes. The main goal for NUI is to be efficient

but also to be simple and liked by a majority of users. If the users don't like the way controls works,

they won't adopt the system and they will go somewhere else. The point of this project is to try to

figure out if a type of gestures is better against others. Around 20 gestures have been designed and

implemented. They were put in two groups of gestures; the technologic group supposed to be close to

the machine to be effective as possible and the iconic group which gathers gestures that are believed to

be more natural. Each gesture is designed for one of the four commands; the selection, the drag and

drop, the rotation and the zoom. During development some recognition techniques prevailed over

others. Like for instance, the rope technique for the rotation and the zoom which was better than the

circle detection. Checking the angle between a fix point and compare it with previous ones is quicker,

more responsive, more accurate and more versatile than recognizing circles in a path of position points

which is slow, not responsive enough, hard to detect and requires precise movements.


All gestures couldn't be evaluated in the final phase of the project due to time it would have taken. The

application of this project has been designed to test and to evaluate each command with different

gestures. It provides different experiences to users in which the application can monitor the behaviors

and the progressions to afterward analyze the data. For most of the users the experience was good but

exhausting. They practically all complained about the fatigue felt in their arms. Most of them think

they could improve their performance with more practice. The evaluation provided interesting results.

It turns out that iconic gestures generally performed equally or slightly better but they are in majority

favored by users. It was especially and significantly better for rotations and zooms according to

quantitative results and student t-test. Not that the technologic gestures weren't good but they didn't

meet their audience.

Finally, using the left hand to extend the number of commands of the right hand without decreasing

the pointer precision worked well. It demands users a certain adaptation as they switch focus on the

right hand or the left hand at first but then it fades away over time and practice. When it is possible,

giving the possibility to choose the method to achieve a task would be the way to go and not imposing

one. The feeling of freedom of choice is always well received by users who will felt less constraint.

However if one type of gestures should be taken it would obviously be the iconic one.

10.1 Future work

To take this project further, several directions are possible. A first idea would be to use two Kinect

devices instead of one. The first one is facing the user and the second one is on the side perpendicular

to the user to be able to capture lateral movements. This idea was suggested at the beginning of the

project but was finally dropped. Another idea would be to just use the left hand postures as an

activator of commands. The right hand would manage the rest. This would let the user focusing mostly

on the pointer and not both. Secondly the hands would have fewer risks to be close to each other and

disturb the recognition as the left hand could stay static. Thirdly, the left arm would be less tired. Of

course a lock system on the objects would be necessary as the pointer would be moving during

operations.

The applications could use some new levels and objects. Speed and endurance tests could be added. It

would count the number of selections users can do in one minute for example or the number of full

rotations. The results could then be compared with other gestures or modalities to see the

effectiveness. The Move level could be changed in the way that the target zone area would be reduced

to a simple circle. This way the data recorded could be used to compute the index of performance and

compare the results with other drag and drop activities. Some improvement can be done around the


editable levels like adding new options. Levels could have scores and multiple objects on screen like

the training but the evaluation would have to be completely redesigned. The rotation could be done

differently. For now, the rotation is always done at the center of the object. An improvement would be

to rotate the object around the pointer position. Some sounds could be added to provide some quick

simple feedbacks.


11. References

1. MSc Surveying, Daniel Binney, Jan Boehm (Supervisor), Performance Evaluation of the PrimeSense IR Projected

Pattern Depth Sensor UCL Department of Civil, Environmental and Geomatic Engineering, Gower St, London ,WC1E

2. Jarrett Webb and James Ashley, Beginning Kinect Programming with the Microsoft Kinect SDK, 2012, Apress

3. http://candescentnui.codeplex.com

4. http://hci.rwth-aachen.de/tiki-download_wiki_attachment.php?attId=1508

5. http://www.primesense.com/technology

6. http://www.renauddumont.be/fr/2012/kinect-pour-windows-vs-kinect-pour-xbox

7. http://en.wikipedia.org/wiki/Kinect


Appendix

Candescent License New BSD License (BSD)

Copyright (c) 2011, Stefan Stegmueller All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Candescent.ch nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR

IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND

FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR

CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL

DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,

DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER

IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF

THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Correction Welch

http://biol09.biol.umontreal.ca/BIO2041e/Correction_Welch.pdf

1. Sample sizes equal?

Yes go to 2

No go to 6

2. Equal sample sizes. Distribution:

Normal go to 3

Skewed go to 5

3. Normal distributions. THV result:

Variances homogeneous use any one of the 3 tests (most simple: parametric t-test).

Variances unequal go to 4

4. Variances unequal. Sample size:

Small use the t-test with Welch correction.

Large use any one of the 3 tests (most simple: parametric t-test).

5. Skewed distributions. THV result:

Variances homogeneous all 3 tests are valid, but the permutational t-test is preferable because it has correct type

I error and the highest power.

Variances unequal normalize the data or use a nonparametric test (Wilcoxon-Mann-Whitney test, median test,

Kolmogorov-Smirnov two-sample test, etc.).

6. Unequal sample sizes. Distribution:

Normal go to 7

Skewed go to 8

7. Normal distributions. THV result:

Variances homogeneous use the parametric or permutational t-tests (most simple:parametric t-test).

Variances unequaluse any one of the 3 tests (most simple: parametric t-test). Power is low when the sample

sizes are strongly unequal; avoid the Welch-corrected t-test in the most extreme cases of sample size inequality (lower power).

8. Skewed distributions. THV result

Variances homogeneous use the permutational t-test.

Variances unequalnormalize the data or use a nonparametric test (Wilcoxon-Mann-Whitney test, median test,

Kolmogorov-Smirnov two-sample test, etc.).

USING MICROSOFT KINECT SENSOR TO PERFORM COMMANDS … · Microsoft Kinect device is at the moment...

Documents

Transcript of USING MICROSOFT KINECT SENSOR TO PERFORM COMMANDS … · Microsoft Kinect device is at the moment...