
    The Sixth Sense: Gesture Recognition

    Department of CS&E, PESCE, Mandya

    Introduction

    Gesture recognition is a topic in computer science and language technology with the

    goal of interpreting human gestures via mathematical algorithms. Gestures can originate

    from any bodily motion or state but commonly originate from the face or hand. Current

    focuses in the field include emotion recognition from the face and hand gesture

    recognition. Many approaches have been made using cameras and computer vision

    algorithms to interpret sign language. However, the identification and recognition of

    posture, gait, and human behaviours is also the subject of gesture recognition

    techniques.

    Gesture recognition can be seen as a way for computers to begin to understand human body

    language, thus building a richer bridge between machines and humans than primitive

    text user interfaces or even GUIs (graphical user interfaces), which still limit the majority of

    input to keyboard and mouse.

    Gesture recognition enables humans to interface with the machine (HMI) and interact

    naturally without any mechanical devices. Using the concept of gesture recognition, it is

    possible to point a finger at the computer screen so that the cursor will move accordingly. This

    could potentially make conventional input devices such as mice, keyboards and even touch-

    screens redundant.

    Gesture recognition can be conducted with techniques from computer vision and image

    processing.
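
    To make the vision-based approach concrete, here is a minimal sketch, assuming OpenCV, a webcam at index 0 and rough skin-tone thresholds (the threshold values are illustrative, not taken from this report), that segments a hand by colour and extracts its outline:

        # Minimal sketch: segment a hand by skin colour and find its outline.
        import cv2
        import numpy as np

        cap = cv2.VideoCapture(0)                      # assumes webcam at index 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            # Rough skin-tone range in HSV; tune per camera and lighting.
            mask = cv2.inRange(hsv, np.array([0, 40, 60]), np.array([25, 180, 255]))
            mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
            contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
            if contours:
                hand = max(contours, key=cv2.contourArea)  # largest blob as the hand
                cv2.drawContours(frame, [hand], -1, (0, 255, 0), 2)
            cv2.imshow("hand", frame)
            if cv2.waitKey(1) == 27:                   # Esc quits
                break
        cap.release()
        cv2.destroyAllWindows()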


    Gesture recognition means interfacing with computers using gestures of the human body, typically hand movements.

    In gesture recognition technology, a camera reads the movements of the human body and

    communicates the data to a computer that uses the gestures as input to control devices or applications. For example, a person clapping his hands together in front of a camera

    can produce the sound of cymbals being crashed together when the gesture is fed through

    a computer.

    One way gesture recognition is being used is to help the physically impaired to interact

    with computers, such as interpreting sign language. The technology also has the potential

    to change the way users interact with computers by eliminating input devices such

    as joysticks, mice and keyboards and allowing the unencumbered body to give signals to

    the computer through gestures such as finger pointing.

    Unlike haptic interfaces, gesture recognition does not require the user to wear any special

    equipment or attach any devices to the body. The gestures of the body are read by a

    camera instead of sensors attached to a device such as a data glove.

    In addition to hand and body movement, gesture recognition technology also can be used to read facial

    expressions, speech (i.e., lip reading), and eye movements. The literature includes ongoing work in

    the computer vision field on capturing gestures or more general human pose and movements by

    cameras connected to a computer.


    Gesture Based Interaction

    Fig 1. The system detects the hands and fingers in real-time.

    Touch screens such as those found on the iPhone or iPad are the latest form of technology allowing interaction with smartphones, computers and other devices. However, scientists at

    Fraunhofer FIT have developed the next generation non-contact gesture and finger recognition

    system. The novel system detects hand and finger positions in real-time and translates these

    into appropriate interaction commands. Furthermore, the system does not require special

    gloves or markers and is capable of supporting multiple users.

    With touch screens becoming increasingly popular, classic interaction techniques such as a

    mouse and keyboard are becoming less frequently used. One example of a breakthrough is

    the Apple iPhone which was released in summer 2007. Since then many other devices

    featuring touch screens and similar characteristics have been successfully launched -- with

    more advanced devices even supporting multiple users simultaneously, e.g. the Microsoft

    Surface table, whose entire surface can be used for input.

    However, this form of interaction is specifically designed for two-dimensional surfaces.

    Fraunhofer FIT has developed the next generation of multi-touch environment, one that

    requires no physical contact and is entirely gesture-based.


    This system detects multiple fingers and hands at the same time and allows the user to

    interact with objects on a display. The users move their hands and fingers in the air and the

    system automatically recognizes and interprets the gestures accordingly.

    Fig 2. The Data or Cyber Glove: a device capable of recording hand

    movements, both the position of the hand and its orientation as well as finger movements; it

    is capable of simple gesture recognition and general tracking of three-dimensional hand

    orientation.


    An input device for virtual reality in the form of a glove which measures the movements of

    the wearer's fingers and transmits them to the computer. Sophisticated data gloves also

    measure movement of the wrist and elbow. A data glove may also contain control buttons or

    act as an output device, e.g. vibrating under control of the computer. The user usually sees a

    virtual image of the data glove and can point or grip and push objects.

    The CyberGlove is a fully instrumented glove that provides up to 22 high-accuracy joint-

    angle measurements. It uses proprietary resistive bend-sensing technology to accurately

    transform hand and finger motions into real-time digital joint-angle data. The

    VirtualHandStudio software converts the data into a graphical hand which mirrors the subtle movements of the physical hand. The glove is available in two models and for either hand.

    The 18-sensor model features two bend sensors on each finger, four abduction sensors, plus

    sensors measuring thumb crossover, palm arch, wrist flexion and wrist abduction.

    The 22-sensor model has three flexion sensors per finger, four abduction sensors, a palm-arch

    sensor, and sensors to measure wrist flexion and abduction. Each sensor is extremely thin and

    flexible, being virtually undetectable in the lightweight elastic glove.

    The CyberGlove has been used in a wide variety of real-world applications, including digital

    prototype evaluation, virtual reality, biomechanics, and animation. The CyberGlove has

    become the de facto standard for high-performance hand measurement and real-time motion

    capture, and it is designed for comfort and functionality.

    The CyberGlove has a software programmable switch and LED on the wristband to permit

    the system software developer to provide the CyberGlove wearer with additional input/output

    capability.

    The instrumentation unit provides a variety of convenient functions and features including

    time-stamp, CyberGlove status, external sampling synchronization and analog sensor outputs.
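
    As a rough illustration of the kind of data such a glove streams, the sketch below models a single 22-sensor sample as a plain record; the field names, sensor layout and example values are hypothetical, not the actual CyberGlove SDK or its documented sensor order:

        # Hypothetical sketch of one 22-sensor glove sample (not the real SDK).
        from dataclasses import dataclass
        from typing import List

        @dataclass
        class GloveSample:
            timestamp_ms: int              # time-stamp from the instrumentation unit
            joint_angles_deg: List[float]  # 22 calibrated joint angles, in degrees

            def finger_flexion(self, finger: int) -> float:
                # Sum the three flexion sensors of one finger (fingers 0..4),
                # assuming three flexion values per finger fill the first 15
                # slots -- an illustrative layout, not the documented order.
                base = finger * 3
                return sum(self.joint_angles_deg[base:base + 3])

        sample = GloveSample(timestamp_ms=1042, joint_angles_deg=[10.0] * 22)
        print(sample.finger_flexion(1))    # 30.0 for the index finger here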


    Gesture types

    In computer interfaces, two types of gestures are distinguished:

    Offline gestures: those gestures that are processed after the user's interaction with the object. An example is

    the gesture to activate a menu.

    Online gestures: direct manipulation gestures. They are used to scale or rotate a tangible object.
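
    The distinction is easy to see in code. In this minimal sketch (gesture names and handlers are invented for illustration), an offline gesture fires a discrete command once after it has been classified, while an online gesture manipulates the object continuously as samples arrive:

        # Offline: the whole gesture is classified after it completes,
        # then a discrete command fires once.
        def open_menu():
            print("menu opened")

        OFFLINE_COMMANDS = {"circle": open_menu}

        def on_gesture_finished(label: str) -> None:
            action = OFFLINE_COMMANDS.get(label)
            if action:
                action()

        # Online: every incremental sample manipulates the object directly.
        class RotatableObject:
            def __init__(self) -> None:
                self.angle = 0.0

            def on_gesture_update(self, delta_angle: float) -> None:
                self.angle += delta_angle      # applied frame by frame

        obj = RotatableObject()
        for delta in [1.5, 2.0, -0.5]:         # simulated per-frame rotation
            obj.on_gesture_update(delta)
        on_gesture_finished("circle")          # prints "menu opened"
        print(obj.angle)                       # 3.0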


    Possible types of gestures

    Gesture recognition is useful for processing information from humans which is not

    conveyed through speech or typing. In addition, there are various types of gestures which can be

    identified by computers.

    Sign language recognition. Just as speech recognition can transcribe speech to

    text, certain types of gesture recognition software can transcribe the symbols

    represented through sign language into text.

    For socially assistive robotics. By using proper sensors (accelerometers and

    gyros) worn on the body of a patient and by reading the values from those

    sensors, robots can assist in patient rehabilitation. A prime example is stroke

    rehabilitation.

    Directional indication through pointing. Pointing has a very specific purpose in

    our society, to reference an object or location based on its position relative to

    ourselves. The use of gesture recognition to determine where a person is pointing is useful for identifying the context of statements or instructions; a minimal pointing sketch follows this list. This application

    is of particular interest in the field of robotics.

    Control through facial gestures. Controlling a computer through facial gestures

    is a useful application of gesture recognition for users who may not physically

    be able to use a mouse or keyboard. Eye tracking in particular may be of use for

    controlling cursor motion or focusing on elements of a display.
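
    As a minimal sketch of directional pointing (the one referenced in the list above), the following code casts a ray from the shoulder through the hand and intersects it with a wall plane at z = 0; the keypoint coordinates are illustrative, and any 3D tracker could supply them:

        # Estimate what a person points at: shoulder->hand ray vs. wall z = 0.
        from typing import Optional
        import numpy as np

        def pointing_target(shoulder: np.ndarray, hand: np.ndarray) -> Optional[np.ndarray]:
            direction = hand - shoulder
            if abs(direction[2]) < 1e-9:       # ray parallel to the wall
                return None
            t = -shoulder[2] / direction[2]    # solve shoulder.z + t*dir.z = 0
            if t < 0:                          # wall is behind the person
                return None
            return shoulder + t * direction

        shoulder = np.array([0.0, 1.4, 2.0])   # metres; wall at z = 0
        hand = np.array([0.3, 1.3, 1.5])
        print(pointing_target(shoulder, hand)) # point on the wall being indicated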


    Input devices

    The ability to track a person's movements and determine what gestures they may be

    performing can be achieved through various tools. Although there is a large amount of

    research done in image/video based gesture recognition, there is some variation within the

    tools and environments used between implementations.

    Depth-aware cameras. Using specialized cameras such as time-of-flight cameras, one

    can generate a depth map of what is being seen through the camera at a short range, and use

    this data to approximate a 3D representation of what is being seen. These can be effective for

    detection of hand gestures due to their short-range capabilities; a minimal sketch follows this list.

    Stereo cameras. Using two cameras whose relations to one another are known, a

    3D representation can be approximated by the output of the cameras. To get the

    cameras' relations, one can use a positioning reference such as a lexian-stripe or

    infrared emitters. In combination with direct motion measurement (6D-Vision)

    gestures can directly be detected.

    Controller-based gestures. These controllers act as an extension of the body so

    that when gestures are performed, some of their motion can be conveniently

    captured by software. Mouse gestures are one such example, where the motion of

    the mouse is correlated to a symbol being drawn by a person's hand, as is the Wii

    Remote, which can study changes in acceleration over time to represent gestures.

    Single camera. A normal camera can be used for gesture recognition where the

    resources/environment would not be convenient for other forms of image-based

    recognition. Although not necessarily as effective as stereo or depth-aware

    cameras, using a single camera allows a greater possibility of accessibility to a

    wider audience.
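
    Returning to the depth-aware camera item above, the sketch promised there shows why short-range depth data makes hand detection straightforward: the hand is usually the nearest surface to the camera, so a simple distance threshold isolates it. The depth frame below is synthetic; a real time-of-flight camera would supply one per frame, typically in millimetres:

        # Isolate the nearest surface (assumed to be the hand) in a depth map.
        import numpy as np

        depth_mm = np.full((240, 320), 1500, dtype=np.uint16)  # background ~1.5 m
        depth_mm[80:160, 100:180] = 450                        # hand ~0.45 m away

        near = depth_mm[depth_mm > 0].min()    # ignore 0 = invalid pixels
        hand_mask = (depth_mm > 0) & (depth_mm < near + 100)   # keep a 10 cm slab

        print(hand_mask.sum(), "pixels classified as hand")    # 6400 here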


    Algorithms Used in Gesture Recognition

    Depending on the type of the input data, the approach for interpreting a gesture could be done

    in different ways. However, most of the techniques rely on key pointers represented in a 3D

    coordinate system. Based on the relative motion of these, the gesture can be detected with a

    high accuracy, depending on the quality of the input and the algorithm's approach.

    In order to interpret movements of the body, one has to classify them according to common

    properties and the message the movements may express. For example, in sign language each

    gesture represents a word or phrase.
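
    As a minimal sketch of detecting a gesture from the relative motion of one such key pointer, the code below classifies a swipe from the net 3D displacement of a fingertip track; the thresholds and units are illustrative and would be tuned to the tracker's noise:

        # Classify a swipe from the net displacement of a tracked fingertip.
        import numpy as np

        def classify_swipe(track: np.ndarray, min_dist: float = 0.15) -> str:
            # track: (N, 3) array of fingertip positions over ~0.5 s
            disp = track[-1] - track[0]          # net displacement
            if np.linalg.norm(disp) < min_dist:
                return "none"                    # too small to be deliberate
            axis = int(np.argmax(np.abs(disp)))  # dominant motion axis
            if axis == 0:
                return "swipe right" if disp[0] > 0 else "swipe left"
            if axis == 1:
                return "swipe up" if disp[1] > 0 else "swipe down"
            return "push" if disp[2] < 0 else "pull"

        track = np.linspace([0, 0, 1], [0.3, 0.02, 1], num=15)  # moving right
        print(classify_swipe(track))             # swipe right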

    3D model-based algorithms

    Fig. 1 A real hand (left) is interpreted as a collection of vertices and lines in the 3D mesh

    version (right), and the software uses their relative position and interaction in order to infer

    the gesture.


    The 3D model approach can use volumetric or skeletal models, or even a combination of the

    two. Volumetric approaches have been heavily used in the computer animation industry and for

    computer vision purposes. The models are generally created of complicated 3D surfaces, like

    NURBS or polygon meshes. The drawback of this method is that it is very computationally

    intensive, and systems for live analysis are still to be developed. For the moment, a more

    interesting approach would be to map simple primitive objects to the person's most important

    body parts (for example, cylinders for the arms and neck, a sphere for the head) and analyse the

    way these interact with each other. Furthermore, some abstract structures like super-quadrics

    and generalised cylinders may be even more suitable for approximating the body parts. What is

    appealing about this approach is that the parameters for these objects are quite simple. In order

    to better model the relation between these, we make use of constraints and hierarchies

    between our objects.

    Skeletal-based algorithms

    Fig. 2 The skeletal version (right) effectively models the hand (left). This has fewer

    parameters than the volumetric version and is easier to compute, making it suitable for real-

    time gesture analysis systems.


    Instead of using intensive processing of the 3D models and dealing with a lot of parameters,

    one can just use a simplified version of joint angle parameters along with segment lengths.

    This is known as a skeletal representation of the body, where a virtual skeleton of the person

    is computed and parts of the body are mapped to certain segments. The analysis here is done

    using the position and orientation of these segments and the relation between each one of

    them (for example, the angle between the joints and the relative position or orientation).
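
    The core operation on such a skeleton, reducing the positions of adjacent segments to an angle at the joint between them, can be sketched as follows; the joint coordinates are illustrative:

        # Recover the angle at joint b formed by segments b->a and b->c.
        import numpy as np

        def joint_angle(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
            u, v = a - b, c - b
            cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
            return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

        shoulder = np.array([0.0, 1.4, 0.0])
        elbow = np.array([0.0, 1.1, 0.2])
        wrist = np.array([0.2, 1.2, 0.4])
        print(joint_angle(shoulder, elbow, wrist))  # elbow flexion, in degrees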

    Appearance-based models

    Fig 3. These binary silhouette (left) or contour (right) images represent typical input for

    appearance-based algorithms. They are compared with different hand templates and, if they

    match, the corresponding gesture is inferred.


    These models don't use a spatial representation of the body anymore, because they derive the

    parameters directly from the images or videos using a template database. Some are based on

    deformable 2D templates of parts of the human body, particularly hands. Deformable

    templates are sets of points on the outline of an object, used as interpolation nodes for the

    object's outline approximation. One of the simplest interpolation functions is the linear one, which

    produces an average shape from point sets, point variability parameters and external

    deformators. These template-based models are mostly used for hand tracking, but could also

    be of use for simple gesture classification.

    A second approach to gesture detection using appearance-based models uses image

    sequences as gesture templates. Parameters for this method are either the images themselves,

    or certain features derived from these. Most of the time, only one (monoscopic) or two

    (stereoscopic) views are used.
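
    A minimal sketch of such template comparison on binary silhouettes, using OpenCV's Hu-moment shape distance (cv2.matchShapes) with synthetic shapes standing in for real hand templates:

        # Match a query silhouette against stored templates by shape distance.
        import cv2
        import numpy as np

        def silhouette(draw) -> np.ndarray:
            img = np.zeros((200, 200), dtype=np.uint8)
            draw(img)
            return img

        templates = {
            "fist": silhouette(lambda im: cv2.circle(im, (100, 100), 60, 255, -1)),
            "flat": silhouette(lambda im: cv2.rectangle(im, (40, 70), (160, 130), 255, -1)),
        }
        query = silhouette(lambda im: cv2.circle(im, (90, 110), 55, 255, -1))

        best = min(templates, key=lambda name: cv2.matchShapes(
            query, templates[name], cv2.CONTOURS_MATCH_I1, 0))
        print("closest template:", best)   # "fist": the round query wins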


    Challenges

    There are many challenges associated with the accuracy and usefulness of gesture recognition software. For image-based gesture recognition there are limitations on the

    equipment used and image noise. Images or video may not be under consistent lighting, or in

    the same location. Items in the background or distinct features of the users may make

    recognition more difficult.

    The variety of implementations for image-based gesture recognition may also cause issues for the

    viability of the technology in general usage. For example, an algorithm calibrated for one

    camera may not work for a different camera. The amount of background noise also causes

    tracking and recognition difficulties, especially when occlusions (partial and full) occur.

    Furthermore, the distance from the camera, and the camera's resolution and quality, also

    cause variations in recognition accuracy.

    In order to capture human gestures by visual sensors, robust computer vision methods are also

    required, for example for hand tracking and hand posture recognition or for capturing

    movements of the head, facial expressions or gaze direction.


    Upcoming New Technologies

    The SixthSense Device

    The SixthSense prototype comprises a pocket projector, a mirror and a camera. The

    hardware components are coupled in a pendant-like mobile wearable device. Both the

    projector and the camera are connected to the mobile computing device in the user's pocket.

    The projector projects visual information, enabling surfaces, walls and physical objects

    around us to be used as interfaces, while the camera recognizes and tracks the user's hand

    gestures and physical objects using computer-vision based techniques. The software

    program processes the video stream data captured by the camera and tracks the locations of

    the colored markers (visual tracking fiducials) at the tips of the user's fingers using

    simple computer-vision techniques. The movements and arrangements of these

    fiducials are interpreted into gestures that act as interaction instructions for the projected

    application interfaces. The maximum number of tracked fingers is only constrained by the

    number of unique fiducials; thus SixthSense also supports multi-user interaction. The

    SixthSense prototype implements several applications that demonstrate the usefulness,

    viability and flexibility of the system. The map application lets the user navigate a map

    displayed on a nearby surface using hand gestures, similar to gestures supported by Multi-

    Touch based systems, letting the user zoom in, zoom out or pan using intuitive hand

    movements. The drawing application lets the user draw on any surface by tracking the

    fingertip movements of the user's index finger. SixthSense also recognizes the user's

    freehand gestures (postures).
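
    A minimal sketch of the simple marker tracking described above, assuming OpenCV and an uncalibrated HSV range for a blue fingertip marker (the actual thresholds of the prototype are not given in this report): threshold the frame by the marker colour and take the centroid of the matching pixels.

        # Track one coloured fingertip marker by HSV threshold + centroid.
        import cv2
        import numpy as np

        def track_marker(frame_bgr: np.ndarray):
            hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
            mask = cv2.inRange(hsv, np.array([100, 120, 80]),
                                    np.array([130, 255, 255]))  # rough blue range
            m = cv2.moments(mask)
            if m["m00"] == 0:
                return None                    # marker not visible
            return (int(m["m10"] / m["m00"]),  # x of the fingertip marker
                    int(m["m01"] / m["m00"]))  # y of the fingertip marker

        frame = np.zeros((240, 320, 3), dtype=np.uint8)
        cv2.circle(frame, (200, 120), 8, (255, 0, 0), -1)  # synthetic blue dot
        print(track_marker(frame))  # ~(200, 120); positions feed the gesture layer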


    Construction and Working

    Fig 1. The SixthSense system

    The SixthSense prototype comprises a pocket projector, a mirror and a camera

    contained in a pendant-like, wearable device. Both the projector and the

    camera are connected to a mobile computing device in the user's pocket. The

    projector projects visual information, enabling surfaces, walls and physical

    objects around us to be used as interfaces, while the camera recognizes and

    tracks the user's hand gestures and physical objects using computer-vision based

    techniques. The software program processes the video stream data captured

    by the camera and tracks the locations of the colored markers (visual tracking

    fiducials) at the tips of the user's fingers. The movements and arrangements of

    these fiducials are interpreted into gestures that act as interaction instructions

    for the projected application interfaces. SixthSense supports multi-touch and multi-user

    interaction.


    Fig 2. The procedure carried out in SixthSense

    The hardware that makes SixthSense work is a pendant-like mobile wearable interface.

    It has a camera, a mirror and a projector and is connected wirelessly to a Bluetooth, 3G or Wi-Fi smartphone that can slip comfortably into one's pocket.

    The camera recognizes individuals, images, pictures and gestures one makes with their hands.

    Information is sent to the smartphone for processing.

    The downward-facing projector projects the output image on to the mirror.

    The mirror reflects the image on to the desired surface.

    Thus, digital information is freed from its confines and placed in the physical world.


    Example Applications

    The SixthSense prototype contains a number of demonstration applications. The map

    application lets the user navigate a map displayed on a nearby surface using

    hand gestures to zoom and pan. The drawing application lets the user draw on any surface by

    tracking the fingertip movements of the user's index finger. SixthSense can also augment the

    physical objects the user interacts with.

    The system recognizes a user's freehand gestures as well as icons/symbols drawn in the air

    with the index finger, for example:

    the user can stop by any surface or wall

    and flick through the photos he/she has taken, and drawing an @

    symbol lets the user check his mail.


    Conclusion

    The goal of virtual environments (VE) is to provide natural, efficient, powerful,

    and flexible interaction. Gesture as an input modality can help meet these

    requirements because human gestures are natural and flexible, and may be

    efficient and powerful, especially as compared with alternative interaction modes.

    The traditional two-dimensional (2D), keyboard- and mouse-oriented graphical user

    interface (GUI) is not well suited for virtual environments. Synthetic

    environments provide the opportunity to utilize several different sensing

    modalities and technologies and to integrate them into the user experience.

    Devices which sense body position and orientation, direction of gaze, speech and

    sound, facial expression, galvanic skin response, and other aspects of human

    behaviour or state can be used to mediate communication between the human

    and the environment. Combinations of communication modalities and sensing

    devices can produce a wide range of unimodal and multimodal interface

    techniques. The potential for these techniques to support natural and

    powerful interfaces for communication in VEs appears promising.

