
    The Sixth Sense: Gesture Recognition

    Department of CS&E, PESCE, Mandya

    Introduction

    Gesture recognition is a topic in computer science and language technology with the

    goal of interpreting human gestures via mathematical algorithms. Gestures can originate

    from any bodily motion or state but commonly originate from the face or hand. Current

    focuses in the field include emotion recognition from the face and hand gesture

    recognition. Many approaches have been made using cameras and computer vision

    algorithms to interpret sign language. However, the identification and recognition of

    posture, gait, and human behaviours is also the subject of gesture recognition

    techniques.

    Gesture recognition can be seen as a way for computers to begin to understand human body

    language, thus building a richer bridge between machines and humans than primitive

    text user interfaces or even GUIs (graphical user interfaces), which still limit the majority of

    input to keyboard and mouse.

    Gesture recognition enables humans to interface with the machine (HMI) and interact

    naturally without any mechanical devices. Using the concept of gesture recognition, it is

    possible to point a finger at the computer screen so that the cursor will move accordingly. This

    could potentially make conventional input devices such as mice, keyboards and even touch-

    screens redundant.

    Gesture recognition can be conducted with techniques from computer vision and image

    processing.
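
    To make the vision-based approach concrete, here is a minimal sketch, assuming OpenCV, a webcam at index 0 and rough skin-tone thresholds (the threshold values are illustrative, not taken from this report), that segments a hand by colour and extracts its outline:

        # Minimal sketch: segment a hand by skin colour and find its outline.
        import cv2
        import numpy as np

        cap = cv2.VideoCapture(0)                      # assumes webcam at index 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            # Rough skin-tone range in HSV; tune per camera and lighting.
            mask = cv2.inRange(hsv, np.array([0, 40, 60]), np.array([25, 180, 255]))
            mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
            contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
            if contours:
                hand = max(contours, key=cv2.contourArea)  # largest blob as the hand
                cv2.drawContours(frame, [hand], -1, (0, 255, 0), 2)
            cv2.imshow("hand", frame)
            if cv2.waitKey(1) == 27:                   # Esc quits
                break
        cap.release()
        cv2.destroyAllWindows()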


    Gesture recognition means interfacing with computers using gestures of the human body, typically hand movements.

    In gesture recognition technology, a camera reads the movements of the human body and

    communicates the data to a computer that uses the gestures as input to control devices or applications. For example, a person clapping his hands together in front of a camera

    can produce the sound of cymbals being crashed together when the gesture is fed through

    a computer.

    One way gesture recognition is being used is to help the physically impaired to interact

    with computers, such as interpreting sign language. The technology also has the potential

    to change the way users interact with computers by eliminating input devices such

    as joysticks, mice and keyboards and allowing the unencumbered body to give signals to

    the computer through gestures such as finger pointing.

    Unlike haptic interfaces, gesture recognition does not require the user to wear any special

    equipment or attach any devices to the body. The gestures of the body are read by a

    camera instead of sensors attached to a device such as a data glove.

    In addition to hand and body movement, gesture recognition technology also can be used to read facial

    expressions, speech (i.e., lip reading), and eye movements. The literature includes ongoing work in

    the computer vision field on capturing gestures or more general human pose and movements by

    cameras connected to a computer.


    Gesture Based Interaction

    Fig 1. The system detects the hands and fingers in real-time.

    Touch screens such as those found on the iPhone or iPad are the latest form of technology allowing interaction with smartphones, computers and other devices. However, scientists at

    Fraunhofer FIT have developed the next generation non-contact gesture and finger recognition

    system. The novel system detects hand and finger positions in real-time and translates these

    into appropriate interaction commands. Furthermore, the system does not require special

    gloves or markers and is capable of supporting multiple users.

    With touch screens becoming increasingly popular, classic interaction techniques such as a

    mouse and keyboard are becoming less frequently used. One example of a breakthrough is

    the Apple iPhone which was released in summer 2007. Since then many other devices

    featuring touch screens and similar characteristics have been successfully launched -- with

    more advanced devices even supporting multiple users simultaneously, e.g. the Microsoft

    Surface table, whose entire surface can be used for input.

    However, this form of interaction is specifically designed for two-dimensional surfaces.

    Fraunhofer FIT has developed the next generation of multi-touch environment, one that

    requires no physical contact and is entirely gesture-based.


    This system detects multiple fingers and hands at the same time and allows the user to

    interact with objects on a display. The users move their hands and fingers in the air and the

    system automatically recognizes and interprets the gestures accordingly.

    Fig 2. The Data or Cyber Glove: a device capable of recording hand

    movements, both the position of the hand and its orientation as well as finger movements; it

    is capable of simple gesture recognition and general tracking of three-dimensional hand

    orientation.


    An input device for virtual reality in the form of a glove which measures the movements of

    the wearer's fingers and transmits them to the computer. Sophisticated data gloves also

    measure movement of the wrist and elbow. A data glove may also contain control buttons or

    act as an output device, e.g. vibrating under control of the computer. The user usually sees a

    virtual image of the data glove and can point or grip and push objects.

    The CyberGlove is a fully instrumented glove that provides up to 22 high-accuracy joint-

    angle measurements. It uses proprietary resistive bend-sensing technology to accurately

    transform hand and finger motions into real-time digital joint-angle data. The

    VirtualHandStudio software converts the data into a graphical hand which mirrors the subtle movements of the physical hand. The glove is available in two models and for either hand.

    The 18-sensor model features two bend sensors on each finger, four abduction sensors, plus

    sensors measuring thumb crossover, palm arch, wrist flexion and wrist abduction.

    The 22-sensor model has three flexion sensors per finger, four abduction sensors, a palm-arch

    sensor, and sensors to measure wrist flexion and abduction. Each sensor is extremely thin and

    flexible, being virtually undetectable in the lightweight elastic glove.

    The CyberGlove has been used in a wide variety of real-world applications, including digital

    prototype evaluation, virtual reality, biomechanics, and animation. The CyberGlove has

    become the de facto standard for high-performance hand measurement and real-time motion

    capture, and it is designed for comfort and functionality.

    The CyberGlove has a software programmable switch and LED on the wristband to permit

    the system software developer to provide the CyberGlove wearer with additional input/output

    capability.

    The instrumentation unit provides a variety of convenient functions and features including

    time-stamp, CyberGlove status, external sampling synchronization and analog sensor outputs.
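
    As a rough illustration of the kind of data such a glove streams, the sketch below models a single 22-sensor sample as a plain record; the field names, sensor layout and example values are hypothetical, not the actual CyberGlove SDK or its documented sensor order:

        # Hypothetical sketch of one 22-sensor glove sample (not the real SDK).
        from dataclasses import dataclass
        from typing import List

        @dataclass
        class GloveSample:
            timestamp_ms: int              # time-stamp from the instrumentation unit
            joint_angles_deg: List[float]  # 22 calibrated joint angles, in degrees

            def finger_flexion(self, finger: int) -> float:
                # Sum the three flexion sensors of one finger (fingers 0..4),
                # assuming three flexion values per finger fill the first 15
                # slots -- an illustrative layout, not the documented order.
                base = finger * 3
                return sum(self.joint_angles_deg[base:base + 3])

        sample = GloveSample(timestamp_ms=1042, joint_angles_deg=[10.0] * 22)
        print(sample.finger_flexion(1))    # 30.0 for the index finger here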


    Gesture types

    In computer interfaces, two types of gestures are distinguished:

    Offline gestures: those gestures that are processed after the user's interaction with the object. An example is

    the gesture to activate a menu.

    Online gestures: direct manipulation gestures. They are used to scale or rotate a tangible object.
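
    The distinction is easy to see in code. In this minimal sketch (gesture names and handlers are invented for illustration), an offline gesture fires a discrete command once after it has been classified, while an online gesture manipulates the object continuously as samples arrive:

        # Offline: the whole gesture is classified after it completes,
        # then a discrete command fires once.
        def open_menu():
            print("menu opened")

        OFFLINE_COMMANDS = {"circle": open_menu}

        def on_gesture_finished(label: str) -> None:
            action = OFFLINE_COMMANDS.get(label)
            if action:
                action()

        # Online: every incremental sample manipulates the object directly.
        class RotatableObject:
            def __init__(self) -> None:
                self.angle = 0.0

            def on_gesture_update(self, delta_angle: float) -> None:
                self.angle += delta_angle      # applied frame by frame

        obj = RotatableObject()
        for delta in [1.5, 2.0, -0.5]:         # simulated per-frame rotation
            obj.on_gesture_update(delta)
        on_gesture_finished("circle")          # prints "menu opened"
        print(obj.angle)                       # 3.0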


    Possible types of gestures

    Gesture recognition is useful for processing information from humans which is not

    conveyed through speech or typing. In addition, there are various types of gestures which can be

    identified by computers.

    Sign language recognition. Just as speech recognition can transcribe speech to

    text, certain types of gesture recognition software can transcribe the symbols

    represented through sign language into text.

    For socially assistive robotics. By using proper sensors (accelerometers and

    gyros) worn on the body of a patient and by reading the values from those

    sensors, robots can assist in patient rehabilitation. A prime example is stroke

    rehabilitation.

    Directional indication through pointing. Pointing has a very specific purpose in

    our society, to reference an object or location based on its position relative to

    ourselves. The use of gesture recognition to determine where a person is pointing is useful for identifying the context of statements or instructions; a minimal pointing sketch follows this list. This application

    is of particular interest in the field of robotics.

    Control through facial gestures. Controlling a computer through facial gestures

    is a useful application of gesture recognition for users who may not physically

    be able to use a mouse or keyboard. Eye tracking in particular may be of use for

    controlling cursor motion or focusing on elements of a display.
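
    As a minimal sketch of directional pointing (the one referenced in the list above), the following code casts a ray from the shoulder through the hand and intersects it with a wall plane at z = 0; the keypoint coordinates are illustrative, and any 3D tracker could supply them:

        # Estimate what a person points at: shoulder->hand ray vs. wall z = 0.
        from typing import Optional
        import numpy as np

        def pointing_target(shoulder: np.ndarray, hand: np.ndarray) -> Optional[np.ndarray]:
            direction = hand - shoulder
            if abs(direction[2]) < 1e-9:       # ray parallel to the wall
                return None
            t = -shoulder[2] / direction[2]    # solve shoulder.z + t*dir.z = 0
            if t < 0:                          # wall is behind the person
                return None
            return shoulder + t * direction

        shoulder = np.array([0.0, 1.4, 2.0])   # metres; wall at z = 0
        hand = np.array([0.3, 1.3, 1.5])
        print(pointing_target(shoulder, hand)) # point on the wall being indicated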


    Input devices

    The ability to track a person's movements and determine what gestures they may be

    performing can be achieved through various tools. Although there is a large amount of

    research done in image/video based gesture recognition, there is some variation within the

    tools and environments used between implementations.

    Depth-aware cameras. Using specialized cameras such as time-of-flight cameras, one

    can generate a depth map of what is being seen through the camera at a short range, and use

    this data to approximate a 3D representation of what is being seen. These can be effective for

    detection of hand gestures due to their short-range capabilities; a minimal sketch follows this list.

    Stereo cameras. Using two cameras whose relations to one another are known, a

    3D representation can be approximated by the output of the cameras. To get the

    cameras' relations, one can use a positioning reference such as a lexian-stripe or

    infrared emitters. In combination with direct motion measurement (6D-Vision)

    gestures can directly be detected.

    Controller-based gestures. These controllers act as an extension of the body so

    that when gestures are performed, some of their motion can be conveniently

    captured by software. Mouse gestures are one such example, where the motion of

    the mouse is correlated to a symbol being drawn by a person's hand, as is the Wii

    Remote, which can study changes in acceleration over time to represent gestures.

    Single camera. A normal camera can be used for gesture recognition where the

    resources/environment would not be convenient for other forms of image-based

    recognition. Although not necessarily as effective as stereo or depth-aware

    cameras, using a single camera allows a greater possibility of accessibility to a

    wider audience.
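
    Returning to the depth-aware camera item above, the sketch promised there shows why short-range depth data makes hand detection straightforward: the hand is usually the nearest surface to the camera, so a simple distance threshold isolates it. The depth frame below is synthetic; a real time-of-flight camera would supply one per frame, typically in millimetres:

        # Isolate the nearest surface (assumed to be the hand) in a depth map.
        import numpy as np

        depth_mm = np.full((240, 320), 1500, dtype=np.uint16)  # background ~1.5 m
        depth_mm[80:160, 100:180] = 450                        # hand ~0.45 m away

        near = depth_mm[depth_mm > 0].min()    # ignore 0 = invalid pixels
        hand_mask = (depth_mm > 0) & (depth_mm < near + 100)   # keep a 10 cm slab

        print(hand_mask.sum(), "pixels classified as hand")    # 6400 here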


    Algorithms Used in Gesture Recognition

    Depending on the type of the input data, the approach for interpreting a gesture could be done

    in different ways. However, most of the techniques rely on key pointers represented in a 3D

    coordinate system. Based on the relative motion of these, the gesture can be detected with a

    high accuracy, depending on the quality of the input and the algorithm's approach.

    In order to interpret movements of the body, one has to classify them according to common

    properties and the message the movements may express. For example, in sign language each

    gesture represents a word or phrase.
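
    As a minimal sketch of detecting a gesture from the relative motion of one such key pointer, the code below classifies a swipe from the net 3D displacement of a fingertip track; the thresholds and units are illustrative and would be tuned to the tracker's noise:

        # Classify a swipe from the net displacement of a tracked fingertip.
        import numpy as np

        def classify_swipe(track: np.ndarray, min_dist: float = 0.15) -> str:
            # track: (N, 3) array of fingertip positions over ~0.5 s
            disp = track[-1] - track[0]          # net displacement
            if np.linalg.norm(disp) < min_dist:
                return "none"                    # too small to be deliberate
            axis = int(np.argmax(np.abs(disp)))  # dominant motion axis
            if axis == 0:
                return "swipe right" if disp[0] > 0 else "swipe left"
            if axis == 1:
                return "swipe up" if disp[1] > 0 else "swipe down"
            return "push" if disp[2] < 0 else "pull"

        track = np.linspace([0, 0, 1], [0.3, 0.02, 1], num=15)  # moving right
        print(classify_swipe(track))             # swipe right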

    3D model-based algorithms

    Fig. 1 A real hand (left) is interpreted as a collection of vertices and lines in the 3D mesh

    version (right), and the software uses their relative position and interaction in order to infer

    the gesture.


    The 3D model approach can use volumetric or skeletal models, or even a combination of the

    two. Volumetric approaches have been heavily used in the computer animation industry and for

    computer vision purposes. The models are generally created of complicated 3D surfaces, like

    NURBS or polygon meshes. The drawback of this method is that it is very computationally

    intensive, and systems for live analysis are still to be developed. For the moment, a more

    interesting approach would be to map simple primitive objects to the person's most important

    body parts (for example, cylinders for the arms and neck, a sphere for the head) and analyse the

    way these interact with each other. Furthermore, some abstract structures like super-quadrics

    and generalised cylinders may be even more suitable for approximating the body parts. What is

    appealing about this approach is that the parameters for these objects are quite simple. In order

    to better model the relation between these, we make use of constraints and hierarchies

    between our objects.

    Skeletal-based algorithms

    Fig. 2 The skeletal version (right) effectively models the hand (left). This has fewer

    parameters than the volumetric version and is easier to compute, making it suitable for real-

    time gesture analysis systems.


    Instead of using intensive processing of the 3D models and dealing with a lot of parameters,

    one can just use a simplified version of joint angle parameters along with segment lengths.

    This is known as a skeletal representation of the body, where a virtual skeleton of the person

    is computed and parts of the body are mapped to certain segments. The analysis here is done

    using the position and orientation of these segments and the relation between each one of

    them (for example, the angle between the joints and the relative position or orientation).
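
    The core operation on such a skeleton, reducing the positions of adjacent segments to an angle at the joint between them, can be sketched as follows; the joint coordinates are illustrative:

        # Recover the angle at joint b formed by segments b->a and b->c.
        import numpy as np

        def joint_angle(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
            u, v = a - b, c - b
            cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
            return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

        shoulder = np.array([0.0, 1.4, 0.0])
        elbow = np.array([0.0, 1.1, 0.2])
        wrist = np.array([0.2, 1.2, 0.4])
        print(joint_angle(shoulder, elbow, wrist))  # elbow flexion, in degrees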

    Appearance-based models

    Fig 3. These binary silhouette (left) or contour (right) images represent typical input for

    appearance-based algorithms. They are compared with different hand templates and, if they

    match, the corresponding gesture is inferred.


    These models don't use a spatial representation of the body anymore, because they derive the

    parameters directly from the images or videos using a template database. Some are based on

    deformable 2D templates of parts of the human body, particularly hands. Deformable

    templates are sets of points on the outline of an object, used as interpolation nodes for the

    object's outline approximation. One of the simplest interpolation functions is the linear one, which

    produces an average shape from point sets, point variability parameters and external

    deformators. These template-based models are mostly used for hand tracking, but could also

    be of use for simple gesture classification.

    A second approach to gesture detection using appearance-based models uses image

    sequences as gesture templates. Parameters for this method are either the images themselves,

    or certain features derived from these. Most of the time, only one (monoscopic) or two

    (stereoscopic) views are used.
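
    A minimal sketch of such template comparison on binary silhouettes, using OpenCV's Hu-moment shape distance (cv2.matchShapes) with synthetic shapes standing in for real hand templates:

        # Match a query silhouette against stored templates by shape distance.
        import cv2
        import numpy as np

        def silhouette(draw) -> np.ndarray:
            img = np.zeros((200, 200), dtype=np.uint8)
            draw(img)
            return img

        templates = {
            "fist": silhouette(lambda im: cv2.circle(im, (100, 100), 60, 255, -1)),
            "flat": silhouette(lambda im: cv2.rectangle(im, (40, 70), (160, 130), 255, -1)),
        }
        query = silhouette(lambda im: cv2.circle(im, (90, 110), 55, 255, -1))

        best = min(templates, key=lambda name: cv2.matchShapes(
            query, templates[name], cv2.CONTOURS_MATCH_I1, 0))
        print("closest template:", best)   # "fist": the round query wins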


    Challenges

    There are many challenges associated with the accuracy and usefulness of gesture recognition software. For image-based gesture recognition there are limitations on the

    equipment used and image noise. Images or video may not be under consistent lighting, or in

    the same location. Items in the background or distinct features of the users may make

    recognition more difficult.

    The variety of implementations for image-based gesture recognition may also cause issues for the

    viability of the technology in general usage. For example, an algorithm calibrated for one

    camera may not work for a different camera. The amount of background noise also causes

    tracking and recognition difficulties, especially when occlusions (partial and full) occur.

    Furthermore, the distance from the camera, and the camera's resolution and quality, also

    cause variations in recognition accuracy.

    In order to capture human gestures by visual sensors, robust computer vision methods are also

    required, for example for hand tracking and hand posture recognition or for capturing

    movements of the head, facial expressions or gaze direction.


    Upcoming New Technologies

    The SixthSense Device

    The SixthSense prototype comprises a pocket projector, a mirror and a camera. The

    hardware components are coupled in a pendant-like mobile wearable device. Both the

    projector and the camera are connected to the mobile computing device in the user's pocket.

    The projector projects visual information, enabling surfaces, walls and physical objects

    around us to be used as interfaces, while the camera recognizes and tracks the user's hand

    gestures and physical objects using computer-vision based techniques. The software

    program processes the video stream data captured by the camera and tracks the locations of

    the colored markers (visual tracking fiducials) at the tips of the user's fingers using

    simple computer-vision techniques. The movements and arrangements of these

    fiducials are interpreted into gestures that act as interaction instructions for the projected

    application interfaces. The maximum number of tracked fingers is only constrained by the

    number of unique fiducials; thus SixthSense also supports multi-user interaction. The

    SixthSense prototype implements several applications that demonstrate the usefulness,

    viability and flexibility of the system. The map application lets the user navigate a map

    displayed on a nearby surface using hand gestures, similar to gestures supported by Multi-

    Touch based systems, letting the user zoom in, zoom out or pan using intuitive hand

    movements. The drawing application lets the user draw on any surface by tracking the

    fingertip movements of the user's index finger. SixthSense also recognizes the user's

    freehand gestures (postures).
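
    A minimal sketch of the simple marker tracking described above, assuming OpenCV and an uncalibrated HSV range for a blue fingertip marker (the actual thresholds of the prototype are not given in this report): threshold the frame by the marker colour and take the centroid of the matching pixels.

        # Track one coloured fingertip marker by HSV threshold + centroid.
        import cv2
        import numpy as np

        def track_marker(frame_bgr: np.ndarray):
            hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
            mask = cv2.inRange(hsv, np.array([100, 120, 80]),
                                    np.array([130, 255, 255]))  # rough blue range
            m = cv2.moments(mask)
            if m["m00"] == 0:
                return None                    # marker not visible
            return (int(m["m10"] / m["m00"]),  # x of the fingertip marker
                    int(m["m01"] / m["m00"]))  # y of the fingertip marker

        frame = np.zeros((240, 320, 3), dtype=np.uint8)
        cv2.circle(frame, (200, 120), 8, (255, 0, 0), -1)  # synthetic blue dot
        print(track_marker(frame))  # ~(200, 120); positions feed the gesture layer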


    Construction and Working

    Fig 1. The SixthSense system

    The SixthSense prototype comprises a pocket projector, a mirror and a camera

    contained in a pendant-like, wearable device. Both the projector and the

    camera are connected to a mobile computing device in the user's pocket. The

    projector projects visual information, enabling surfaces, walls and physical

    objects around us to be used as interfaces, while the camera recognizes and

    tracks the user's hand gestures and physical objects using computer-vision based

    techniques. The software program processes the video stream data captured

    by the camera and tracks the locations of the colored markers (visual tracking

    fiducials) at the tips of the user's fingers. The movements and arrangements of

    these fiducials are interpreted into gestures that act as interaction instructions

    for the projected application interfaces. SixthSense supports multi-touch and multi-user

    interaction.


    Fig 2. The procedure carried out in SixthSense

    The hardware that makes SixthSense work is a pendant-like mobile wearable interface.

    It has a camera, a mirror and a projector and is connected wirelessly to a Bluetooth, 3G or Wi-Fi smartphone that can slip comfortably into one's pocket.

    The camera recognizes individuals, images, pictures and gestures one makes with their hands.

    Information is sent to the smartphone for processing.

    The downward-facing projector projects the output image on to the mirror.

    The mirror reflects the image on to the desired surface.

    Thus, digital information is freed from its confines and placed in the physical world.


    Example Applications

    The SixthSense prototype contains a number of demonstration applications. The map

    application lets the user navigate a map displayed on a nearby surface using

    hand gestures to zoom and pan. The drawing application lets the user draw on any surface by

    tracking the fingertip movements of the user's index finger. SixthSense can also augment the

    physical objects the user interacts with.

    The system recognizes a user's freehand gestures as well as icons/symbols drawn in the air

    with the index finger, for example:

    the user can stop by any surface or wall

    and flick through the photos he/she has taken, and drawing an @

    symbol lets the user check his mail.


    Conclusion

    The goal of virtual environments (VE) is to provide natural, efficient, powerful,

    and flexible interaction. Gesture as an input modality can help meet these

    requirements because human gestures are natural and flexible, and may be

    efficient and powerful, especially as compared with alternative interaction modes.

    The traditional two-dimensional (2D), keyboard- and mouse-oriented graphical user

    interface (GUI) is not well suited for virtual environments. Synthetic

    environments provide the opportunity to utilize several different sensing

    modalities and technologies and to integrate them into the user experience.

    Devices which sense body position and orientation, direction of gaze, speech and

    sound, facial expression, galvanic skin response, and other aspects of human

    behaviour or state can be used to mediate communication between the human

    and the environment. Combinations of communication modalities and sensing

    devices can produce a wide range of unimodal and multimodal interface

    techniques. The potential for these techniques to support natural and

    powerful interfaces for communication in VEs appears promising.

