
(09CS026)

GazeLib: A Low-Cost Implementation of a Real-time Gaze Tracking Framework

(Volume 1 of 1)

Student Name: Ng King Yui

Student No.:

Programme Code: BScCS

Supervisor: Prof. IP, Ho Shing Horace

Date: 12 April, 2010

City University of Hong Kong
Department of Computer Science
BSCCS/BSCS Final Year Project 2009-2010

Final Report

For Official Use Only


Declaration

I have read the project guidelines and I understand the meaning of academic dishonesty, in particular plagiarism and collusion. I hereby declare that the work I submitted for my final year project, entitled:

GazeLib: A Low-Cost Implementation of a Real-time Gaze Tracking Framework

does not involve academic dishonesty. I give permission for my final year project work to be electronically scanned and, if it is found to involve academic dishonesty, I am aware of the consequences as stated in the Project Guidelines.

Student Name: Ng King Yui    Signature: _____________

Student ID:                  Date: _____________


Abstract

Eye gaze reflects a person's attention over time, which makes it a powerful cue for determining what might be interesting. Eyes therefore not only carry indispensable meaning for human communication, but also hold great potential for making human-computer interaction more natural and direct. Ironically, most available gaze trackers depend on specially designed operating software or high-end hardware, and high cost has always been the barrier to the widespread adoption of gaze tracking technology. Moreover, the majority of gaze trackers adopt the corneal-reflection tracking technique, which actively illuminates the eye region with infrared (IR) light and requires quasi-stable lighting conditions to operate. In addition, potential eye hazards may arise from prolonged or close-proximity IR exposure. To solve these problems, a robust hybrid method integrating model-based and feature-based tracking approaches is proposed in this project. The core of the proposed method is the application of the Active Shape Model fitting technique to locate facial features. Eye features are then extracted by sophisticated image processing, including filtering and ellipse fitting. To make the framework available to the public, it is packed into a programming library and released as an open-source package. Evaluation experiments show that the system prototype is capable of real-time remote gaze tracking under several lighting conditions with low-cost, off-the-shelf webcams, while maintaining acceptable accuracy. The proposed method significantly improves usability and reduces the cost of gaze tracking technology, an important step towards bringing it to the mass market.


Acknowledgments

I would like to thank my supervisor, Prof. Horace H.S. IP, for his advice and valuable support throughout the development of this project. This project really would not have reached completion without his guidance and patience.

I would also like to thank Dr. Ken C.K. LAW and Dr. Joe C.H. YUEN from the Department of Computer Science, and Dr. Lionel P.K. SUN from the Department of Mathematics, for their kind support and guidance.

Furthermore, great thanks go to the organizations, including the FG-NET consortium and DTU IMM, that prepared and published the annotated face databases used in this project, and to those who allowed their faces to be used.

GazeLib: A Low-Cost Implementation of a Real-time Gaze Tracking Framework

Final Report

K.Y. Ng

Deliverable date: 12 April, 2010


Table of Contents

Chapter 1: Introduction

Chapter 2: Literature Review
  2.1 Gaze Tracking
    2.1.1 Biological structure of the human eye
    2.1.2 Mathematical eye model
  2.2 Eye Tracking Techniques
    2.2.1 Electro-Oculography (EOG)
    2.2.2 Scleral Contact Lens / Search Coils
    2.2.3 Video-Oculography with Corneal Reflection
    2.2.4 Video-Oculography with Visible Light
  2.3 Video-based gaze tracking hardware settings
    2.3.1 Head-mount
    2.3.2 Table-mount
  2.4 Potential hazards with IR

Chapter 3: Methodology
  3.1 Introduction
  3.2 Active Shape Model (ASM)
  3.3 Active Appearance Model (AAM)
  3.4 POSIT
  3.5 RANSAC

Chapter 4: Design & Implementation
  4.1 Introduction
  4.2 Development Environment
  4.3 System design
    4.3.1 Architecture
    4.3.2 Conceptual class diagram
  4.4 System implementation
    4.4.1 Overall system flow
    4.4.2 Active Shape Model building
    4.4.3 Face detection and tracking
    4.4.4 Head pose estimation
    4.4.5 Eye feature extraction
    4.4.6 Gaze estimation

Chapter 5: Results & Comments
  5.1 Introduction
  5.2 Testing environment
  5.3 Performance of face tracking
    5.3.1 Presence of distractions
    5.3.2 Various ambient lighting conditions
    5.3.3 Different facial expressions
    5.3.4 Blurred input images
  5.4 Performance of eye feature extraction
    5.4.1 Presence of distractions
    5.4.2 Various ambient lighting conditions
    5.4.3 Various iris colors
  5.5 Performance of head pose estimation
    5.5.1 Pose estimation results using POSIT
    5.5.2 Pose estimation results using LK optical flow
    5.5.3 Comparison of the two approaches
  5.6 Performance of the whole system
    5.6.1 Speed
    5.6.2 Tracking with off-the-shelf equipment
    5.6.3 Gaze tracking accuracy
  5.7 Case study: GazePad
    5.7.1 Motivations
    5.7.2 Interface design concepts
    5.7.3 Operation
    5.7.4 Performance

Chapter 6: Conclusions
  6.1 Critical reviews
    6.1.1 Achievements
    6.1.2 Limitations
  6.2 Future work
  6.3 Application areas
  6.4 Project feedback

References


Revision History:

Date        Author(s)  Comments
09-04-2010  Jack Ng    First draft version
11-04-2010  Jack Ng    First release
16-01-2011  Jack Ng    Changed the title; modified the Abstract; corrected typos and grammar errors


    Chapter 1

    Introduction


Eyes are the most important human sensory organ; more than half of all sensory impressions come through the eyes. Moreover, gaze is a powerful cue for determining what might be interesting to the observer (Duchowski, 2003). Generally speaking, eye gaze is an indicator of a person's attention over time. Therefore, eyes not only carry indispensable meaning for human communication, but also hold great potential for making human-computer interaction more natural and direct (Jacob and Karn, 2003). Since gaze information has valuable applications in human-computer interaction and user intention detection, various gaze tracking algorithms have been proposed and some of them have been commercialized (Daunys et al, 2006).

Gaze tracking, which originated from research on eye movement (Jacob, 1995), is defined as the continuous process of measuring the "Point of Regard" (PoR) or the "Line of Sight" (LoS) of the eye (ITU Gaze Group, 2009). Eye tracking and gaze estimation are the two main procedures involved in tracking eye gaze. The process of detecting and tracking relevant features (e.g. the pupil center) in the eye image is known as eye tracking. Gaze estimation is the mathematical procedure that translates image features into gaze coordinates.

With the advancement of computer vision technologies, gaze tracking has recently come to be considered a largely solved problem. Corneal-reflection-based tracking is commonly adopted by popular gaze tracking algorithms (Daunys and Ramanauskas, 2004; Goni, 2004; Li et al, 2005) and by commercial gaze tracking products (EyeTech Digital Systems, 2009; LC Technologies, 2009). Infrared (IR) lights are used to actively illuminate the eye region and produce a speck of light reflected by the cornea, known as the corneal reflection or glint. The corneal reflection remains stationary during eye movement, so, based on eye images captured by a camera, gaze can be estimated from the relative position between the pupil and the glints in the image. Gaze tracking using the active IR method can be divided into two types: remote tracking systems and head-mounted systems. Remote tracking systems, which are widespread in commercial products, usually employ a high-end camera for image capturing; high accuracy and a few degrees of head movement can be achieved. A head-mounted system is placed in a helmet or special glasses together with the IR lighting device and camera, so the whole system follows the user's head movement. Li et al (2005) show that satisfactory tracking results can still be obtained even when a low-resolution camera is used; thus most low-cost gaze tracking solutions are based on the head-mounted approach (San Agustin et al, 2009; Li et al, 2005). However, gaze tracking systems based on IR illumination have many limitations. First, most IR-based eye trackers cannot operate properly in the presence of other light sources, since the method relies heavily on IR illumination; quasi-stable lighting conditions are therefore a minimal prerequisite (Villanueva et al, 2008). As a result, this approach is only suitable for indoor use and is not recommended for users wearing glasses. Second, the positions of the IR light source and the camera need to be carefully calibrated before tracking, which is inconvenient for home environments. IR lights are employed to produce corneal reflections because they are barely visible to human vision. This truly enhances the user experience of gaze tracking, but at the same time the eyes' natural protection mechanism against bright light, aversion, cannot function. Concerns about long periods of IR exposure have been raised (Mulvey et al, 2008): guidelines regarding prolonged or close-proximity IR exposure have not yet been addressed in current infrared safety standards, so a potential eye hazard may exist. More recently, various gaze tracking algorithms without IR lights have also been proposed (Villanueva et al, 2008). Kohlbecher et al (2008) proposed a gaze tracking algorithm that infers eye gaze from the shape of the iris through ellipse fitting; again, high-end hardware components are required.

As mentioned before, most gaze tracking systems are driven by specially designed or high-end hardware and operating software, which vary between manufacturers (Bates et al, 2005). The high cost of hardware and software has always been the barrier to the widespread adoption of gaze tracking technology. A marketing study by Jordansen et al (2005) reported that, up to 2005, an eye tracking system in Europe cost from EUR 4,100 to EUR 17,900, which is around HKD 47,200 to HKD 207,640. The same study reported that the main target user groups of commercial gaze tracking products are research organizations and people with disabilities, such as ALS or locked-in syndrome. Widespread integration of eye tracking into consumer-grade human-computer interfaces is rarely seen.

This project focuses on bringing gaze tracking technology to consumer-grade human-computer interfaces by reducing the price, emphasizing ease of use, and increasing extendibility, flexibility and mobility. Instead of relying on active IR illumination and corneal reflections, a robust facial-feature-based gaze tracking approach proposed by Chen et al (2008) is employed: 2D facial features are tracked and then used to estimate gaze. In contrast with Chen et al (2008), the proposed system requires only a single uncalibrated camera, with no hardware modifications (e.g. building an IR LED grid) and no stereo camera, so off-the-shelf components can be used. The method works properly without IR lights, which lets the gaze tracking system operate under both indoor and outdoor conditions, and since no active illumination is required, wearing glasses is no longer a problem. Because low-cost, off-the-shelf hardware components are employed, the price is reduced roughly a hundredfold when a webcam is used, as one costs only about HKD 100 to HKD 500. Once the price drops to this range, gaze tracking interfaces will appear everywhere (Jordansen et al, 2005), and gaze tracking technology will revolutionize the future development of human-computer interaction. The framework is packed into a programming library and made available to the public as an open-source package, which makes the complicated implementation details transparent. Developers can build gaze-tracking-enabled applications with only a few function calls.

The report begins by presenting historical and theoretical background reviews of gaze tracking techniques. Detailed descriptions and justifications of the proposed method follow immediately after the literature review. Experiments were conducted, and their results demonstrate the performance of our gaze tracking system. A case study of an application prototype built on top of our library is discussed in detail. Finally, conclusions on achievements and limitations are given, and future work is suggested.


    Chapter 2

    Literature Review

    Gaze Tracking

    Eye Tracking Techniques

Video-based gaze tracking hardware settings

    Potential hazards with IR


2.1 Gaze Tracking

Generally speaking, eye tracking is the process of measuring eye position and movement. This project is interested in gaze tracking rather than eye tracking, but the rest of this chapter reviews a range of techniques in eye tracking as well as face tracking, both of which are related to eye-gaze tracking. The term "gaze tracking" rather than "eye tracking" will be used when referring to measuring the eye-gaze direction or the "Point of Regard". Knowledge of the biology and psychology of the human vision system is essential to understand the process of turning the eye's positional information into eye-gaze information.

2.1.1 Biological structure of the human eye

    Figure 2.1: The Anatomy of the Eye

    (Quade, 2009)

The eye is regarded as one of the most complex organs in the human body. Its operation can be imagined as that of a camera. Light rays from an object enter the eye through a small hole called the pupil, pass through a focusing lens, and are finally focused on the retina. Ciliary muscles are responsible for changing the thickness of the lens (i.e. adjusting the focal length) in order to focus objects at various distances onto the retina. The iris, which gives the eye its colored-ring appearance, controls the amount of light entering the eye. The retina is a membrane containing numerous photoreceptors (rods and cones) lying on the inner surface of the eyeball, similar to the film in a camera. The photoreceptors transform light energy into electrical impulses, or neural signals, which are then transmitted to the visual processing part of the brain through the optic nerve (Hyrskykari, 2006).


    Figure 2.2: Cross section of a human eye

    (Hyrskykari, 2006)

    2.1.1.1 Visual angle

Visual angle (angular size) is the angle a viewed object subtends at the eye. Given the object's height s and the distance d from the lens to the object, the visual angle α can be calculated using the formula α = 2 arctan(s / 2d) (Hyrskykari, 2006).
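As a worked example of this formula: a screen 0.30 m tall viewed from a distance of 0.60 m subtends α = 2 arctan(0.30 / (2 × 0.60)) = 2 arctan(0.25) ≈ 28.1 degrees of visual angle.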

    Figure 2.3: The visual angle

    (Hyrskykari, 2006)

2.1.1.2 Field of view

The field of view (or field of vision) is defined as the angular (linear or areal) extent of a scene that is seen by the eyes, and it is determined by the placement of the eyes. Fields of view can be classified into two types: the field of view of an individual eye, and the overlapping portion of both eyes' fields (the binocular field). The total human field of view is between 160 and 208 degrees, and that of an individual eye between 120 and 180 degrees (Savas, 2005).


2.1.1.3 Visual acuity

Visual acuity refers to a person's ability to perceive spatial detail. The further a point falls from the fovea, the lower the visual acuity. A normal young person has visual acuity on the order of minutes, or sometimes seconds, of visual angle, but visual acuity decreases with age. Although visual acuity is measured in minutes, gaze estimation cannot reach such accuracy, because gaze cannot be considered a single sharp point on a scene. When a point in a scene falls onto the center of the fovea, not only that point is perceived sharply, but also the surrounding areas that fall onto the rest of the fovea. In addition, a person can shift visual attention without eye movement (Hyrskykari, 2006). For these reasons, some potential error remains even when the exact point of the scene falling onto the fovea is tracked. This potential error is reported to be approximately one degree by Jacob and Karn (2003) and two degrees by Duchowski (2003).

Figure 2.4: The visual acuity of the eye (Hyrskykari, 2006)

2.1.1.4 Movement of the eye

Eye movements can be divided into three types: saccadic movements, smooth pursuit movements and convergent movements. Smooth pursuit movements occur when the eyes trace a moving object in the field of view. Convergent movements keep both eyes focused on the object. The eyes normally do not only trace an object smoothly, but also perform sudden jumps from one point to another; these are called saccadic eye movements. Saccades are among the fastest movements the human body can make: the eyes can rotate at an amazing speed of about 500 degrees per second and repeat this saccadic action over a hundred thousand times a day (Savas, 2005). Saccadic movements are driven by three pairs of muscles attached to the outside of the eyeball, arranged to produce three rotation actions: horizontal (left-right), vertical (up-down) and about the line of sight. The pauses between saccades, during which the perception of visual objects occurs, are called fixations. Fixation time varies across tasks but typically averages around 250 ms. The eyes are not entirely steady throughout the fixation period; they perform movements on a smaller scale, and these small movements make the recognition of fixations more complicated (Hyrskykari, 2006).

Figure 2.5: The eye's directions of saccadic movement (Oculomotor Research Group, 2006)

2.1.2 Mathematical eye model

Formulating the eye as a mathematical model is necessary in order to describe and calculate gaze precisely. The optical axis is defined as an imaginary line passing through the eyeball center and the pupil center. The visual axis is defined as the line joining the center of the fovea and the lens, which makes an angle with the optical axis. Daunys et al (2005) reported that in a typical adult human eye, the fovea falls about 4-5 degrees temporally and about 1.5 degrees below the point of intersection of the optical axis and the retina.

Figure 2.6: Mathematical model framework of the eye (Daunys et al, 2006)


2.2 Eye Tracking Techniques

Human-computer interfaces involving the user's eye gaze as a control mode are a new branch of eye tracking research; eye tracking itself, however, has a long history of use in medical and psychological research spanning more than half a century, though not in everyday computer interfaces (Duchowski, 2003). Tracking the pupil or iris center and determining the degree of eye movement in the face image is the first step in designing a gaze tracking method. Current eye tracking approaches can be classified into at least four categories, explained in detail below.

2.2.1 Electro-Oculography (EOG)

The Electro-Oculography (EOG) technique has been widely used in eye movement tracking over the past forty years and is still frequently used in clinical environments today. There is a prominent potential difference of approximately 1 mV between the cornea and the fundus. EOG evaluates eye movement by measuring the electric potential differences on the skin around the eyes. The technique measures eye movement relative to the user's head position, and is therefore not well suited to measuring the point of regard. Although EOG is cheap and non-invasive, it is not a reliable method for quantitative measurement, because the electrical signal can change even when there is no eye movement, and it is affected by metabolic changes in the eye (Savas, 2005).

Figure 2.7: An EOG-based eye tracker (EagleEyes Project, 2009)


2.2.2 Scleral Contact Lens / Search Coils

This method employs a contact lens mounted with a wire coil, attached directly to the eye. An electrical potential difference is induced when the wire coil moves in a magnetic field, so the eye's movement can be calculated by measuring the induced potential differences in the coil. The search coil gives very high temporal and spatial resolution, which allows small eye movements to be measured. Although this is the most precise method of eye tracking, it is invasive; it is therefore rarely used in clinical environments but is common in research settings (Duchowski, 2003).

Figure 2.8: Eye Tracking System CS681: turntable with Primelec coil system and coil frame (350 mm), Neurology Dept., University Hospital Zurich (Primelec, D. Florin, 2009)

2.2.3 Video-Oculography with Corneal Reflection

When a fixed light source actively illuminates the eye region, light reflections are formed on the cornea, known as "Purkinje images". Infrared (IR) light is usually used as the light source, since IR is barely visible to the human eye and hence does not serve as a distraction. The first Purkinje image (called the "glint") is captured by the eye tracker using a calibrated infrared-sensitive camera. The position of the glint remains constant under minor head movement, so eye rotation is truly reflected by the relative position between the pupil centre and the glint, and the viewer's Point of Regard (PoR) can be calculated using this property. There are two general types of eye tracking techniques related to active eye illumination: "Bright Pupil Tracking" and "Dark Pupil Tracking". The difference between the two lies in the location of the light source (Daunys, 2006; Duchowski, 2003; Glenstrup and Engell-Nielsen, 1995).

Figure 2.9: The four Purkinje images formed when light is directed at the eye (Glenstrup and Engell-Nielsen, 1995)

2.2.3.1 Bright Pupil Tracking

If the illumination is coaxial with the optical path, the eye acts as a retroreflector: the light reflecting off the retina returns in the same direction as the incoming light, similar to the red-eye effect. This phenomenon is known as the bright pupil effect, and it makes the pupil appear as a very bright spot and the iris as a dark disc in the captured image. This approach works better for people with blue irises (Tobii Technology AB, 2009).


Figure 2.10: A gold corner-cube retroreflector (Retroreflector, 2009)

Figure 2.11: Bright pupil formed in the captured image (Daunys, 2006)

2.2.3.2 Dark Pupil Tracking

If the illumination source is offset from the optical path, the light reflected from the retina does not return along the incoming direction, so the pupil appears dark in the captured image. This approach works better for people with dark eyes (Tobii Technology AB, 2009).

Figure 2.12: Working principle of a corner reflector (EyeTracking, 2009)

Figure 2.13: Eye region image with corneal reflex (Daunys, 2006)

2.2.3.3 Problems

The large iris-pupil contrast allows robust eye tracking for all iris colors, but there are two problems with this technique. First, the contrast between the pupil and the rest of the eye area becomes unclear if other external light sources are present at the same time, such as in outdoor conditions, which makes it hard for the tracking algorithm to determine the boundaries of eye features. Second, when the user wears glasses or contact lenses, multiple glints appear, making it hard for the algorithm to find the true corneal reflection (Daunys, 2006).


2.2.4 Video-Oculography with Visible Light

This approach relies on image analysis algorithms alone rather than active illumination. Images captured by a calibrated camera under normal lighting are input directly to the algorithm for gaze estimation. The various algorithms proposed in this category can be classified into three main types: deformable-template-based, appearance-based, and feature-based methods. Deformable-template-based and appearance-based methods attempt to fit a predefined model to the image, while feature-based methods attempt to fit image features to a fixed model (Daunys, 2006).

2.2.4.1 Deformable-template-based

Deformable-template tracking is based on a manually predefined generic template that is matched to the image. A correlation value computed between a candidate image and the predefined template is used to determine the presence of an eye. This approach is accurate and easy to implement, but it cannot deal effectively with variations in scale, pose and shape. Moreover, template matching is computationally demanding, and a high-contrast image is required (Savas, 2005).

2.2.4.2 Appearance-based

Appearance-based tracking uses statistical analysis and machine learning to find the relevant characteristics of eye and non-eye images. The learned characteristics take the form of distribution models or discriminant functions, which are then used for eye detection. A distribution model is a probabilistic framework in which Bayesian classification or maximum likelihood is used to classify a candidate image as eye or non-eye; however, the high dimensionality of images makes a direct Bayesian implementation infeasible. A discriminant function is derived by projecting the high-dimensional image into a lower-dimensional space used for classification. PCA and Hidden Markov Models are the most commonly used appearance-based techniques (Savas, 2005).

2.2.4.3 Feature-based

Feature-based tracking methods extract particular features, such as the color distribution of the eye region or feature points of the eye in the image, to perform identification. Feature-based tracking consists of feature extraction and feature mapping. Typical feature-based tracking algorithms include particle filters, Gabor filters, Kalman filtering and mean shift (Zhou et al, 2008).

2.3 Video-based gaze tracking hardware settings

The video-based gaze tracking approach is the main concern of this project, so only the hardware settings adopted by video-based gaze trackers are reviewed. Generally speaking, video-based gaze trackers can be classified into two types, head-mount trackers and remote trackers (or table-mount trackers), based on whether the cameras are attached to the subject's head or positioned remotely.

2.3.1 Head-mount

Head-mounted gaze trackers estimate gaze direction relative to the user's head position. Applications that require fast head movements, and low-cost gaze tracking solutions (Winfield, 2005; The I4Control System, 2009), tend to employ the head-mount approach, since the camera is placed close to the user's eye. On the other hand, the higher level of intrusion makes this type of tracker unsuitable for computer control (Daunys et al, 2006).

Figure 2.14: openEyes system using a head-mounted device (Winfield, 2005)

2.3.2 Table-mount

Table-mounted gaze trackers track the head position and orientation in 2D or 3D space. This type of system does not require any attachment to the user and allows free head movement within certain limits, making it more adequate for computer control. However, the accuracy of gaze estimation is lower compared with head-mounted systems, and a high-resolution camera is usually preferred for remote tracking (Daunys et al, 2006).


2.4 Potential hazards with IR

The spectral emission of the infrared LEDs employed in most IR-based eye trackers is usually limited to the near-infrared band (IR-A, 780-1400 nm). IR-A band LEDs have been tested, and the results show clearly no hazard to the eye when viewed for short periods of time (a few hours), based on current national and international ocular exposure limits for infrared optical radiation. However, explicit guidelines regarding prolonged or close-proximity exposure of the eye to IR have not yet been addressed in any current infrared safety standard, so the potential hazard remains an open question (Mulvey et al, 2008). Moreover, Mulvey et al (2008) reported that emissions outside the IR-A range are possible if a conventional incandescent or discharge lamp, filtered to block most visible light and transmit IR-A, is employed.

Figure 2.15: The different photo-biological effects of optical radiation (Mulvey et al, 2008)


    Chapter 3

    Methodology

    Introduction

    Active shape model

    Active appearance model

    POSIT

    RANSAC


3.1 Introduction

Instead of relying on active IR illumination and corneal reflections, a robust facial-feature-based gaze estimation approach is proposed, built on an ASM face tracking algorithm and a proposed eye feature extraction algorithm. To achieve our aims, the approach draws on several computer vision algorithms and techniques. This chapter briefly reviews the techniques and concepts adopted in our system.

3.2 Active Shape Model (ASM)

Automatic and accurate location of facial features is a difficult problem in computer vision; the variety of human faces, expressions, facial hair, glasses, poses and lighting all contribute to its complexity. The Active Shape Model (ASM) is one solution to this problem. An ASM is a statistical shape model that iteratively deforms to fit the object in a new image. The shape is constrained by the statistical shape model, which can only deform in ways seen in a training set of annotated examples. The ASM must first be trained on a set of manually landmarked images; after training, the statistical shape model can be used to extract feature points on a face. The search involves:

1. Locating each landmark independently.

2. Correcting the locations of the landmarks, where necessary, by looking at how they are located with respect to each other.

Figure 3.1: ASM template fitting process (Cootes, 2009)

3.3 Active Appearance Model (AAM)

The Active Appearance Model (AAM) merges the shape and texture models into a single model of appearance. An AAM contains a statistical model of the shape and grey-level appearance of the object of interest. Matching the model to an image involves finding model parameters that minimize the difference between the image and a synthesized model instance (Cootes, 2009).

Figure 3.2: AAM template matching process (Cootes, 2009)

3.4 POSIT

"Pose from Orthography and Scaling with ITerations", also known as POSIT, is a useful algorithm for estimating the pose of a known object in three dimensions. It was originally proposed by DeMenthon in 1993. To compute an object's pose, at least four non-coplanar model points and their corresponding 2D projections onto the image must be found. Through the first part of the algorithm, pose from orthography and scaling (POS), the perspective scaling of the known object can be found and its approximate pose computed. However, the approximation from POS is not very accurate, so the observed points are reprojected at the pose calculated by POS and the POS algorithm is run again with these new point positions. Typically, the true object pose is recovered within four or five iterations (Bradski and Kaehler, 2008).


3.5 RANSAC

RANSAC, an abbreviation for "RANdom SAmple Consensus", was first published by Fischler and Bolles in 1981. RANSAC is an algorithm for robustly fitting a model to data in the presence of many outliers. The inputs to the RANSAC algorithm are a set of observed data values, a parameterized model that can be fitted to the data, and some threshold parameters. The algorithm proceeds as follows:

1. Select a random subset S of the original data as hypothetical inliers.

2. Fit the model to the hypothetical inliers.

3. Test all other data points against the fitted model; points that fit the estimated model well are also considered hypothetical inliers.

4. Re-estimate the model from the new hypothetical inliers if they contain sufficiently many points.

5. Evaluate the model by estimating the error of the inliers relative to the model.

Steps 1-5 are repeated N times, each iteration producing either a rejected model or a refined model; a refined model is kept if its error is lower than that of the last saved model. After N iterations, the best-fitting model is obtained (Fisher, 2009).
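To make the loop concrete, the following is a minimal sketch assuming a toy 2D line-fitting problem rather than the head pose model used later in this project; it keeps the sample-fit-score structure of steps 1-5 above but, for brevity, scores candidate models by inlier count instead of re-estimating from all inliers.

// Hedged illustration only: a generic RANSAC loop fitting a 2D line
// y = a*x + b; not this project's head pose implementation.
#include <cstdlib>
#include <cmath>
#include <vector>
#include <iostream>

struct Pt { double x, y; };

int main() {
    // Toy data: points near y = x, plus one gross outlier at the end.
    std::vector<Pt> data;
    const double xs[] = {0, 1, 2, 3, 4};
    const double ys[] = {0.1, 1.0, 2.1, 2.9, 9.0};
    for (int i = 0; i < 5; ++i) { Pt p = {xs[i], ys[i]}; data.push_back(p); }

    const int N = 100;           // number of RANSAC iterations
    const double thresh = 0.3;   // inlier distance threshold
    double bestA = 0, bestB = 0;
    size_t bestInliers = 0;

    std::srand(42);
    for (int it = 0; it < N; ++it) {
        // Step 1: randomly pick a minimal sample (two points define a line).
        const Pt& p = data[std::rand() % data.size()];
        const Pt& q = data[std::rand() % data.size()];
        if (std::fabs(q.x - p.x) < 1e-9) continue;   // degenerate sample
        // Step 2: fit the model to the sample.
        double a = (q.y - p.y) / (q.x - p.x);
        double b = p.y - a * p.x;
        // Step 3: test all points against the fitted model.
        size_t inliers = 0;
        for (size_t i = 0; i < data.size(); ++i)
            if (std::fabs(data[i].y - (a * data[i].x + b)) < thresh) ++inliers;
        // Steps 4-5: keep the model supported by the most inliers.
        if (inliers > bestInliers) { bestInliers = inliers; bestA = a; bestB = b; }
    }
    std::cout << "best line: y = " << bestA << "x + " << bestB
              << " (" << bestInliers << " inliers)" << std::endl;
    return 0;
}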


    Chapter 4

    Design & Implementation

    Introduction

    Development environment

    System design

    System implementation


4.1 Introduction

This chapter presents in detail the design and implementation of the gaze tracking framework library named "GazeLib". The main capabilities of the library are:

1. Face detection

2. Annotated facial feature point tracking

3. Head pose estimation

4. Feature extraction for each eye individually (pupil or iris center and radius)

5. 2D trajectory extraction for each eye individually (measuring the pupil center)

6. Gaze tracking for each eye individually

7. Blink detection for each eye individually

In addition, recording of tracking results over a video sequence (e.g. facial model fitting results, eye trajectory extraction, etc.) is also implemented, which may provide useful information for further analysis or studies.
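GazeLib's public interface is not reproduced in this report, so the following is a purely hypothetical sketch of how a capability list like the one above might surface as "a few function calls". Every identifier here (the gazelib namespace, Tracker, start, nextFrame and the Result fields) is invented for illustration and stubbed out so the sketch is self-contained; none of it should be read as GazeLib's actual API.

// Purely illustrative, hypothetical facade; not GazeLib's actual interface.
#include <iostream>

namespace gazelib {                                    // hypothetical namespace
    struct Result { bool faceFound; float gazeX, gazeY; };
    struct Tracker {
        bool start(int cameraIndex) { return true; }   // stub: open a webcam
        Result nextFrame() {                           // stub: one tracking step
            Result r = { true, 0.5f, 0.5f };
            return r;
        }
    };
}

int main() {
    gazelib::Tracker tracker;
    if (!tracker.start(0)) return 1;    // camera index 0 is an assumption
    for (int i = 0; i < 3; ++i) {       // a few frames, for illustration only
        gazelib::Result r = tracker.nextFrame();
        if (r.faceFound)
            std::cout << "gaze at (" << r.gazeX << ", " << r.gazeY << ")\n";
    }
    return 0;
}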

4.2 Development Environment

The system was developed on a notebook with the following configuration:

Hardware configuration:

1. CPU: Intel Core 2 Duo (P7500), 1.66 GHz, FSB 800 MHz

2. RAM: 2 GB DDR2-667

3. MON: 13.3-inch LCD widescreen display, 1280 x 800 pixel resolution

4. CAM:

a. Built-in iSight, 640 x 480 pixel resolution

b. E-tiger, 640 x 480 pixel resolution

Software configuration:

1. Microsoft Visual C++ 2008

2. OpenCV 2.0

3. ASMLibrary 4.09


4.3 System design

4.3.1 Architecture

The system architecture is illustrated in Figure 4.1. The system is based on the OpenCV library and the ASM Library. OpenCV (Open Source Computer Vision) is a programming library originating from Intel that provides a large collection of computer vision algorithms. ASMLibrary (Active Shape Model Library SDK) is a C++ implementation of the Active Shape Model framework developed by YAO Wei; it contains algorithms for training and building the statistical model, together with the fitting algorithms.

"GazeLib" is packed and released as an open-source programming library framework. Developers can build their own applications making use of gaze tracking technology on top of "GazeLib" without any prior knowledge of the underlying techniques. "GazeLib" not only aims at high reusability, but also at making it possible to create revolutionary applications that will set the bar for the next generation of gaze tracking applications.

Figure 4.1: System architecture


The system is divided into three components:

1. Face detection and tracking component

2. Eye tracking component

3. Gaze-to-screen-coordinates mapping component

All components need to work collectively and correctly in order to have a working gaze tracking system.

4.3.2 Conceptual class diagram

Figure 4.2: System class diagram


4.4 System implementation

4.4.1 Overall system flow

Figure 4.3: System flowchart

The flowchart summarizes the main system flow as a whole: face detection, facial feature tracking, head pose estimation, eye feature extraction and the gaze mapping process. Each algorithm is discussed in detail in the sections that follow.

4.4.2 Active Shape Model building

At the very beginning, a one-time Active Shape Model (ASM) training process needs to be completed before anything else can work correctly.

Figure 4.4: Mean error versus number of landmarks (Milborrow and Nicolls, 2008)

The number of landmark points in the model directly affects the fitting result. Milborrow and Nicolls (2008) measured point-to-point error against the number of landmarks; their results show that the way to improve the mean error is to increase the number of landmarks in the model, because fitting one landmark tends to help fit the others. Meanwhile, the search time increases roughly linearly with the number of landmarks.

Accordingly, images of various individuals in different head poses, each manually annotated with 68 facial landmark points, were used by the Active Shape Model training algorithm to build the 2D statistical model. The face images used in ASM training were extracted from the FG-NET AGING DATABASE and the DTU IMM face database.


Figure 4.5: Manually annotated 68-landmark face model


4.4.3 Face detection and tracking

The face detection algorithm is based on the Viola-Jones classifier implemented in the OpenCV library. The facial feature point tracking algorithm is a model-based face tracking method.

Low-cost tracking equipment such as a webcam can only offer low-resolution capture, which is an inherent limitation of the tracking system. Fortunately, the eye tracker does not require a very accurate feature point fitting result; a satisfactory result is enough to define the search windows for eye feature extraction. In addition, the whole system operates in real time, so tracking speed directly affects the performance of the tracker. Thus, high stability and efficiency with acceptable accuracy are the main concerns in designing the face tracking algorithm.

The Active Appearance Model (AAM) was originally designed for facial feature extraction. Compared with the Active Shape Model (ASM), AAM is more stable and accurate. AAM face tracking is stable provided the face is nearly frontal; it does not work well when tracking a face at an angle, and it was found that AAM cannot deform to the right shape after head rotation. ASM tracking is not as stable as AAM, but a face rotated by a moderate degree can still be tracked, though not one rotated by a large angle.


Figure 4.6: Fitting results comparison between AAM and ASM

In addition, the computational cost of AAM is much higher than that of ASM. The experimental results show that AAM fitting achieves an average of only 5 fps, while ASM achieves an average of 9.5 fps. This indicates that AAM is not a good choice for implementing a real-time gaze tracking system.

Figure 4.7: Performance comparison between AAM and ASM

Experiments and observations show that model fitting stability and results can be improved by reducing noise and detail in the input image. The figure below shows the results of performing ASM fitting directly, and of applying a Gaussian filter or a median filter before ASM fitting. The results show that fitting performance with the median filter is the best of the three; therefore, the input image is first passed through a median filter for noise reduction. The median filter was chosen over other filters because it reduces image noise while preserving edges. To further increase speed, the image is scaled down by half before Active Shape Model fitting.

Figure 4.8: Face fitting results with different filters

Before Active Shape Model fitting, the face is first detected using the Viola-Jones classifier. If a face is present, the mean model shape is initialized using the detection result. Finally, the active shape model is fitted to the image iteratively until it converges or the maximum allowed number of iterations is reached.
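The following is a minimal sketch of this preprocessing and detection step using the OpenCV 2.0 C API; the input file name, the cascade file name and the filter aperture are assumptions (the report does not list exact parameters), and the ASM fitting call itself is omitted because ASMLibrary's interface is not reproduced here.

// Sketch only: median filtering, half-scale downsampling and Viola-Jones
// face detection, as described above. The cascade path is an assumption.
#include <cv.h>
#include <highgui.h>

int main() {
    IplImage* frame = cvLoadImage("frame.png", CV_LOAD_IMAGE_COLOR); // stand-in for a webcam frame
    if (!frame) return 1;

    // Median filter: reduces noise while preserving edges.
    IplImage* smoothed = cvCloneImage(frame);
    cvSmooth(frame, smoothed, CV_MEDIAN, 5);

    // Scale by half to speed up ASM fitting.
    IplImage* small = cvCreateImage(cvSize(frame->width / 2, frame->height / 2),
                                    frame->depth, frame->nChannels);
    cvResize(smoothed, small, CV_INTER_LINEAR);

    // Viola-Jones detection; the detected box would seed the mean ASM shape.
    CvHaarClassifierCascade* cascade = (CvHaarClassifierCascade*)
        cvLoad("haarcascade_frontalface_alt.xml", 0, 0, 0);
    if (!cascade) return 1;
    CvMemStorage* storage = cvCreateMemStorage(0);
    CvSeq* faces = cvHaarDetectObjects(small, cascade, storage,
                                       1.1, 3, CV_HAAR_DO_CANNY_PRUNING,
                                       cvSize(40, 40));
    if (faces && faces->total > 0) {
        CvRect* r = (CvRect*)cvGetSeqElem(faces, 0);
        // ... initialize the ASM mean shape inside *r and fit iteratively ...
        (void)r;
    }
    cvReleaseMemStorage(&storage);
    cvReleaseImage(&small);
    cvReleaseImage(&smoothed);
    cvReleaseImage(&frame);
    return 0;
}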

Locating facial landmarks is equivalent to locating facial features, since the landmarks lie along the facial features; this is the main advantage of the approach. As the 68 landmarks are distributed over the whole face, not only the feature points around the eyes are tracked. This tracking method is therefore not limited to extracting the eye region, and can be extended to the extraction and tracking of other facial features such as the mouth.

Figure 4.9: Face detection and tracking algorithm flowchart


4.4.4 Head pose estimation

There are two approaches to implementing the head pose estimation algorithm. The first uses the POSIT algorithm provided in the OpenCV library. The second uses LK optical flow and the RANSAC algorithm together with POSIT. The details of both approaches are discussed below.

4.4.4.1 Approach 1: Using POSIT only

This approach is relatively simple compared with the second one. The head tracker keeps fitting the facial landmarks to the input image; once the user initiates 3D head pose estimation, the landmark fitting result from head tracking is used to build a 3D head model and the POSIT object. In the next round of execution, the POSIT object, together with the face fitting result, is input to the POSIT algorithm to perform 3D head pose estimation, yielding a 3D rotation matrix and a translation vector.

Figure 4.10: POSIT head pose estimation algorithm flowchart
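A minimal sketch of this step with OpenCV 2.0's C API is shown below; the four model points, the image points and the focal length are placeholder values only, and in the real system the model and image points would come from the landmark fitting result.

// Sketch only: estimating a rotation matrix and translation vector with
// cvPOSIT. All coordinate values below are placeholders, not real data.
#include <cv.h>

int main() {
    // At least four non-coplanar 3D model points (e.g. from the head model).
    CvPoint3D32f model[4] = {
        {0.f, 0.f, 0.f}, {60.f, 0.f, 0.f}, {0.f, 60.f, 0.f}, {0.f, 0.f, 40.f}
    };
    // Their tracked 2D projections in the image (placeholder values).
    CvPoint2D32f image[4] = {
        {10.f, 10.f}, {70.f, 12.f}, {12.f, 68.f}, {14.f, 20.f}
    };

    CvPOSITObject* posit = cvCreatePOSITObject(model, 4);
    float rotation[9];     // 3x3 rotation matrix, row-major
    float translation[3];  // translation vector

    // A focal length in pixels is an assumption for an uncalibrated webcam.
    double focalLength = 640.0;
    CvTermCriteria criteria =
        cvTermCriteria(CV_TERMCRIT_EPS | CV_TERMCRIT_ITER, 100, 1.0e-4f);
    cvPOSIT(posit, image, focalLength, criteria, rotation, translation);

    cvReleasePOSITObject(&posit);
    return 0;
}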

4.4.4.2 Approach 2: Using LK optical flow + RANSAC + POSIT

The second approach is more complicated than the first. It is triggered by the user; otherwise, the system just keeps doing face tracking and eye feature extraction. When the algorithm is first triggered, numerous feature points are marked on the input face image. These marked feature points are taken as the model point set used to initialize the POSIT object later, and they are tracked using the LK optical flow algorithm. The RANSAC algorithm is then applied to the successfully tracked feature points, with the following steps:

1. A subset of the successfully tracked points is randomly selected to perform 3D pose estimation.

2. All other successfully tracked points are tested against the fitted model to see whether they fit the estimated model well.

3. Points within a certain distance threshold are considered inliers to the model.

These steps are repeated several times, and the set with the largest number of inliers is taken as the best tracked point set.

Finally, the POSIT object is built on the fly using the model points and the best tracked points, and the POSIT algorithm estimates the 3D head pose. After the estimation process, the outliers are deleted from the model point set. A minimal sketch of the optical flow tracking step is given after Figure 4.11.


Figure 4.11: Using LK optical flow and RANSAC with POSIT to estimate head pose


    4.4.5 Eye feature extraction

The pupil region can be extracted using edge detection and ellipse fitting. To obtain a good fitting result, some preprocessing steps need to be performed before edge detection.

The first step is to convert the color image to grey scale to facilitate edge detection. Experiments show that a stronger edge image can be obtained by using a single component of a color space; for example, using the B component of RGB gives better results than converting the whole RGB image to grey scale. After the conversion, a median filter is applied to reduce noise while preserving sharp edges. Histogram equalization is then performed to spread out the brightness values of the image, increasing its contrast.
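A minimal sketch of this preprocessing pipeline, written against OpenCV's C++ interface (the header path follows the later 2.x layout), is shown below; the channel choice and kernel size are illustrative.

    // Preprocess an eye-region image: B channel, median filter, then equalization.
    #include <opencv2/opencv.hpp>
    #include <vector>

    cv::Mat preprocessEyeRegion(const cv::Mat& eyeBgr)
    {
        // Take a single color component instead of a full grey-scale conversion;
        // OpenCV stores color images as BGR, so channel 0 is the B component.
        std::vector<cv::Mat> channels;
        cv::split(eyeBgr, channels);

        // Median filter: reduces noise while preserving the sharp edges.
        cv::Mat filtered;
        cv::medianBlur(channels[0], filtered, 5);

        // Histogram equalization: spreads out brightness values to raise contrast.
        cv::Mat equalized;
        cv::equalizeHist(filtered, equalized);
        return equalized;
    }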

Human iris colors range from brown to green, blue and dark brown. The pupil region, however, is always black despite this wide range of iris colors, so there is a strong contrast between the dark iris/pupil and the surrounding area. A binary threshold can therefore be applied to the grey-scale eye image to suppress edges formed outside the dark iris/pupil region.

Figure 4.12: Image thresholding results for different iris colors

Edge detection is performed on the thresholded image. Since more than one edge curve may exist, a knowledge-based method is implemented to extract the best-fitting iris/pupil ellipse. The best-fit ellipse is selected using the following steps:

1. Select an edge from the edges detected.

2. Perform ellipse fitting on the selected edge curve.

3. Mark the fitting as valid if the ratio of the major radius to the minor radius is less than three; otherwise ignore the fitting result and continue to the next iteration.

4. Repeat the above steps until all edges have been examined.


5. Select the ellipse with the largest area.

After the best-fit ellipse is extracted, the remaining steps determine whether the ellipse is a valid iris. The testing criteria are as follows:

1. Result ellipse width > inputted eye region image width / 2

2. Result ellipse width < inputted eye region image width

3. Result ellipse height > inputted eye region image height / 2

4. Result ellipse height < inputted eye region image height
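The whole selection procedure could look like the following sketch (OpenCV C++ interface). The threshold value is illustrative, and the bounds in the validity test follow the reconstruction of the criteria above, parts of which were lost in the original text, so they are assumptions rather than the project's exact values.

    // Threshold the grey eye image, detect edge curves, fit ellipses, and pick
    // the valid candidate with the largest area.
    #include <opencv2/opencv.hpp>
    #include <algorithm>
    #include <vector>

    bool extractIrisEllipse(const cv::Mat& eyeGrey, cv::RotatedRect& iris)
    {
        // Binary threshold: separate the dark iris/pupil region from the rest.
        cv::Mat bin;
        cv::threshold(eyeGrey, bin, 50, 255, cv::THRESH_BINARY);

        // Edge detection on the thresholded image, then collect the edge curves.
        cv::Mat edges;
        cv::Canny(bin, edges, 50, 150);
        std::vector<std::vector<cv::Point> > contours;
        cv::findContours(edges, contours, cv::RETR_LIST, cv::CHAIN_APPROX_NONE);

        bool found = false;
        double bestArea = 0.0;
        for (size_t i = 0; i < contours.size(); ++i) {
            if (contours[i].size() < 5) continue;         // fitEllipse needs >= 5 points
            cv::RotatedRect e = cv::fitEllipse(cv::Mat(contours[i]));

            double major = std::max(e.size.width, e.size.height);
            double minor = std::min(e.size.width, e.size.height);
            if (minor <= 0 || major / minor >= 3.0) continue;  // reject elongated fits

            double area = CV_PI * 0.25 * e.size.width * e.size.height;
            if (area > bestArea) { bestArea = area; iris = e; found = true; }
        }

        // Validity test against the eye-region dimensions (criteria as reconstructed).
        return found &&
               iris.size.width  > eyeGrey.cols / 2.0 && iris.size.width  < eyeGrey.cols &&
               iris.size.height > eyeGrey.rows / 2.0 && iris.size.height < eyeGrey.rows;
    }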


    4.4.6 Gaze Estimation

The screen is treated as a rectangle of n * m pixels, and the eye movement range corresponding to the corner points of the screen is assumed to form a perfect rectangle of i * j pixels. A simple gaze estimation method using ratio mapping is adopted. The details are discussed in the following sections.

    4.4.6.1 System calibration

Since a ratio-based screen coordinate mapping is used to estimate the eye gaze, a calibration procedure is performed to obtain the pupil center positions in pixels (with reference to the capturing frame) corresponding to the edges of the screen, from which the eye movement rectangle is calculated. The user is required to focus on different calibration points displayed on the screen during the system calibration process.

Figure 4.14: Nine-point calibration procedure (red dots show the calibration points, the blue numbers give the calibration sequence, and the brown characters in parentheses define the optimality criterion for each calibration point)

    The steps are as follows:

1. A single calibration point is displayed on the screen at a time.

2. While the user focuses on that point, the pupil positions corresponding to it are recorded over a certain time period.


3. An optimal point is selected from the recorded point set to represent the pupil position corresponding to that particular screen calibration point. (The optimal point is the recorded point closest to the limit of the eye's movement range; each calibration point has its own optimality definition, as shown in Figure 4.14. Experiments show that this optimality definition is more accurate than averaging the recorded set of points.)

4. The above steps are repeated until the pupil center positions for all calibration points have been obtained.

    Figure 4.15: system calibration flowchart


4.4.6.2 Gaze estimation

For gaze estimation, a very simple mapping is performed. The screen is treated as a rectangle of n * m pixels, and the eye movement range corresponding to the corner points of the screen forms a perfect rectangle (the gaze rectangle) of i * j pixels. The gaze rectangle is calculated by fitting a rectangle to the pupil centers obtained in the system calibration procedure.

Pupil center and gaze rectangle coordinates are both measured with reference to the capturing frame's coordinate system. By equating the pupil center's relative position within the gaze rectangle to the gaze point's relative position on the screen, the gaze screen coordinate mapping can be obtained.
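The mapping equations themselves appear only in Figure 4.16; a plausible form of the ratio mapping, assuming the gaze rectangle has top-left corner $(x_0, y_0)$ and dimensions $i \times j$, is

$$x_s = \frac{x_p - x_0}{i}\, n, \qquad y_s = \frac{y_p - y_0}{j}\, m,$$

where $(x_p, y_p)$ is the tracked pupil center and $(x_s, y_s)$ is the estimated gaze point on the n * m screen.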

    Figure 4.16: The pupil center to screen coordinate mapping


    Chapter 5

    Results & Comments

    Introduction

    Testing environment

    Performance evaluations


    5.1 Introduction

In this chapter, the performance of the whole system and its subsystems is evaluated. First, the properties of the testing environment are defined. Then, the performance of each component is explained. Since the result of each step is the input to the next, the performance of the whole system is highly dependent on the performance of each component. For example, to extract the pupil center of each eye, the system must first localize the facial feature points correctly; the eye searching region is then calculated directly from the face tracking result, and finally the eye features can be determined. At the end of this chapter, the performance of an application called "GazePad", implemented on top of our gaze tracking library, is studied and discussed.

    5.2 Testing environment

The system is tested on a notebook and a PC workstation with the same software configuration but different hardware environments:

Notebook configuration:

1. CPU: Intel Core 2 Duo (P7500) 1.66 GHz, FSB 800 MHz

2. RAM: 2GB DDR2-667

3. MON: 13.3-inch LCD widescreen display, 1280 x 800 pixel resolution

4. OS : Windows 7 Professional 32-bit

PC workstation configuration:

1. CPU: Intel Core 2 Duo (E8500) 3.17 GHz, FSB 1333 MHz

2. RAM: 4GB DDR2-667

3. MON: 21-inch LCD widescreen display, 1280 x 800 pixel resolution

4. OS : Windows Server 2008 Enterprise 32-bit

Cameras used:

    1. Macbook built-in iSight, 640 x 480 pixel resolution

    2. E-tiger, 640 x 480 pixel resolution

    3. Polar Net-Cam, 640 x 480 pixel resolution


    Software configurations:

    1. Microsoft Visual C++ 2008

    2. OpenCV 2.0

    3. ASMLibrary 4.09

Figure 5.1: The testing environment hardware settings

    5.3 Performance of face tracking

    5.3.1 Presence of distractions

ASM face fitting can handle a partly occluded face, because the points on the un-occluded part help to fit the points on the occluded part. The number of points also directly affects the fitting accuracy: the more points used in training the ASM, the more stable and accurate the fitting result. However, increasing the number of points slows down the fitting. Since the face tracker is implemented using the active shape model, the face model can be fitted to the image as long as most of the face is present, which makes the tracker robust against distractions. As a result, the presence of other faces, hand movements across the face or wearing glasses will not cause the algorithm to lose track of the face. Experiments with different distractions were conducted to show the performance of the face tracking algorithm.


    5.3.1.1 Glasses

Experiments were conducted on ASM face fitting with the user wearing glasses. Two types of glasses were tested: glasses with a non-black frame and glasses with a black frame. The results show that the head tracker performs well when the user wears glasses, regardless of the type.

Figure 5.2: Tracking with a pair of non-black frame glasses

Figure 5.3: Tracking with a pair of black frame glasses

    5.3.1.2 Passing hand

Unlike other tracking techniques such as CamShift or LK optical flow, ASM is not easily distracted by a similarly colored object moving across the tracked object. ASM can also track a face through large movements, which LK optical flow cannot do.


Figure 5.4: Face tracking with a hand occluding part of the face

    5.3.1.3 Multiple faces

Our face tracking algorithm is able to handle the presence of other faces. Only the face closest to the central axis of the capturing frame is considered; all other faces are ignored. The experiments also indicate that the face tracking algorithm can be applied to different people.

    Figure 5.5: Face tracking with multiple faces


    5.3.1.4 Face-like structures

The face tracking result is not distracted by face-like structures, since the ASM only fits faces close to those in the training set.

Figure 5.6: Face tracking with face-like structures present

    5.3.2 Various ambient lighting conditions

    5.3.2.1 Indoor

Different indoor ambient lighting conditions are simulated by adjusting the brightness to different levels relative to the normal lighting level. The results show that the face tracking algorithm can operate under different lighting conditions, which means it copes well with natural lighting fluctuations. However, the tracking result is less stable and accurate in overly bright or overly dark conditions.

Figure 5.7: Face tracking under various simulated lighting conditions


    5.3.2.2 Outdoor with complex background

The face tracker was tested in an outdoor environment with a complex background and moving objects present. It works well outdoors.

Figure 5.8: Face tracking in an outdoor environment

    5.3.3 Different facial expressions

The trained active shape model can also deform to fit facial expressions. Some of the tracking results are shown in the following figure. The ASM fitting result is good as long as the training samples cover a wide range of facial expressions.

Figure 5.9: ASM model fitting with different facial expressions


5.3.4 Blurred input image

Slight blurring of the input image does not affect the fitting result, provided that the edges of the face can still be recognized. However, the face shape model may sometimes converge to a wrong shape.

    Figure 5.10: ASM and iris fitting on blurred image

5.4 Performance of eye feature extraction

    5.4.1 Presence of distractions

The result of eye feature extraction depends heavily on the face fitting result. The eye tracker works well as long as the face fitting result is good, since the active shape model defines the eye tracker's searching region. If the deformable model converges to a wrong shape, the eye feature extraction result will be wrong.

    Figure 5.11: Searching regions of two eyes

In some cases, the eye tracker is unaffected even when the active shape model converges to a wrong shape, as long as the upper part of the shape model is still approximately in the right position.


Figure 5.12: Eye feature extraction with a wrongly converged shape model

    5.4.1.1 Glasses

The face tracking result is not much affected by different types of glasses. However, eye feature extraction can be greatly affected by the frame color of the glasses. For example, when a user wears glasses with a black frame, the extracted features may sometimes be wrong, because the iris fitting algorithm selects the fitted ellipse with the largest area and the dark frame can produce competing candidates.

Figure 5.13: Feature extraction result affected by a black glasses frame

Most of the time, however, the feature extraction is correct. The following experiments show the extraction results.


Figure 5.14: Eye feature extraction with a pair of non-black frame glasses

Figure 5.15: Eye feature extraction with a pair of black frame glasses

    5.4.1.2 Passing hand

As discussed earlier, eye feature extraction depends heavily on the face tracking result. Since the face tracker can handle a similarly colored object passing across the user's face, the eye features can still be extracted correctly by the eye tracker.


Figure 5.16: Eye feature extraction with a hand occluding part of the face

    5.4.1.3 Face-like structures and multiple faces

Since the searching region input to the eye tracker is defined by the fitted face shape model, the presence of multiple faces or face-like structures has no effect on eye feature extraction.

Figure 5.17: Eye feature extraction with face-like structures present


    5.4.2 Various ambient lighting conditions

Lighting is the factor that affects the extraction result the most. Extremely high or low brightness causes errors in the thresholding process, degrading extraction quality and performance. Our eye tracker performs well under the following conditions.

    5.4.2.1 Indoor

Different indoor ambient lighting conditions are simulated by adjusting the brightness to different levels relative to the normal lighting level, as in the face tracking lighting simulation. The results show that eye feature extraction does not work well in a very dark environment: features can still be extracted, but stability and accuracy are degraded.

Figure 5.18: Eye feature extraction under various simulated lighting conditions

Tracking performs poorly in a dark environment because the contrast between the iris region and the other regions of the eye is no longer sharp. As a result, a poor thresholded image of the eye region is produced, which prevents the iris fitting algorithm from working well.

    5.4.2.2 Outdoor with complex background

IR-based gaze trackers do not perform well outdoors, since IR is easily affected by the presence of other light sources. Our tracker is based on ambient color, so it can work well in an outdoor environment.


Figure 5.19: Eye feature extraction in an outdoor environment

    5.4.3 Various iris colors

Our eye feature extraction algorithm was originally designed for eyes with dark irises, but in reality human iris colors span a wide range. The algorithm can be applied to eyes with differently colored irises without code modification; the only difference is that it then extracts the pupil contour rather than the iris region. This works because the pupil region is black regardless of the iris color, so the pupil center and region can still be extracted. The change has no significant effect on gaze estimation, since the gaze estimation algorithm is based on the pupil center.


Figure 5.20: Pupil fitting with different iris colors

5.5 Performance of head pose estimation

Two approaches to head pose estimation were proposed earlier, each with its own pros and cons. The results are illustrated and discussed in detail in this section.

    5.5.1 Pose estimation results using POSIT

Head pose is recovered using the POSIT algorithm. In general, the resulting 3D rotation matrix and translation vector are correct but not very precise. In addition, the ASM tracking is not very stable, resulting in constant fluctuation of the head pose estimate: errors of about +/- 10 degrees are always present in the result.

    Figure 5.21: Head pose estimation result


    5.5.2 Pose estimation results using LK Optical flow

This approach estimates the 3D head pose using POSIT together with LK optical flow and RANSAC. The estimation result is very stable and accurate, since it does not depend on the ASM face fitting result. However, the feature points are easily lost when an object moves across the face or the head moves rapidly.

Figure 5.22: The rotational angles (roll, yaw and pitch) are calculated from the head pose estimation result

5.5.3 Comparison of the two approaches

The first approach uses only the POSIT algorithm, while the second uses POSIT together with LK optical flow and RANSAC. The tracking accuracy and stability of the second approach are ahead of the first. However, our system adopts the first approach as its head pose estimation algorithm because of the following factors:


1. The second approach gives a better estimation result, but it is computationally more costly than the first: it operates at only half the speed of the first approach (shown in the following figure). Since our gaze tracking system operates in real time, the second approach is not affordable.

2. In the second approach, tracked points deleted by RANSAC are not recovered automatically, while the first approach does not have this problem.

3. Neither distractions (e.g. a moving hand) nor large head movements are tolerated during tracking in the second approach, since LK optical flow will lose track of the feature points.

    Figure 5.23: Comparison of two head pose estimation approaches


Figure 5.24: A hand crossing the face

5.6 Performance of the whole system

In this project, a real-time gaze tracking library with 3D head pose estimation is developed. The performance of each step is critical for the final output. In this section, the performance of the whole system is tested.

    5.6.1 Speed

The system is tested on the two computer environments mentioned in section 5.2, with all components evaluated as a whole. Our gaze tracking system achieves an overall average of nearly 8 fps in the desktop testing environment and an average of 5 fps in the laptop testing environment. The result shows that the frame rate rises with increasing processor speed.


Figure 5.25: Measuring speed in frames per second during tracking (nearly 8 fps achieved)

    5.6.2 Tracking with off-the-shelf equipment

The gaze tracking library was tested with various off-the-shelf monitors and low-cost webcams. The webcams used are listed in the following figure. None of them were calibrated before use.

    Figure 5.26: The webcams used in the system development and testing

    5.6.3 Gaze tracking accuracy

The final step in our system is estimating the eye gaze. The gaze estimation algorithm depends heavily on the face tracking and eye feature extraction results. In our case, since the ASM fitting is not very stable, the landmark positions in the shape model shift slightly from frame to frame, which slightly affects the eye feature extraction result. As a consequence, the gaze estimate appears to pulsate slightly. The overall position of the estimate is nevertheless correct despite these pulsating errors. The following figure shows the result of a user focusing on the same nine positions as in the system calibration.


Figure 5.27: Gaze mapping when focusing on the nine calibration points' positions (the green cross indicates the left eye gaze; the blue cross indicates the right eye gaze)

5.7 Case study: GazePad

GazePad is designed as a test-bed application to explore the possibility of using eye gaze as an input control. It is a simple application built on top of our gaze tracking library: it acquires gaze information from the library and maps the gaze screen coordinates onto a character pad to perform letter input. GazePad is intuitive to use; no learning or training is required.

    Figure 5.28: GazePad operating environment


5.7.1 Motivations

Consider a person whose entire body is paralyzed (including the mouth, facial movements, etc.) but whose thinking and language processes remain intact. The person's brain is literally locked into a barely functioning body. How do they communicate with the outside world? Traditionally, several alternative communication methods can be used. Some of them are listed below:

1. Using a simple blinking system, like blinking once for "yes" and twice for "no".

2. Using a more complex Morse-code blinking system.

3. Using a vocal communication partner together with the simple blinking system: the partner keeps saying "Is it an A? Is it a B?" and so on, and the person blinks once for "yes" and twice for "no".

4. Using an alphabet card board with a vocal communication partner; similar to the above approach, the partner keeps saying "Is the letter in the 1st row? Is the letter in the 2nd row?" and so on.

5. Some paralyzed people who still have free head movement can use a head-mounted stick for typing (shown in the following figure).

6. The most convenient method is gaze tracking control, as used by Prof. Stephen Hawking.

Figure 5.29: A disabled patient using a head-mounted stick for typing (www.skymyworlds.com, 2009)

Figure 5.30: Prof. Hawking with his gaze control (Wikipedia, 2009)

All of the above methods except gaze control are very slow and inconvenient. There are strong indications that gaze tracking has the potential to become an important component of human-machine interfaces. Although some commercial gaze trackers have been successfully rolled out to the market, those systems


and equipment are extremely expensive and unaffordable for most people. High cost and specially designed components (with no alternative options) are the main barriers keeping the technology away from the people who actually need it. If the price of gaze communication systems can be brought down far enough, gaze control could become a preferred means of control for a large group of people (Jordansen et al., 2005).

A gaze typing system called "GazePad" was therefore developed based on low-cost, off-the-shelf components that can be bought in most consumer hardware stores, with the aim of making these people's lives easier and more meaningful.

The main user groups are people with motor neuron diseases (MND), such as amyotrophic lateral sclerosis (ALS), who may be paralyzed over the whole body.

5.7.2 Interface Design Concepts

Gaze control cannot point at a particular object (e.g. a small button) as accurately as a mouse, so a conventional on-screen keyboard layout cannot be adopted: its buttons are small and closely packed. Therefore, some alternative input systems used on touch-screen mobile phones, which do not demand high pointing accuracy, were studied. For example, Q9 Chinese and Q9 English input are laid out on a 3x3 matrix.

    Figure 5.31: QCode Chinese input system (www.qcode.com, 2009)


Figure 5.32: Character board used by paralyzed people to communicate (Univ. of Washington, 2009)

After testing the accuracy of our gaze tracking library and studying other alternative input methods, we found that the larger the buttons, the better. We therefore divide the whole screen into a 3x3 matrix, similar to the QCode input method, with the letters laid out across the whole screen like the character board. Our design also borrows from the QWERTY keyboard concept, grouping and ranking letters by usage frequency to ensure prompt and speedy input.

    Figure 5.33: QWERTY keyboard (www.computerhope.com, 2009)


5.7.3 Operation

To input a particular character, the user focuses on the cell containing that character for a period of time (two seconds in our case) to enter the single-character selection sub-page. With this selection method, every letter can be typed in two steps.

    Figure 5.34: GazePad operating screen

For example, to type the letter "s", I focus on the cell in the first row, second column; the single-character selection page is entered after 2 seconds. After that, I focus on the letter "s" in the second row, third column for 2 seconds, and the letter "s" is typed. A sketch of this dwell-based selection logic is given below.

    Figure 5.35: Letter selection process
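The following is a minimal sketch of the dwell-based selection logic, assuming only that the gaze tracking library reports a gaze screen coordinate each frame. The struct, helper names and the use of clock() are illustrative (clock() returns wall-clock time under MSVC on Windows); they are not part of GazeLib's actual API.

    // Dwell-based cell selection: report a 3x3 cell index once the gaze has
    // stayed inside the same cell for the required dwell time.
    #include <ctime>

    struct DwellSelector {
        int    currentCell;   // 3x3 cell the gaze currently falls in (0..8)
        double dwellStart;    // time at which the gaze entered the current cell
        double dwellSeconds;  // required dwell time (2 s in our case)

        DwellSelector() : currentCell(-1), dwellStart(0.0), dwellSeconds(2.0) {}

        // Map a gaze point (x, y) to one of the 3x3 cells of an n x m screen.
        static int cellOf(int x, int y, int n, int m) {
            int col = x * 3 / n;  if (col < 0) col = 0;  if (col > 2) col = 2;
            int row = y * 3 / m;  if (row < 0) row = 0;  if (row > 2) row = 2;
            return row * 3 + col;
        }

        // Feed one gaze sample; returns the selected cell (0..8) once the gaze
        // has stayed in the same cell for dwellSeconds, or -1 otherwise.
        int update(int x, int y, int n, int m) {
            double now = (double)clock() / CLOCKS_PER_SEC;
            int cell = cellOf(x, y, n, m);
            if (cell != currentCell) {   // gaze moved to another cell: restart timer
                currentCell = cell;
                dwellStart  = now;
                return -1;
            }
            if (now - dwellStart >= dwellSeconds) {
                dwellStart = now;        // require a fresh dwell for the next selection
                return cell;
            }
            return -1;
        }
    };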


Apart from letters, GazePad also supports symbol and number input. As on a mobile phone, the user changes the input mode to type symbols and numbers.

Figure 5.36: Different input modes are supported

5.7.4 Performance

In the typing experiments, a string containing letters and symbols, "cityu computer science. jack ^.^", was input using GazePad. The recorded time was about 3.3 minutes, which means that on average about 9 characters can be input per minute. In addition, every letter can be selected in two steps.


    Chapter 6

    Conclusions

    Critical reviews

    Future work

    Application area

    Project feedback


    6.1 Critical reviews

In this project, a gaze tracking library with facial feature tracking, eye feature extraction and head pose estimation was developed. The facial feature tracking is based on active shape model fitting, a model-based face tracking algorithm. The active shape model was trained on numerous face images of different individuals, each manually annotated with 68 feature points. The shape resulting from face model fitting is then used to extract the searching regions of the two eyes individually. The extraction of detailed infor