
Journal of Eye Movement Research, 11(3):5, 1-14

An investigation of the distribution of gaze estimation errors in head-mounted gaze trackers using polynomial functions

Diako Mardanbegi, Department of Management Engineering, Technical University of Denmark, Denmark

Andrew T. N. Kurauchi, Department of Computer Science (IME), University of Sao Paulo, Sao Paulo, Brazil

Carlos H. Morimoto, Department of Computer Science (IME), University of Sao Paulo, Sao Paulo, Brazil

Second order polynomials are commonly used for estimating the point-of-gaze in head-mounted eye trackers. Studies in remote (desktop) eye trackers show that although some non-standard 3rd order polynomial models could provide better accuracy, higher-order polynomials do not necessarily provide better results. Unlike remote setups, where gaze is estimated over a relatively narrow field-of-view surface (e.g. less than 30 × 20 degrees on typical computer displays), head-mounted gaze trackers (HMGT) are often required to cover a relatively wide field-of-view to make sure that gaze is detected in the scene image even for extreme eye angles. In this paper we investigate the behavior of the gaze estimation error distribution throughout the image of the scene camera when using polynomial functions. Using simulated scenarios, we describe the effects of four different sources of error: interpolation, extrapolation, parallax, and radial distortion. We show that the use of third order polynomials results in more accurate gaze estimates in HMGT, and that the use of wide angle lenses might be beneficial in terms of error reduction.

Keywords: eye tracking, gaze estimation, head-mounted eye tracking, polynomial mapping, error distribution

    Introduction

Monocular video-based head mounted gaze trackers use at least one camera to capture the eye image and another to capture the field-of-view (FoV) of the user. Probably due to the simplicity of regression-based methods when compared to model-based methods (Hansen & Ji, 2010), regression-based methods are commonly used in head-mounted gaze trackers (HMGT) to estimate the user's gaze as a point within the scene image, despite the fact that such methods do not achieve the same accuracy levels of model-based methods.

In this paper we define and investigate four different sources of error to help us characterize the low performance of regression-based methods in HMGT. The first source of error is the inaccuracy of the gaze mapping function in interpolating the gaze point (e_int) within the calibration box, the second source is the limitation of the mapping function in extrapolating the results outside the calibration box as required in HMGT (e_ext), the third is the misalignment between the scene camera and the eye, known as parallax error (e_par), and the fourth error source is the radial distortion in the scene image when using a wide angle lens (e_dis).

History: Received January 3, 2018; Published June 30, 2018.
Citation: Mardanbegi, D., Kurauchi, A. T. N. & Morimoto, C. H. (2018). An investigation of the distribution of gaze estimation errors in head mounted gaze trackers using polynomial functions. Journal of Eye Movement Research, 11(3):5, 1-14.
Digital Object Identifier: 10.16910/jemr.11.3.5
ISSN: 1995-8692
This article is licensed under a Creative Commons Attribution 4.0 International license: https://creativecommons.org/licenses/by/4.0/

Most of these sources of error have been investigated before independently. Cerrolaza et al. (Cerrolaza, Villanueva, & Cabeza, 2012) have studied the performance, based on the interpolation error, of different polynomial functions using combinations of eye features in remote eye trackers. Mardanbegi and Hansen (Mardanbegi & Hansen, 2012) have described the parallax error in HMGTs using epipolar geometry in a stereo camera setup. They have investigated how the pattern of the parallax error changes for different camera configurations and calibration distances. However, no experimental result was presented in their work showing the actual error in a HMGT. Barz et al. (Barz, Daiber, & Bulling, 2016) have proposed a method for modeling and predicting the gaze estimation error in HMGT. As part of their study, they have empirically investigated the effects of the extrapolation and parallax errors independently. In this paper, we describe the nature of the four sources of error introduced above in more detail, providing a better understanding of how these


different components contribute to the gaze estimation error in the scene image. The rest of the paper is organized as follows: the simulation methodology used in this study is described in the first section and the next section describes related work regarding the use of regression-based methods for gaze estimation in HMGT. We then propose alternative polynomial models and compare them with the existing models. We also show how precision and accuracy of different polynomial models change in different areas of the scene image. Section Parallax Error describes the parallax error in HMGT and the following section investigates the effect of radial distortion in the scene image on gaze estimation accuracy. The combination of errors caused by different factors is discussed in Section Combined Error and we conclude in Section Conclusion.

    Simulation

All the results presented in the paper are based on simulation and the proposed methods are not tested on real setups. The simulation code for head-mounted gaze tracking that was used in this paper was developed based on the eye tracking simulation framework proposed by (Böhme, Dorr, Graw, Martinetz, & Barth, 2008).

The four main components of a head-mounted eye tracker (eye globe, eye camera, scene camera and light source) are modeled in the simulation. After defining the relationship between these components, points can be projected from 3D to the camera images, and vice versa. Positions of the relevant features in the eye image are computed directly based on the geometry between the components (eye, camera and light); no 3D rendering algorithms or image analysis are used in the simulation. The pupil center in the eye image is obtained by projecting the center of the pupil into the image and no ellipse fitting is used for the tests in this paper. The eyeball can be oriented in 3D either by defining its rotation angles or by defining a fixation point in space. Fovea displacement and light refraction on the surface of the cornea are considered in the eye model.
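The simulation code itself is not reproduced in the paper; as a minimal illustration of this purely geometric approach, the Python sketch below (an assumption about the implementation, not the authors' code) projects a 3D point given in the eyeball-centered world frame into a camera image using a simple pinhole model. The intrinsic matrix is hypothetical, derived from the horizontal FoV and resolution of the Table 2 scene camera assuming square pixels and a centered principal point.

```python
import numpy as np

def project_point(X_world, K, R, C):
    """Project a 3D world point to pixel coordinates with a pinhole camera.

    K: 3x3 intrinsics, R: camera orientation in the world frame (camera axes
    as columns), C: camera center in world coordinates (relative to CE).
    """
    X_cam = R.T @ (X_world - C)     # express the point in the camera frame
    x_hom = K @ X_cam               # apply the intrinsics
    return x_hom[:2] / x_hom[2]     # perspective division -> pixel coordinates

# Hypothetical intrinsics for the Table 2 scene camera (65 deg horizontal FoV, 1280 x 768)
fx = 640.0 / np.tan(np.radians(65.0 / 2.0))   # roughly 1004 pixels
K = np.array([[fx, 0.0, 640.0],
              [0.0, fx, 384.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                          # (pan, tilt, yaw) = (0, 0, 0)
C = np.array([10.0, 30.0, 35.0])       # scene camera center in mm, relative to CE
target = np.array([0.0, 0.0, 1000.0])  # a fixation target 1 m along the Z axis
print(project_point(target, K, R, C))
```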

    The details of the parameters used in the simulation aredescribed in each subsequent section.

    Regression-based methods in HMGT

The pupil center (PC) is a common eye feature used for gaze estimation (Hansen & Ji, 2010). Geometry-based gaze estimation methods (Guestrin & Eizenman, 2006; Model & Eizenman, 2010) mostly rely on calculating the 3D position of the pupil center as a point along the optical axis of the eye. Feature-based gaze estimation methods, on the other hand, directly use the image of the pupil center (its 2D location in the eye image) as input for their mapping function.

    Infrared light sources are frequently used to create corneal

    Figure 1. Sagittal view of a HMGT

reflections, or glints, that are used as reference points. When combined, the pupil center and the glint (first Purkinje image (Merchant, Morrissette, & Porterfield, 1974)) form a vector (in the eye image) that can be used for gaze estimation instead of the pupil center alone. In remote eye trackers, the use of the pupil-glint vector (PCR) improves the performance of the gaze tracker for small head motions (Morimoto & Mimica, 2005). However, eye movements towards the periphery of the FoV are often not tolerated when using glints, as the reflections tend to fall off the corneal surface. For the sake of simplicity, in the following we use the pupil center instead of PCR as the eye feature used for gaze mapping.

Figure 1 illustrates the general setup for a pupil-based HMGT consisting of 3 components: the eye, the eye camera, and the scene camera. Gaze estimation essentially maps the position of the pupil center in the eye image (p_x) to a point in the scene image (x) when the eye is looking at a point (X) in 3D.

Interpolation-based (regression-based) methods have been widely used for gaze estimation in both commercial eye trackers and research prototypes in remote (or desktop) scenarios (Cerrolaza, Villanueva, & Cabeza, 2008; Cerrolaza et al., 2012; Ramanauskas, Daunys, & Dervinis, 2008). Compared to geometry-based methods (Hansen & Ji, 2010), they are in general more sensitive to head movements, though they present reasonable accuracy around the calibration position; they do not require any calibrated hardware (e.g. camera calibration or a predefined geometry for the setup), and their software is simpler to implement. Interpolation-based methods use linear or non-linear mapping functions (usually a first or second order polynomial). The unknown coefficients of the mapping function are fitted by regression based on


correspondence data collected during a calibration procedure. It is desirable to have a small number of calibration points to simplify the calibration procedure, so a small number of unknown coefficients is desirable for the mapping function.
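As an illustration of how such coefficients can be fitted, the following Python sketch (an assumption, not the authors' implementation) fits a standard second order polynomial mapping from pupil coordinates to scene-image coordinates by ordinary least squares over the calibration correspondences.

```python
import numpy as np

def poly2_features(px, py):
    """Second order polynomial terms: 1, x, y, xy, x^2, y^2."""
    return np.column_stack([np.ones_like(px), px, py, px * py, px**2, py**2])

def fit_mapping(pupil_xy, scene_xy):
    """Fit the coefficients of S_x and S_y by least squares.

    pupil_xy, scene_xy: (N, 2) arrays of calibration correspondences.
    Returns one coefficient vector per output component.
    """
    A = poly2_features(pupil_xy[:, 0], pupil_xy[:, 1])
    coeffs_x, *_ = np.linalg.lstsq(A, scene_xy[:, 0], rcond=None)
    coeffs_y, *_ = np.linalg.lstsq(A, scene_xy[:, 1], rcond=None)
    return coeffs_x, coeffs_y

def map_gaze(pupil_xy, coeffs_x, coeffs_y):
    """Apply the fitted mapping to new pupil positions."""
    A = poly2_features(pupil_xy[:, 0], pupil_xy[:, 1])
    return np.column_stack([A @ coeffs_x, A @ coeffs_y])
```

With a 3 × 3 calibration grid this gives 9 correspondences for the 6 unknown coefficients of each component.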

In a remote gaze tracker (RGT) system, one may assume that the useful range of gaze directions is limited to the computer display. The performance of regression-based methods that map eye features to a point on a computer display has been well studied for RGT (Sesma-Sanchez, Villanueva, & Cabeza, 2012; Cerrolaza et al., 2012; Blignaut & Wium, 2013). Cerrolaza et al. (Cerrolaza et al., 2012) present an extensive study on how different polynomial functions perform on remote setups. The maximum range of eye rotation used in their study was about 16° × 12° (looking at a 17 inch display at a distance of 58 cm). Blignaut (Blignaut, 2014) showed that a third order polynomial model with 8 coefficients for S_x and 7 coefficients for S_y provides good accuracy (about 0.5°) on a remote setup when using 14 or more calibration points.

However, the performance of interpolation-based methods for HMGT has not yet been thoroughly studied. The mapping function used in a HMGT maps the eye features extracted from the eye image to a 2D point in the scene image that is captured by a front view camera (scene camera) (Majaranta & Bulling, 2014). For HMGT it is common to use a wide FoV scene camera (FoV > 60°) so gaze can be observed over a considerably larger region than in RGT. Nonetheless, HMGTs are often calibrated for only a narrow range of gaze directions. Because gaze must be estimated over the whole region covered by the scene camera, the polynomial function must extrapolate the gaze estimate outside the bounding box that contains the points used for calibration (called the calibration box). To study the behavior of the error inside and outside the calibration box, we will refer to the error inside the box as interpolation error and outside as extrapolation error. The use of wide FoV lenses also increases radial distortions, which affect the quality of the scene image.

On the other hand, if the gaze tracker is calibrated for a wide FoV that spans the whole scene image, the risk of poor interpolation increases. This has to do with the significant non-linearity that we get in the domain of the regression function (due to the spherical shape of the eye) for extreme viewing angles. Besides the interpolation and extrapolation errors, we should take into account that the polynomial function is adjusted for a particular calibration distance, while in practice the distance might vary significantly during the use of the HMGT.

    Derivation of alternative polynomial models

To find a proper polynomial function for HMGTs and to see whether the commonly used polynomial model is suitable for HMGTs, we will use a systematic approach similar to the one proposed by Blignaut (Blignaut, 2014) for RGTs. The systematic approach consists of considering each dependent variable S_x and S_y (horizontal and vertical components of the gaze position on the scene image) separately. We first fix the value of the independent variable P_y (vertical component of the eye feature - in our case, pupil center or PCR - on the eye image) and vary the value of P_x (horizontal component of the eye feature on the eye image) to find the relationship between S_x and P_x. Then the process is repeated, fixing P_x and varying P_y to find the relationship between the coefficients of the polynomial model and P_y.

Table 1. Default eye measures used in the simulation

r_cornea: 7.98 mm
Horizontal fovea offset (α): 6°
Vertical fovea offset (β): 2°

Table 2. Default configuration for the cameras and the light source used in the simulation. All measures are relative to the world coordinate system with the origin at the center of the eyeball (CE) (see Figure 1). The symbols R and Tr stand for rotation and translation respectively.

Scene camera:
  FoV = H: 65° × V: 40°
  R = (pan, tilt, yaw) = (0, 0, 0)
  Tr = (10 mm, 30 mm, 35 mm)
  no radial distortion
  res = (1280 × 768)

Eye camera:
  focal length: providing an eye image with W_eye/W_img = 90%, where W_eye is the horizontal dimension of the eye area in the image and W_img is the image width
  R: satisfying the assumption of the camera pointing towards the eyeball center
  Tr = (0 mm, −10 mm, 60 mm)
  res = (1280 × 960)

Light source:
  Tr = (0, 0, 60 mm)

We simulated a HMGT with the scene camera described in Table 2. A grid of 25 × 25 points in the scene image (covering the whole image) is back-projected to fixation points on a plane 1 m away from CE and the corresponding pupil position is obtained for each point. We run the simulation for 9 different eyes defined by combining 3 different values for each of the parameters shown in Table 1 (3 parameters and ±25% of their default values). We extract the samples for two different conditions, one with the pupil center and the second with the pupil-glint vector as our independent variable.

Figure 2 shows a virtual eye socket and the pupil center coordinates corresponding to 625 (a grid of 25 × 25) target


Figure 2. Virtual eye socket showing 625 pupil centers. Each center corresponds to an eye orientation that points the optical axis of the eye towards a scene target on a plane 1 m from the eye, and each point on the plane corresponds to an evenly distributed 25 × 25 grid point in the scene camera. Samples were split into 7 groups based on their P_y values by discretizing the Y axis. Samples in the middle group are shown in a different color.

points in the scene image for one eye model. Let the X and Y axes correspond to the horizontal and vertical axes of the eye camera respectively. To express S_x in terms of P_x we need to make sure the other variable P_y is kept constant. However, we have no control over the pupil center coordinates, and even taking a specific value for S_y in the target space (as was suggested in (Blignaut, 2014)) will not result in a constant P_y value. Thus, we split the sample points along the Y axis into 7 groups based on their P_y values by discretizing the Y axis. 7 groups give us enough samples in each group distributed over the X axis. This grouping makes it possible to select only the samples that have a (relatively) constant P_y.
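A minimal sketch of this grouping step, assuming the pupil samples are stored as a NumPy array (variable names are illustrative, not from the paper):

```python
import numpy as np

def group_by_py(pupil_xy, n_groups=7):
    """Split pupil samples into horizontal bands of (roughly) constant P_y.

    pupil_xy: (N, 2) array of pupil-center coordinates in the eye image.
    Returns a group index in [0, n_groups - 1] for each sample.
    """
    py = pupil_xy[:, 1]
    edges = np.linspace(py.min(), py.max(), n_groups + 1)  # discretize the Y axis
    # np.digitize against the internal edges yields indices 0 .. n_groups - 1
    return np.digitize(py, edges[1:-1])
```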

By keeping the independent variable P_y within a specific range (e.g., from pixel 153 to 170, which roughly corresponds to gaze points in the middle of the scene image), we can write about 88 relationships for S_x in terms of P_x.

Figure 3 shows this relationship, which suggests the use of a third order polynomial with the following general form:

S_x = a_0 + a_1 x + a_2 x^2 + a_3 x^3    (1)

We then look at the effect of changing the independent variable P_y on the coefficients a_i. To keep the distribution of samples across the X axis uniform when changing the P_y level, we skip the first level of P_y (Figure 2). The changes of a_i against 6 levels of P_y are shown in Figure 4. From the figure we can see that the relationship between the coefficients a_i and the Y coordinate of the pupil center is best represented by a second order polynomial:

a_i = a_i0 + a_i1 y + a_i2 y^2    (2)

The general form of the polynomial function for S_x is then obtained by substituting these relationships into Eq. 1, which

Figure 3. Relationship between the input P_x (pupil x) and the output S_x. Different curves show the result for different parameters in the eye models.

    will be a third order polynomial with 12 terms:

1, x, y, xy, x^2, y^2, xy^2, x^2y, x^2y^2, x^3, x^3y, x^3y^2    (3)
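The substitution can be checked symbolically; the following sketch (using sympy, as an illustration rather than part of the original work) expands Eq. 2 into Eq. 1 and recovers the 12 monomials of Eq. 3:

```python
import sympy as sp

x, y = sp.symbols('x y')
# Eq. 2: each coefficient a_i of Eq. 1 is itself a second order polynomial in y
a = [sp.symbols(f'a{i}0 a{i}1 a{i}2') for i in range(4)]
coeffs = [ai0 + ai1 * y + ai2 * y**2 for (ai0, ai1, ai2) in a]

# Substitute into Eq. 1 and expand
Sx = sp.expand(sum(c * x**i for i, c in enumerate(coeffs)))

# The distinct monomials in x and y are exactly the 12 terms of Eq. 3
poly = sp.Poly(Sx, x, y)
print(len(poly.monoms()))                         # 12
print([x**i * y**j for (i, j) in poly.monoms()])  # 1, y, y^2, x, x*y, ..., x^3*y^2
```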

Figure 4. Relationship between the coefficients a_i of the regression function S_x and the Y coordinate of the pupil center, shown in panels (a) a_0, (b) a_1, (c) a_2 and (d) a_3.

We follow a similar approach to obtain the polynomial function for S_y. Figure 5a shows the relationship between S_y and the independent variable P_y, from which it can be inferred that a straight line should fit the samples for 27 different eye conditions. Based on this assumption, we look at the relationship between the two coefficients of the quadratic function and P_x. The result is shown in Figures 5b & 5c, which


suggests that both coefficients could be approximated by second order polynomials, resulting in S_y being a function with the following terms:

1, y, y^2, x, xy, xy^2    (4)

To determine the coefficients of S_x at least 12 calibration points are required, while S_y only requires 6. In practice the polynomial functions for S_x and S_y are determined from the same data. As at least 12 calibration points will already be collected for S_x, a more complex function could be used for S_y. In the evaluation section we show results using the same polynomial function (Eq. 3) for both S_x and S_y. However, to better characterize the simulation results we first introduce the concept of interpolation and extrapolation regions in the scene image.
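For illustration, a sketch of the feature vectors of the two derived models (model 4: Eq. 3 for S_x and Eq. 4 for S_y; model 5: Eq. 3 for both), under the assumption that the regression itself is an ordinary least-squares fit as sketched earlier:

```python
import numpy as np

def terms_eq3(x, y):
    """The 12 terms of Eq. 3: up to third order in x and second order in y."""
    return np.column_stack([np.ones_like(x), x, y, x*y, x**2, y**2, x*y**2,
                            x**2*y, x**2*y**2, x**3, x**3*y, x**3*y**2])

def terms_eq4(x, y):
    """The 6 terms of Eq. 4, used for S_y in model 4."""
    return np.column_stack([np.ones_like(x), y, y**2, x, x*y, x*y**2])

def fit_model4(pupil_xy, scene_xy):
    """Model 4: Eq. 3 for S_x (>= 12 points) and Eq. 4 for S_y (>= 6 points)."""
    x, y = pupil_xy[:, 0], pupil_xy[:, 1]
    cx, *_ = np.linalg.lstsq(terms_eq3(x, y), scene_xy[:, 0], rcond=None)
    cy, *_ = np.linalg.lstsq(terms_eq4(x, y), scene_xy[:, 1], rcond=None)
    return cx, cy
```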

Figure 5. (a) Relationship of the regression function S_y against the Y coordinate of the pupil center. (b & c) Relationship of the coefficients a_i of S_y against P_x.

    Interpolation and extrapolation regions

Gaze mapping calibration is done by taking corresponding sample points from the range and the domain. This is usually done by asking the user to look at a set of co-planar points at a fixed distance (a.k.a. the calibration plane). For each point, the corresponding position in the scene image and the pupil position in the eye image are stored. Any gaze point inside the

    Figure 6. Sagittal view of a HMGT

bounding box of the calibration pattern (the calibration box) will be interpolated by the polynomial function. If a gaze point is outside the calibration box it will be extrapolated. This is illustrated in Figure 6, where TcBc is the area in the calibration plane (π_cal) that is visible in the scene image. Let CL1 and CL2 be the edges of the calibration pattern. Any gaze position in π_cal within the range from Tc to CL1 or from CL2 to Bc will be extrapolated by the polynomial function. These two regions in the calibration plane are marked in red in the figure. We can therefore divide the scene image into two regions depending on whether the gaze point is interpolated (calibration box) or extrapolated (out of the calibration box).

In order to express the relative coverage of these two regions in the scene image, we use a measure similar to the one suggested by (Barz et al., 2016). We define S_int as the ratio between the interpolation area and the total scene image area:

S_int = A_int / A_image    (5)

We also refer to S_int as the interpolation ratio of the image.

    Gaze estimation error when changing fixation depth

From now on, we refer to any fixation point in 3D by its distance from the eye along the Z axis. Accordingly, we define the fixation plane as the plane that includes the fixation point and is parallel to the calibration plane. TfBf in Figure 6 shows


Table 3. S_int at different fixation distances for two different calibration distances

Fixation distance | d_cal = 0.6 m | d_cal = 3 m
0.6 m | 48.8% | 47%
1 m | 45.3% | 48.8%
3 m | 42% | 49.9%
5 m | 41% | 49.3%

the part of the fixation plane that is visible in the image. We can see that the interpolated (green) and extrapolated (red) regions in the scene image change when the fixation plane π_fix diverges from the calibration plane. Projecting the red segment on the fixation plane π_fix into the scene image defines a larger extrapolated area in the image. Accordingly, the interpolated region in the image gets smaller when the fixation plane moves further away. Therefore, the interpolation ratio that we get for the calibration plane (S_int^cal) is not necessarily equal to the interpolation ratio that we have at other depths. Not only the size of the interpolation area changes when changing the fixation depth, but also the position of the interpolation region in the image.

Figure 6 illustrates a significant change in the value of S_int for a small variation of fixation distance, which happens at very close distances to the eye. We simulated a HMGT with the simplified eye model (described in Table 1) and the typical scene camera configuration described in Table 2 to see whether changes of S_int are significant in practice. The result is shown in Table 3 for different fixation distances on a gaze tracker calibrated at distances 0.6 m and 3.0 m. We assume that the calibration pattern covers about 50% of the image (S_int^cal = 0.5).

The amount of change in the expansion of the interpolation region depends on the configuration of the camera and the epipole location in the scene image, which is described by epipolar geometry (see Section Parallax Error). However, the result shows that for an ordinary camera setup these changes are not significant.

    Practical grid size and distance for calibration

There are different ways to carry out calibration in HMGTs. The common way is to ask the user to look at different targets located at a certain distance from the eye (the calibration distance) and record sample points from the eye and scene images while the user is fixating on each target. Target points in the scene image can either be marked and picked manually by clicking on the image (direct pointing) or detected automatically (indirect pointing) using computer-vision-based methods. The targets are usually markers printed on paper and attached to a wall, or are displayed on a big screen (or projector) in front of the user during calibration.

Alternatively, targets could be projected by a laser diode (Babcock & Pelz, 2004), allowing the calibration pattern to cover a wider range of the field of view of the scene camera. However, the practical size (angular expansion) of the calibration grid is limited to a certain range of the FoV of the eye. The further the calibration plane is from the subject, the smaller the angular expansion of the calibration grid will be. The calibration distance for HMGTs is usually less than 3 m in practice, the size is smaller than 50° horizontally and 30° vertically, and it is not convenient for the user to fixate on targets at larger viewing angles. The other thing that affects the size is the hardware components that clutter the user's view (e.g. the eye camera and the goggles' frame). With these considerations, it is very unlikely that a calibration pattern covers the entire scene image; thus S_int^cal is usually less than 40% when using a lens with a field of view larger than 70° × 50° on the scene camera. In contrast, the calibration grid usually covers more than 80% of the computer display in a remote eye tracking setup.

The number of calibration points is another important factor to consider. Manually selecting the calibration targets in the image slows down the calibration procedure and could also affect the calibration result due to possible head (and therefore camera) movements during the calibration. Therefore, to minimize calibration time and preserve accuracy, HMGTs with manual calibration often use no more than 9 calibration points. However, detecting the targets automatically allows for collecting more points in an equivalent amount of time when the user looks at a set of target points in the calibration plane or follows a moving target. Thus the practical number of points for calibration really depends on the calibration method. It might for example be worth collecting 12 or 16 points instead of 9 if this improves the accuracy significantly.

    Evaluation of different polynomial functions

The performance of the polynomial functions derived earlier is compared to an extension of the second order polynomial model suggested by (Mitsugami, Ukita, & Kidode, 2003) and with two models suggested by (Blignaut, 2013) and (Blignaut, 2014). These models are summarized in Table 4.

Model 5 is similar to model 4 except that it uses Eq. 3 for both S_x and S_y. The scene camera was configured with the properties from Table 2. The 4 × 4 calibration grid was positioned 1 m from the eye and 16 × 16 points uniformly distributed on the scene image were used for testing.

We tested the five polynomial models using 2 interpolation ratios (20% and 50%). Besides the 4 × 4 calibration grid, we used a 3 × 3 calibration grid for polynomial model 1.

The gaze estimation results for these configurations are shown in Figure 7 for the interpolation and extrapolation regions.


Table 4. Summary of models tested in the simulation. Functions are shown with only their terms, without coefficients.

No. | Reference | S_x terms | S_y terms
1 | Blignaut, 2014 | 1, x, y, xy, x^2, y^2, x^2y^2 | 1, x, y, xy, x^2, y^2, x^2y^2
2 | Blignaut, 2013 | 1, x, y, xy, x^2, x^2y^2, x^3, x^3y | 1, x, y, xy, x^2, y^2, x^2y
3 | Blignaut, 2014 | 1, x, y, xy, x^2, y^2, x^2y, x^3, y^3, x^3y | 1, x, y, xy, x^2, x^2y
4 | Derived above | 1, x, y, xy, x^2, y^2, x^2y, xy^2, x^2y^2, x^3, x^3y, x^3y^2 | 1, x, y, xy, y^2, xy^2
5 | Derived above | 1, x, y, xy, x^2, y^2, x^2y, xy^2, x^2y^2, x^3, x^3y, x^3y^2 | 1, x, y, xy, x^2, y^2, x^2y, xy^2, x^2y^2, x^3, x^3y, x^3y^2

Figure 7. Gaze estimation error obtained from different regression models for the interpolation and extrapolation regions of the scene image: (a) interpolation, pupil, 20%; (b) extrapolation, pupil, 20%; (c) interpolation, pupil, 50%; (d) extrapolation, pupil, 50%. Gaze estimation was based on the pupil center and no measurement noise was applied to the eye image. Errors are measured in degrees.

Each boxplot shows the gaze error in a particular region measured in degrees. These figures are only meant to give an idea of how different gaze estimation functions perform. The results show that there is no significant difference between models 3 and 4 in the interpolation area. Increasing the calibration ratio increases the error in the interpolation region but overall gives better accuracy for the whole image. For this test, no significant difference was observed between models 3, 4 and 5.

A similar test was performed with the pupil-corneal-reflection (PCR) feature instead of the pupil. The result for the PCR condition is shown in Figure 8. The result shows that model 5 with PCR outperforms the other models when the calibration ratio is greater than 20%, even though the model was derived based on pupil position only.

To have a more realistic comparison between the different models, in Section Combined Error we look at the effect of noise on the gaze estimation result by applying a measurement error to the eye image.

    Parallax Error

Assuming that the mapping function returns a precise gaze point all over the scene image, the estimated gaze point will still not correspond to the actual gaze point when it is not on the calibration plane. We refer to this error as parallax error, which is due to the misalignment between the eye and the scene camera.

Figure 9 illustrates a head-mounted gaze tracking setup in 2D (sagittal view). It shows the offset between the actual gaze point in the image x2 and the estimated gaze point x1 when the gaze tracker is calibrated for plane π_cal and the eye is fixating on the point X2cal. The figure is not to scale and, for the sake of clarity, the calibration and fixation planes (respectively π_cal and π_fix) are placed very close to the eye. Here, the eye and scene cameras can both be considered as pinhole cameras forming a stereo-vision setup.

We define the parallax error as the vector between the actual gaze point and the estimated gaze point in the scene image, e_par(x2) = vec(x2 x1), when the mapping function works precisely.


Figure 8. Gaze estimation error obtained from different regression models for the interpolation and extrapolation regions of the scene image: (a) interpolation, PCR, 20%; (b) extrapolation, PCR, 20%; (c) interpolation, PCR, 50%; (d) extrapolation, PCR, 50%. Gaze estimation was based on the PCR feature and no measurement noise was applied to the eye image. Errors are measured in degrees.

Figure 9. Sagittal view of a HMGT illustrating the epipolar geometry of the eye and the scene camera.

When the eye fixates at points along the same gaze direction, there will be no change in the eye image and consequently the estimated gaze point in the scene image remains the same. As a result, when the point of gaze (X2fix) moves along the same gaze direction, the origin of the error vector e_par moves in the image, while the endpoint of the vector remains fixed.

The parallax error e_par for any point x in the scene image can be geometrically derived by first back-projecting the desired point onto the fixation plane (point X_fix):

X_fix = (X_x, X_y, d_f)^T = P^+ x    (6)

where P^+ is the pseudo-inverse of the projection matrix P of the scene camera.

Then, intersecting the gaze vector for X_fix with π_cal:

X_cal = (d_c / d_f) X_fix    (7)

where d_c is the distance from the center of the eyeball to the calibration plane and d_f is the distance to the fixation plane along the Z axis. Finally, projecting the point X_cal onto the scene camera gives us the end-point of the vector e_par, while the initial point x in the image is the start-point of the vector.
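A sketch of this geometric construction in Python (assuming a simple pinhole scene camera; the names K, R, C are illustrative and not taken from the paper's simulation code). The back-projection is done through the scene camera center, while the gaze ray in Eq. 7 passes through the eyeball center CE, taken as the world origin:

```python
import numpy as np

def parallax_error(x_img, K, R, C, d_cal, d_fix):
    """Parallax error vector e_par(x) in the scene image, following Eqs. 6-7.

    x_img: gaze point in the scene image (pixels); K, R, C: scene camera
    intrinsics, orientation and center (relative to the eyeball center CE);
    d_cal, d_fix: calibration and fixation distances along the Z axis.
    """
    # Eq. 6: back-project x through the scene camera onto the fixation plane (Z = d_fix)
    ray_cam = np.linalg.inv(K) @ np.array([x_img[0], x_img[1], 1.0])
    ray_world = R @ ray_cam                  # ray direction in the world frame
    s = (d_fix - C[2]) / ray_world[2]        # scale so that the Z coordinate equals d_fix
    X_fix = C + s * ray_world

    # Eq. 7: intersect the gaze ray (through CE at the origin) with the calibration plane
    X_cal = (d_cal / d_fix) * X_fix

    # Projecting X_cal gives the end-point of e_par; x_img is its start-point
    X_cam = R.T @ (X_cal - C)
    x_hom = K @ X_cam
    x_end = x_hom[:2] / x_hom[2]
    return x_end - np.asarray(x_img, dtype=float)
```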

By ignoring the visual axis deviation and taking the optical axis of the eye as the gaze direction, the epipole e in the scene image can be defined by projecting the center of the eyeball CE onto the scene image. According to epipolar geometry this can be described as:

e = K t = K_(3×3) [ −(EC_R)^T · EC_Tr ]_(3×1)    (8)

where K is the scene camera matrix and EC_R and EC_Tr are respectively the rotation and translation of the scene camera relative to the center of the eyeball. Mardanbegi and Hansen (Mardanbegi & Hansen, 2012) have shown that taking the visual axis deviation into account does not make a significant difference in the location of the epipole in the scene image.


Figure 10. Parallax error in the scene image for a fixation distance of 3 m when d_cal = 1 m on the setup described in Table 2. This figure assumes an ideal mapping function with zero interpolation and extrapolation error in the entire image for the calibration distance d_cal.

Figure 10 shows an example distribution of the parallax error in the scene image for d_cal = 1 m and d_fix = 3 m on the setup described in Table 2, assuming an ideal mapping function with zero error for the calibration distance in the entire image.

    Effect of radial lens distortion

In this section we show how radial distortion in the scene image, which is more noticeable when using wide-angle lenses, affects the gaze estimation accuracy in HMGT.

Figure 2 shows the location of pupil centers in the eye image when the eye fixates at points that are uniformly distributed in the scene image. These pupil centers are obtained by back-projecting the corresponding target points in the scene image onto the calibration plane, and rotating the eye optical axis towards each fixation point in the scene. When the scene image has no radial distortion, the back-projection of the scene image onto the calibration plane is shaped as a quadrilateral (dotted line in Figure 11).

However, when the scene image is strongly affected by radial distortion, the back-projection of the scene image onto the calibration plane is shaped as a quadrilateral with a pincushion distortion effect (dashed line in Figure 11). Figure 13 shows the corresponding pupil positions for these fixation points. By comparing Figure 13 with Figure 2, we can see that the positive radial distortion in the pattern of fixation targets caused by the lens distortion adds a positive radial distortion to the normal eye samples and thus, to some extent, compensates for the non-linearity of the pupil positions.

To see whether this could potentially improve the result of the regression, we compared 2 different conditions, one with and the other without lens distortion. We want to compare

Figure 11. Calibration grid (small circles) and working area (red rectangle) marked in the calibration plane, and the borders of the scene image when it is back-projected onto the calibration plane with (dashed line) and without (dotted line) lens distortion. This figure was drawn according to the settings described in Table 5.

Figure 12. A sample image with radial distortion showing the calibration region (gray) and the working area (red curve).

Figure 13. A sample eye image with pupil centers corresponding to 625 target points in the scene image when there is lens distortion.


Table 5. Parameters used in the simulation for testing the effect of lens distortion

Wide-angle lens:
  FoV = H: 90° × V: 60°
  R = (pan, tilt, yaw) = (0, 0, 0)
  Tr = (10 mm, 30 mm, 35 mm)
  focal length = 965 pixels
  distortion coefficients = [−0.42, 0.17, −0.00124, 0.0015, −0.034]
  res = (1280 × 768)

Calibration:
  FoV = H: 30° × V: 25°
  calibration distance = 1 m

Working area:
  FoV = H: 50° × V: 30°

different conditions independently of the camera FoV and focal length. Since adding lens distortion to the projection algorithm of the simulation may change the FoV of the camera, we define a "working area" which corresponds to the region where we want gaze to be estimated. Also, a fixed calibration grid in the center of the working area is used for all conditions. Two different polynomial functions are used for gaze mapping in both conditions using the pupil center: model 1 with a calibration grid of 3 × 3 points, and model 5 with 4 × 4 calibration points. The test is done with the parameters described in Table 5. Lens distortion in the simulation is modeled with a 6th order polynomial (Weng, Cohen, & Herniou, 1992):

x_distorted = x (1 + k1 r^2 + k2 r^4 + k3 r^6)
y_distorted = y (1 + k1 r^2 + k2 r^4 + k3 r^6)    (9)

Figure 12 shows a sample scene image with the calibration and working areas, conveying the amount of distortion that we get from the lens defined in Table 5.

Figure 14 shows a significant improvement in accuracy when lens distortion is present with the second order polynomial. However, lens distortion does not have a large impact on the performance of model 5 (Figure 15), because this 3rd order polynomial has already compensated for the non-linearity of the pupil movements.

Besides affecting the gaze mapping result, lens distortion also distorts the pattern of error vectors in the image. For example, in a condition where we have parallax error and no error from the polynomial function, the assumption of having one epipole in the image at which all epipolar lines intersect does not hold when we have lens distortion.

    Combined Error

In the previous sections we discussed different factors that contribute to the final vector field of gaze estimation error in the scene image. These four factors do not affect the

gaze estimation independently and we cannot combine their errors by simply adding the resultant vector fields of errors obtained from each. For instance, when we have the e_par and e_int vector fields, the final error at point x2 in the scene image is not the sum of the two vectors e_par(x2) and e_int(x2). According to Figure 9, the estimated gaze point is actually Map(p_x1cal) = x1 + e_int(x1), which is the mapping result of the pupil center p_x1cal that corresponds to the point x1cal on π_cal. Thus, the final error at point x2 will be:

e(x2) = e_par(x2) + e_int(x1)    (10)

The example error pattern in Figure 16 illustrates how much the parallax error can be deformed when it is combined with interpolation and extrapolation errors.

The impact of the lens distortion factor is even more complicated, as it both affects the calibration and causes a non-linear distortion in the error field. Although mathematically expressing the error vector field might be a complex task, we can still use the simulation software to generate it. This could in practice be useful if the direction of the vectors in the vector field is fully defined by the geometry of the HMGT setup. It could help manufacturers know the error distribution for a specific configuration, which could later be used in analysis software by weighting different areas of the image in terms of gaze estimation validity. Therefore, it is valuable to investigate whether the error vector field is consistent and can be defined only by knowing the geometry of the HMGT.

The four main factors described in the paper are those resulting from the geometry of the different components of a HMGT system. There are other sources of error that we have not discussed, such as the image resolution of both cameras, noise (measurement error) in pupil tracking, the pupil detection method itself, and the position of the light source when using pupil and corneal reflection. We have observed that noise and inaccuracy in detecting eye features in the eye image have the greatest impact on the accuracy of gaze estimation. Applying noise in the eye tracking algorithm in the simulation allows us to have a more realistic comparison between different gaze estimation functions and also shows us how much the error vectors in the scene image are affected by inaccuracy in the measurement, both in terms of magnitude and direction. We did the same comparison between different models as in the evaluation section, but this time with two levels of noise with a Gaussian distribution (mean = 0, standard deviation = 0.5 and 1.0 pixel).
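A sketch of how such measurement noise can be injected into the simulated pupil positions (illustrative, assuming pixel-level Gaussian noise on both coordinates):

```python
import numpy as np

def add_measurement_noise(pupil_xy, sigma_px, rng=None):
    """Add zero-mean Gaussian noise (in pixels) to simulated pupil-center positions.

    pupil_xy: (N, 2) array of noise-free pupil centers in the eye image.
    sigma_px: standard deviation of the noise in pixels (e.g. 0.5 or 1.0).
    """
    rng = np.random.default_rng() if rng is None else rng
    return pupil_xy + rng.normal(0.0, sigma_px, size=pupil_xy.shape)
```

In the tests reported below, the noise is applied both to the calibration samples and to the test samples (see the description of Figure 17).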

Figure 18 shows how much the pupil detection in the image (1280 × 960) is affected by a noise level of 0.5 in the measurement. Pupil centers in the eye image corresponding to a grid of 16 × 16 fixation points on the calibration plane are shown in red for the condition with noise, and blue for the condition without noise.

    Figure 17 shows the gaze estimation result for noise level


Figure 14. Gaze estimation error in the scene image showing the effect of radial distortion on polynomial function 1 (3 × 3 calibration points) (a) with and (b) without lens distortion. The error in the working area for both conditions is shown in (c).

Figure 15. Gaze estimation error in the scene image showing the effect of radial distortion on polynomial function 5 (4 × 4 calibration points) (a) with and (b) without lens distortion. The error in the working area for both conditions is shown in (c).

Figure 16. An example error pattern in the image when mapping error and parallax error are combined.

0.5 with the PCR method. No radial distortion was included and the noise was added both during and after the calibration. The result shows how the overall error gets lower when increasing the calibration ratio from 20% to 50%.

    To see the impact of noise on the direction of vectors in

the image, a cosine similarity measure is used for comparing two vector fields (each containing 16 × 16 vectors). We compare the vector fields obtained from 2 different noise levels (0.5 and 1.5) with the vector field obtained from the condition with no noise. For this test, the calibration and fixation distances are set to 0.7 m and 3 m respectively. Adding parallax error makes the vector field more meaningful in the no-noise condition. In this comparison we ignore the differences in magnitude of the error and only compare the direction of the vectors in the image. Figure 19 shows how much the direction of the vectors deviates when there is measurement noise, for models 1 and 5.
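A sketch of this comparison (illustrative; since the paper reports values mapped to degrees, the per-vector cosine similarity is converted to an angular deviation):

```python
import numpy as np

def angular_deviation_deg(field_a, field_b):
    """Per-vector angular deviation (in degrees) between two error-vector fields.

    field_a, field_b: (N, 2) arrays of error vectors (e.g. 16 x 16 = 256 vectors),
    compared by direction only; magnitudes are ignored.
    """
    dot = np.sum(field_a * field_b, axis=1)
    norms = np.linalg.norm(field_a, axis=1) * np.linalg.norm(field_b, axis=1)
    cos_sim = np.clip(dot / norms, -1.0, 1.0)     # cosine similarity per vector
    return np.degrees(np.arccos(cos_sim))         # map the similarity to degrees
```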

Based on the results shown in Figure 17 we can conclude that we get almost the same gaze estimation error in the interpolation region for all the polynomial functions. Too much noise has a great impact on the magnitude of the error vectors in the extrapolation region, and the effect is even greater for the 3rd order polynomial models. Figure 19 indicates that, despite the changes in the magnitude of the vectors when there is noise, the direction of the vectors does not change significantly. This means that the vector field obtained based on the geometry could be used as a reference for predicting


Figure 17. Gaze estimation error obtained from different polynomial models for the interpolation and extrapolation regions of the scene image: (a) interpolation, PCR, 20%; (b) extrapolation, PCR, 20%; (c) interpolation, PCR, 50%; (d) extrapolation, PCR, 50%. Gaze estimation was based on the PCR feature. The resolution of the eye image is set to 1280 × 960 and a noise level of 0.5 is applied. Errors are measured in degrees of visual angle.

Figure 18. Pupil centers in the eye image corresponding to a grid of 16 × 16 fixation points on the calibration plane, shown in red for the noisy condition and blue for the condition without noise. Image resolution is 1280 × 960.

at which parts of the scene image the error is larger (relative to the other parts) and what the overall pattern of error would be. However, this needs to be validated empirically on a real HMGT.

Another test was conducted to check the performance of higher order polynomial models. The test was done with a calibration ratio of 20% and noise level 0.5 using a pupil-only method. A 4 × 4 calibration grid was used for models 1, 2 and 5 and a 5 × 5 grid for the 4th and 5th order standard polynomial models. The gaze estimation result of this comparison is shown in Figure 20, confirming that performance does not improve with higher order polynomial models even with

Figure 19. Effect of different levels of measurement noise (for models 1 and 5) on the direction of the error vectors in the presence of parallax error. The vertical axis represents the angular deviation (cosine similarity mapped to degrees).

    more calibration points.

    Conclusion

In this paper we have investigated the error distribution of polynomial functions used for gaze estimation in head-mounted gaze trackers (HMGT). To describe the performance of the functions we have characterized four different sources of error. The interpolation error is measured within the bounding box defined by the calibration points as seen by


Figure 20. Comparing the performance of higher order polynomials (4th and 5th order) with 25 calibration points against 3rd order polynomial models using 16 calibration points: (a) interpolation, pupil, 20%; (b) extrapolation, pupil, 20%.

the scene camera. The extrapolation error is measured in the remaining area of the scene camera outside the calibration bounding box. The other two types of error are due to the parallax between the scene camera and the eye, and the radial distortion of the lens used in the scene camera. Our simulation results show that third order polynomials provide better overall performance than second order and even higher order polynomial models.

    We didn’t find any significant improvement of model 5over model 4, specially when the noise is present in the in-put (comparing figures 17 and 8). This means that it’s notnecessary to use higher order polynomials for S y.

Furthermore, we have shown that using wide angle lens scene cameras actually reduces the error caused by the non-linearity of the eye features used for gaze estimation in HMGT. This could improve the results of the second order polynomial models significantly, as these models suffer more from the non-linearity of the input. Although the 3rd order polynomials provide more robust results with and without lens distortion, the 2nd order models have the advantage of requiring fewer calibration points. We replicated the same analysis we did for deriving model 4, but with the effect of radial distortion in the scene image. We found linear relationships between S_x and P_x and also between S_y and P_y. The relationships between S and the coefficients were also linear, suggesting the following model for both S_x and S_y:

1, x, y, xy    (11)

As future work we would like to compare the performance of the models discussed in this paper on a real head-mounted eye tracking setup and see if the results obtained from the simulation can be verified. It would also be interesting to compare the performance of a model based on Eq. 11 on a wide angle lens with model 4 on a non-distorted image. The simulation shows that the gaze estimation accuracy obtained from a model based on Eq. 11 with 4 calibration points on a distorted image is as good as the accuracy obtained from model 4 with 16 points on a non-distorted image. This, however, needs to be verified on a real eye tracker.

Though an analytical model describing the behavior of the errors might be feasible, the simulation software developed for this investigation might help other researchers and manufacturers to better understand how the accuracy and precision of the gaze estimates vary over the scene image for different configuration scenarios, and help them define configurations (e.g. different cameras, lenses, mapping functions, etc.) that are more suitable for their purposes.

    References

Babcock, J. S., & Pelz, J. B. (2004). Building a lightweight eyetracking headgear. In Proceedings of the 2004 ACM Symposium on Eye Tracking Research and Applications (pp. 109–114).

Barz, M., Daiber, F., & Bulling, A. (2016). Prediction of gaze estimation error for error-aware gaze-based interfaces. In Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications (pp. 275–278). New York, NY, USA: ACM.

Blignaut, P. (2013). A new mapping function to improve the accuracy of a video-based eye tracker. In Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference (pp. 56–59).

Blignaut, P. (2014). Mapping the pupil-glint vector to gaze coordinates in a simple video-based eye tracker. Journal of Eye Movement Research, 7, 1–11.

Blignaut, P., & Wium, D. (2013). The effect of mapping function on the accuracy of a video-based eye tracker. In Proceedings of the 2013 Conference on Eye Tracking South Africa (pp. 39–46).

Böhme, M., Dorr, M., Graw, M., Martinetz, T., & Barth, E. (2008). A software framework for simulating eye trackers. In Proceedings of the 2008 Symposium on Eye Tracking Research and Applications (pp. 251–258).

Cerrolaza, J. J., Villanueva, A., & Cabeza, R. (2008). Taxonomic study of polynomial regressions applied to the calibration of video-oculographic systems. In Proceedings of the 2008 Symposium on Eye Tracking Research and Applications (pp. 259–266).


Cerrolaza, J. J., Villanueva, A., & Cabeza, R. (2012, July). Study of polynomial mapping functions in video-oculography eye trackers. ACM Transactions on Computer-Human Interaction, 19(2), 10:1–10:25.

Guestrin, E. D., & Eizenman, M. (2006). General theory of remote gaze estimation using the pupil center and corneal reflections. IEEE Transactions on Biomedical Engineering, 53(6), 1124–1133.

Hansen, D. W., & Ji, Q. (2010). In the eye of the beholder: A survey of models for eyes and gaze. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3), 478–500.

Majaranta, P., & Bulling, A. (2014). Eye tracking and eye-based human–computer interaction. In Advances in Physiological Computing (pp. 39–65). Springer.

Mardanbegi, D., & Hansen, D. W. (2012). Parallax error in the monocular head-mounted eye trackers. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing (pp. 689–694). New York, NY, USA: ACM.

Merchant, J., Morrissette, R., & Porterfield, J. L. (1974). Remote measurement of eye direction allowing subject motion over one cubic foot of space. IEEE Transactions on Biomedical Engineering, (4), 309–317.

Mitsugami, I., Ukita, N., & Kidode, M. (2003). Estimation of 3D gazed position using view lines. In Proceedings of the 12th International Conference on Image Analysis and Processing (pp. 466–471).

Model, D., & Eizenman, M. (2010, May). An automatic personal calibration procedure for advanced gaze estimation systems. IEEE Transactions on Biomedical Engineering, 57(5), 1031–1039.

Morimoto, C. H., & Mimica, M. R. (2005). Eye gaze tracking techniques for interactive applications. Computer Vision and Image Understanding, 98(1), 4–24.

Ramanauskas, N., Daunys, G., & Dervinis, D. (2008). Investigation of calibration techniques in video based eye tracking system. Springer.

Sesma-Sanchez, L., Villanueva, A., & Cabeza, R. (2012). Gaze estimation interpolation methods based on binocular data. IEEE Transactions on Biomedical Engineering, 59(8), 2235–2243.

Weng, J., Cohen, P., & Herniou, M. (1992, October). Camera calibration with distortion models and accuracy evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(10), 965–980.
