Post on 06-Apr-2018
8/2/2019 Eye Blink Full Doc
Eye Blinks
Abstract
This graduation project presents an application capable of replacing the
traditional mouse with the human face as a new way to interact with the
computer. Facial features (nose tip and eyes) are detected and tracked in
real time so that their actions can be used as mouse events. The coordinates
and movement of the nose tip in the live video feed are translated into the
coordinates and movement of the mouse pointer on the user's screen, and the
left/right eye blinks fire left/right mouse click events. The only
external device that the user needs is a webcam that feeds the program with
the video stream. In recent years, technology has advanced and become less
expensive. With the availability of high-speed processors and inexpensive
webcams, more and more people have become interested in real-time
applications that involve image processing. One of the promising fields in
artificial intelligence is HCI (Human-Computer Interaction),
which aims to use human features (e.g. face, hands) to interact with the
computer. One way to achieve that is to capture the desired feature with a
webcam and monitor its action in order to translate it to some events that
communicate with the computer.
In our work we tried to assist people with hand disabilities that prevent
them from using the mouse, by designing an application that uses facial
features (nose tip and eyes) to interact with the computer. The nose tip was
selected as the pointing device because of its location and shape: since it
lies in the middle of the face, it is comfortable to use as the feature that
moves the mouse pointer and defines its coordinates. Moreover, it lies on
the axis about which the face rotates, so it essentially does not
change its distinctive convex shape, which makes it easier to track as the
face moves. The eyes are used to simulate mouse clicks, so the user can fire
click events by blinking.
EXISTING SYSTEM
While different devices have been used in HCI (e.g. infrared cameras,
sensors, microphones), we use an off-the-shelf webcam that affords moderate
resolution and frame rate as the capturing device, in order to keep the
program affordable for all users.
PROPOSED SYSTEM
To present an algorithm that distinguishes true eye blinks from involuntary
ones, and that detects and tracks the desired facial features precisely and
fast enough to be applied in real time.
SYSTEM SPECIFICATION
Operating system: Windows XP SP2 or Windows Server 2003
Pentium 4 processor or better
1 GB of RAM (required)
JDK 1.5 or later
JMF 2.x or later
Webcam supporting 30 frames/sec
MODULES
Facial features (nose tip and eyes) are detected and tracked in real
time so that their actions can be used as mouse events.
Nose tip movements in the live video feed are translated into movements
of the mouse pointer (cursor).
Left/right eye blinks replace left/right mouse click events.
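The pointer-mapping module above can be illustrated with a minimal sketch. This is not the project's actual code (the application is a Java/JMF program); the frame and screen sizes, and the horizontal mirroring, are illustrative assumptions.

```python
def nose_to_pointer(nose_x, nose_y, frame_w, frame_h, screen_w, screen_h):
    # Map the tracked nose-tip position in a video frame to screen
    # coordinates. The x-axis is mirrored so that moving the head to the
    # user's left moves the pointer left (webcams show a mirror image).
    px = (frame_w - 1 - nose_x) * (screen_w - 1) // (frame_w - 1)
    py = nose_y * (screen_h - 1) // (frame_h - 1)
    return px, py
```

For example, a nose tip at the top-left of a 320x240 frame lands at the top-right of a 1024x768 screen because of the mirroring.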
Face Detection
In this module, we propose a real-time face detection algorithm using a Six-
Segmented Rectangular (SSR) filter, distance information, and a template
matching technique. Between-the-Eyes is selected as the face representative
in our detection because its characteristics are common to most people and
it is easily seen over a wide range of face orientations. First, a rectangle
divided into six segments is scanned across the face image, and the
bright-dark relations between its segments are tested to decide whether its
center can be a candidate for Between-the-Eyes. Next, the distance
information obtained from a stereo camera and template matching are applied
to detect the true Between-the-Eyes among the candidates. We implemented
this system on a PC with a Xeon 2.2 GHz CPU. The system runs in real time at
30 frames/sec with a detection rate of 92%.
The current evolution of computer technologies has enhanced various
applications in human-computer interfaces. Face and gesture recognition is
part of this field and can be applied in areas such as robotics, security
systems, driver monitoring, and video coding systems.
Since the human face is a dynamic object with a high degree of variability,
various detection techniques have been proposed. In his survey, Hjelmas [1]
classifies face detection techniques into two categories: the feature-based approach
and the image-based approach. Techniques in the first category make use of
apparent properties of the face such as face geometry, skin color, and
motion. Although feature-based techniques can achieve high speed in face
detection, they suffer from poor reliability under varying lighting
conditions. The image-based approach, in the second category, takes
advantage of current advances in pattern recognition theory. Most
image-based approaches apply a window scanning technique for detecting
faces [1], which requires heavy computation; the image-based approach alone
is therefore not suitable for real-time applications.
In order to achieve a fast and reliable face detection system, we propose a
method that combines the feature-based and image-based approaches to detect
the point between the eyes (hereafter called Between-the-Eyes) by using the
Six-Segmented Rectangular filter (SSR filter). The proposed SSR filter, a
rectangle divided into six segments, operates on the bright-dark relations
around the Between-the-Eyes area. We select Between-the-Eyes as the face
representative because it is common to most people and easy to find over a
wide range of face orientations [2]. Between-the-Eyes has dark parts (eyes
and eyebrows) on both sides, and comparably bright parts above (forehead)
and below (nose and cheekbone). This characteristic is stable for any facial
expression [2].
In this paper, we use an intermediate representation of the image called the
integral image, from the work of Viola and Jones [3], to calculate the sum
of pixel values in each segment of the SSR filter. First, the SSR filter is
scanned over the image and the average gray level of each segment is
computed from the integral image. Then the bright-dark relations between the
segments are tested to see whether the filter's center can be a candidate
point for Between-the-Eyes. Next, a stereo camera is used to find the
distance information and a suitable Between-the-Eyes template size. The
Between-the-Eyes candidates are then evaluated by template matching against
an average Between-the-Eyes template (obtained from 400 images of 40 people
in the ORL face database [4]). Finally, the true Between-the-Eyes is
detected.
The proposed technique uses only gray-level information, so it is more
robust to changes in lighting conditions. Moreover, the method is not
affected by beards, mustaches, hair, or nostril visibility, since only the
information around the eyes, eyebrows, and nose area is required. We
implemented this system on a PC with a Xeon 2.2 GHz CPU. The system runs at
30 frames/sec with a detection rate of 92%.
In Section 2 we describe the concept of the integral image, followed in
Section 3 by an explanation of using the SSR filter to extract
Between-the-Eyes candidates. In Section 4 we explain the candidate selection
method using the stereo camera and average Between-the-Eyes template
matching. Section 5 presents the whole real-time face detection system,
Section 6 shows the experimental results, and Section 7 concludes.
Integral Image
The SSR filter is computed using an intermediate representation of the image
called the integral image. For the original image i(x, y), the integral
image is defined as [3]

ii(x, y) = sum of i(x', y') over all x' <= x, y' <= y (1)

The integral image can be computed in one pass over the original image by
the following pair of recurrences:

s(x, y) = s(x, y - 1) + i(x, y) (2)
ii(x, y) = ii(x - 1, y) + s(x, y) (3)

where s(x, y) is the cumulative row sum, s(x, -1) = 0, and ii(-1, y) = 0.
Using the integral image, the sum of the pixels within rectangle D (denoted
sr) can be computed at high speed with four array references, as shown in
Fig. 1:
sr = (ii(x, y) + ii(x - W, y - L)) - (ii(x - W, y) + ii(x, y - L)) (4)
Figure 1. Integral Image
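To make expressions (2)-(4) concrete, here is a minimal Python sketch (the original system was not written this way; NumPy stands in for the one-pass recurrences) that builds an integral image and computes a rectangle sum with four array references:

```python
import numpy as np

def integral_image(img):
    # ii(x, y) = sum of i(x', y') for all x' <= x and y' <= y,
    # computed with cumulative sums (the effect of recurrences (2), (3)).
    return np.cumsum(np.cumsum(np.asarray(img, dtype=np.int64), axis=0), axis=1)

def rect_sum(ii, x, y, w, l):
    # Sum over the W x L rectangle whose bottom-right pixel is (x, y),
    # using four array references as in expression (4); out-of-range
    # references follow the convention ii(-1, y) = ii(x, -1) = 0.
    total = int(ii[y, x])
    if x - w >= 0:
        total -= int(ii[y, x - w])
    if y - l >= 0:
        total -= int(ii[y - l, x])
    if x - w >= 0 and y - l >= 0:
        total += int(ii[y - l, x - w])
    return total
```

With this representation, each of the six SSR segment sums costs only four lookups regardless of segment size, which is what makes real-time scanning feasible.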
SSR filter
1 SSR filter
At the beginning, a rectangle is scanned throughout the input image. This rectangle
is segmented into six segments as shown in Fig.2 (a).
Figure 2. SSR Filter
We denote the total sum of pixel values of each segment (B1-B6) as Sb1-Sb6.
The proposed SSR filter is used to detect Between-the-Eyes based on two
characteristics of face geometry.
(1) The nose area (Sn) is brighter than the right and left eye areas (Ser
and Sel, respectively), as shown in Fig. 2 (b), where
Sn = Sb2 + Sb5
Ser = Sb1 + Sb4
Sel = Sb3 + Sb6
Then,
Sn > Ser (5)
Sn > Sel (6)
(2) The eye area (both eyes and eyebrows) (Se) is relatively darker than the
cheekbone area (including the nose) (Sc), as shown in Fig. 2 (c), where
Se = Sb1 + Sb2 + Sb3
Sc = Sb4 + Sb5 + Sb6
Then,
Se < Sc (7)
When expressions (5), (6), and (7) are all satisfied, the center of the
rectangle can be a candidate for Between-the-Eyes.
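The candidate test can be written directly from the segment sums. In this sketch the segment layout is assumed to be B1 B2 B3 on the upper row and B4 B5 B6 on the lower row, matching the groupings in expressions (5)-(7):

```python
def is_between_eyes_candidate(sb):
    # sb[k] is the pixel sum of segment Bk, with B1 B2 B3 on the upper row
    # (eyes/eyebrows) and B4 B5 B6 on the lower row (cheekbones and nose).
    s_n = sb[2] + sb[5]            # nose column
    s_er = sb[1] + sb[4]           # right-eye column
    s_el = sb[3] + sb[6]           # left-eye column
    s_e = sb[1] + sb[2] + sb[3]    # eye/eyebrow row
    s_c = sb[4] + sb[5] + sb[6]    # cheekbone row (including nose)
    # Expressions (5), (6), (7): nose brighter than both eye columns,
    # and the eye row darker than the cheekbone row.
    return s_n > s_er and s_n > s_el and s_e < s_c
```

A bright nose column flanked by dark eye columns passes the test; a uniform region fails because the strict inequalities do not hold.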
Figure 3. Between-the-Eyes candidates from SSR filter
In Fig. 3 (b), the Between-the-Eyes candidate areas are displayed in white
and the non-candidate areas in black. By performing a labeling process on
Fig. 3 (b), the result of using the SSR filter to detect Between-the-Eyes
candidates is obtained, as shown in Fig. 3 (a).
2 Filter Size Estimation
In order to find the most suitable filter size, we used 400 facial images of
40 people (10 each) from the ORL face database [4]. The images were taken at
different times, under various lighting conditions, with different gestures,
and with and without eyeglasses. Each image is 92x112 pixels with 256 gray
levels.
We performed the filter size estimation manually for all 400 facial images
to find the standard filter size that covers the two eyes, two eyebrows, and
the cheekbone area (including the nose). The result is a rectangle of 60x30
pixels. In the experiment, we counted whether a true candidate was included
or was in the vicinity. Varying the standard 60x30 filter size in steps of
20%, the true candidate detection rate and the number of candidates for each
filter size are shown in Table 1. The standard 60x30 filter achieves a 92%
detection rate, which shows that this filter size functions effectively. On
the other hand, the detection rate drops to 52% with a filter of 84x42
pixels, because a large filter may include unnecessary parts of the face
such as hair or beard. Since the sum of pixel values is used in expressions
(5), (6), and (7), the filters of size 24x12
and 36x18 shown in Fig. 4 achieve an unexpectedly high detection rate of
Between-the-Eyes even though these filter sizes do not completely contain
both eye areas, because some parts of the eyes are still darker than the
nose area.
Fig. 5 shows examples of successful Between-the-Eyes detections, while some
failures are shown in Fig. 6. These detection errors may be caused by the
illumination. The detection failure of the middle image in Fig. 6 is mainly
caused by the reflection on the eyeglasses.
Figure 4. Various sizes of SSR filter
Figure 5. Examples of successful Between-the-Eyes detection
Figure 6. Examples of failures in Between-the-Eyes detection
Moreover, Fig. 7 shows an example of successful Between-the-Eyes detection
for an image in which horizontal illumination hits one side of the face. In
this case, the SSR filter still functions effectively even though one side
of the face is covered by shadow. Therefore the SSR filter can be used to
detect Between-the-Eyes under varying lighting conditions.
Table 1. Detection results of various SSR filter sizes (from 400 face images)
Figure 7. Example of successful Between-the-Eyes detection for a face with
illumination hitting one side
According to Table 1, rectangles of 0.6~1.2 times the standard size (60x30)
can be used to detect Between-the-Eyes candidates. Therefore faces from
0.83~1.67 times the standard image size (92x112 pixels) can be detected by
the proposed SSR filter.
Candidate Selection
1 Stereo camera
In a real situation, face size varies according to the distance from the
face to the cameras. We use two cameras to construct a binocular stereo
system and obtain distance information, so that a suitable Between-the-Eyes
template size can be estimated for the template matching technique discussed
later in Section 4.2. Since the stereo camera system is a standard
technique, its detailed explanation is omitted in this paper.
We performed experiments to find the suitable Between-the-Eyes template size
by using the disparity between the right and left images, based on the
principle of a binocular stereo camera system. First, we manually measured
the horizontal difference in pixels (disparity) between the Between-the-Eyes
positions in the face images obtained from the right and left cameras. Then
the width between the right and left temples was measured manually; this
width should correspond to the width of the Between-the-Eyes template.
The relation between disparities and suitable Between-the-Eyes template
sizes is shown in Fig. 8. Based on this relation, we can select an
appropriate template size according to the disparity measured in an actual
scene. This is why the proposed technique is applicable to faces at
distances between 0.5 and 3.5 m from the cameras.
From the experiments and the relation in Fig. 8, we derived the relations
between SSR filter size, disparity, and Between-the-Eyes template size shown
in Table 2. Only two filter sizes, 40x20 and 24x12, are used, since they are
flexible enough to detect faces within the pre-defined range. For example,
for a face with a disparity of 20, the 40x20 SSR filter is used and the
Between-the-Eyes template size is 48x24 pixels. The template is then scaled
to the average Between-the-Eyes template size for template matching. Faces
whose disparity falls outside the range in Table 2 are assumed to be
undetectable.
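The rescaling step can be sketched as follows. The paper does not state which interpolation was used, so the nearest-neighbour resize here is an assumption; it maps a disparity-dependent crop, e.g. 48x24, down to the 32x16 average-template size described in Section 4.2:

```python
import numpy as np

def scale_to_template(patch, out_h=16, out_w=32):
    # Nearest-neighbour rescale of a cropped Between-the-Eyes patch
    # (height x width) to the average-template size used for matching.
    in_h, in_w = patch.shape
    rows = (np.arange(out_h) * in_h) // out_h
    cols = (np.arange(out_w) * in_w) // out_w
    return patch[np.ix_(rows, cols)]
```

Scaling every candidate to one fixed template size is what lets a single average template serve faces at all distances in the 0.5-3.5 m range.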
Figure 8. The relation between the horizontal difference in pixels
(disparity) and the Between-the-Eyes template size
Table 2. Filter size, disparity, and related Between-the-Eyes template size
2 Average Between-the-Eyes Template Matching
Because the SSR filter extracts not only the true Between-the-Eyes but also
some false candidates, we use an average Between-the-Eyes template matching
technique to select among them. The average Between-the-Eyes pattern used in
this paper was obtained in the same manner as [2], from 400 face images of
40 people in the ORL face database [4].
Figure 9. Average Between-the-Eyes template and its variance pattern
Fig. 9 shows the average Between-the-Eyes template and its variance pattern,
of size 32x16. The gray levels of each sample were normalized to an average
of zero and a variance of one. We then calculated an average pattern and its
variance at each pixel. Next, the gray levels were converted to an average
of 128 with a standard deviation of 64, giving the average
pattern as an image. To obtain the variance pattern, each value was
multiplied by 255. Both the average and variance patterns are symmetric.
To avoid the influence of unbalanced illumination, we evaluate the right and
left halves of the face separately, because the lighting conditions
typically differ between them. We also avoid the effect of hair and beard,
and reduce the calculation load, by discarding the top three rows from the
calculation. In the end, a pattern of 16x13 pixels (per side) is used in
template matching.
Define the average Between-the-Eyes template and its variance for the left
side of the face as tl_ij, vl_ij (i = 0,...,15; j = 3,...,15) and for the
right side as tr_ij, vr_ij (i = 0,...,15; j = 3,...,15). tr_ij and tl_ij
have an average value of 128 with a standard deviation of 64, while vr_ij
and vl_ij have a maximum gray level of 255.
To evaluate the candidates, we define the Between-the-Eyes pattern as p_mn
(m = 0,...,31; n = 0,...,15). The right and left halves of p_mn are then
re-defined separately as pr_ij (i = 0,...,15; j = 3,...,15) and pl_ij
(i = 0,...,15; j = 3,...,15), respectively, each converted to an average
value of 128 and a standard deviation of 64.
The left mismatching value (Dl) and the right mismatching value (Dr) are
then calculated by comparing the normalized candidate patterns with the
corresponding templates and their variance patterns.
Only a candidate with both Dl and Dr below a pre-defined threshold (D) is
counted as a true candidate. If more than one candidate has both Dl
and Dr below the threshold, the candidate with the smallest mismatch value
is judged to be the true Between-the-Eyes candidate.
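The mismatch equation itself is not legible in this copy. Given that each half-pattern is normalized to mean 128 and standard deviation 64 and that a per-pixel variance pattern is available, one plausible form is a variance-weighted mean of squared differences; the sketch below is an assumption, not the paper's exact formula:

```python
import numpy as np

def normalize_patch(p):
    # Convert a half-pattern to mean 128 and standard deviation 64,
    # matching how the templates were prepared.
    p = np.asarray(p, dtype=np.float64)
    std = p.std()
    if std == 0:
        return np.full_like(p, 128.0)
    return (p - p.mean()) / std * 64.0 + 128.0

def mismatch(p, t, v, eps=1.0):
    # Assumed mismatch value D: a variance-weighted mean of squared
    # differences between the normalized candidate half-pattern p and the
    # average template half t; v is the variance pattern, eps avoids
    # division by zero where the template variance vanishes.
    p = normalize_patch(p)
    return float(np.mean((p - t) ** 2 / (np.asarray(v, dtype=np.float64) + eps)))
```

In use, a candidate would be accepted only if both its left and right mismatch values fall below the threshold D, and the candidate with the smallest value wins when several pass.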
3 Detection of Eye-Like Points
Since Between-the-Eyes lies at the midpoint of the left and right eye
alignment, we detect both eyes to confirm the location of the true
Between-the-Eyes. When the locations of both eyes are extracted from the
selected face area, Between-the-Eyes is re-registered as the midpoint
between them.
We search for the eye areas within the Between-the-Eyes template obtained in
Section 4.1. The eye detection is done in a simple way, using a technique
from [5]. To avoid the influence of illumination, we search for the right
eye and the left eye independently. First, the rectangular areas on both
sides of the Between-the-Eyes candidate, where the eyes should be found, are
extracted. For the selected 32x16 Between-the-Eyes area, we avoid the effect
of eyebrows, hair, and beard by ignoring 1 pixel at the border. The two eye
areas are then assumed to be 12x14 pixels on each side of the face
(neglecting three pixels in the middle of the Between-the-Eyes template as
the nose area).
Next, we find a threshold level for each area to binarize the image. The
threshold level is determined when the total number of pixels of all
components, excluding the border, exceeds a pre-defined value [6] (10 in
this paper). In some cases the eyebrows have almost the same gray level as
the eyes, so we select the component within a certain size range (5~25
pixels) with the lowest position.
To resolve the similarity in gray level between the eyes and eyebrows, a
search using the left-right eye alignment is performed. This process focuses
on the 3x3 pixels in the middle of each eye area. Then conditions on the
distance between the located eyes (De) and the angle (Ae) at the
Between-the-Eyes candidate are tested using the following expressions, both
obtained from experiments:
15 < De < 21 (10)
115 < Ae < 180 (11)
Only a candidate whose eye relations satisfy both conditions is
re-registered as the true Between-the-Eyes. Otherwise, the Between-the-Eyes
and eye areas cannot be determined.
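Conditions (10) and (11) can be checked directly from the two detected eye points and the candidate point. In this sketch the angle Ae is taken at the candidate, between the directions to the two eyes, which is one natural reading of the text:

```python
import math

def eyes_plausible(left_eye, right_eye, between):
    # Test conditions (10) and (11): the distance De between the located
    # eyes must lie in (15, 21) pixels, and the angle Ae at the
    # Between-the-Eyes candidate in (115, 180) degrees.
    (lx, ly), (rx, ry), (bx, by) = left_eye, right_eye, between
    de = math.hypot(rx - lx, ry - ly)
    # Angle at the candidate point, between the vectors to each eye.
    v1 = (lx - bx, ly - by)
    v2 = (rx - bx, ry - by)
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return False
    cos_ae = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    ae = math.degrees(math.acos(max(-1.0, min(1.0, cos_ae))))
    return 15 < de < 21 and 115 < ae < 180
```

A candidate sitting slightly above the line joining two eyes about 18 pixels apart passes; a candidate far off the eye axis, or eyes at an implausible distance, fails.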
Real-Time Face Detection System
The processing flow of the real-time face detection system is shown in
Fig. 10.
Figure 10. Processing Flow of Real-Time Face Detection
Experiment
We implemented the system on a PC with a Xeon 2.2 GHz CPU. In the
experiment, two commercial NTSC video cameras, a multi-video composer, and a
video capture board are used, without any special hardware. The two NTSC
cameras form a binocular stereo system. The multi-video composer combines
four NTSC video
signals into one NTSC signal; we use only two of them in our experiment.
Each video image becomes one half of its original size, so the captured
image size for each camera is 320x240. However, to avoid the interlaced
scanning problem for moving objects, we use only the even lines;
consequently, the image size is 320x120 for each camera. The resulting
horizontal image resolution is double the vertical one, as shown in the
bottom two images in Fig. 11. We keep this non-uniform resolution to obtain
as accurate a disparity as possible.
On the other hand, a regular image is needed for the Between-the-Eyes
template matching, so we reconstruct a smaller image by sub-sampling, as
shown in the uppermost-left image of Fig. 11.
Fig. 11 shows the face detection result from an experiment performed in the
laboratory with an unspecified background. The uppermost-left image is a
monochrome image from the right camera, using only the green component;
Between-the-Eyes detection is applied to this 160x120 monochrome image. The
lower image is obtained from the right camera, and the lowest image from the
left camera.
Figure 11. Face Detection Result
The detection result from the SSR filter is shown in the uppermost-right
image. In its upper corner is the Between-the-Eyes candidate area after
cutting and scaling to match the average template, with its binarized image
of detected eyes and eyebrows displayed below. However, since the SSR filter
uses no information about the inclination of the face, this technique cannot
detect faces inclined by more than 10 degrees. In cases of large reflection
on eyeglasses, the proposed technique also occasionally fails to detect the
true Between-the-Eyes. In the real implementation, the system operates at 30
frames/sec, which achieves real-time processing speed.
The proposed real-time face detection system consists of three major
components: the SSR filter, the stereo camera system, and the average
Between-the-Eyes template matching unit. First, the SSR filter tests the
bright-dark relations of the average gray levels of its segments to decide
whether its center can be a Between-the-Eyes candidate. We used the integral
image proposed by Viola [3] in the SSR filter calculation in order to scan
the filter over the image in real time. Since only gray-level information is
used, the proposed technique is more robust to changes in lighting
conditions than skin-color extraction methods. Next, the stereo camera
system provides distance information so that a suitable Between-the-Eyes
template size can be estimated; this reduces the calculation load and allows
faces of different sizes to be detected. We then perform the average
Between-the-Eyes template matching to select the true candidate, followed by
detection of both eye areas to verify the result. We implemented the system
on a PC with a Xeon 2.2 GHz CPU; it ran at 30 frames/sec, satisfying
real-time processing speed. However, the proposed technique is still limited
in face orientation; further development to solve this problem should be
performed.
Hough Transform
Common Names: Hough transform
Brief Description
The Hough transform is a technique that can be used to isolate features of a
particular shape within an image. Because it requires that the desired
features be specified in some parametric form, the classical Hough transform
is most commonly used for the detection of regular curves such as lines,
circles, and ellipses. A generalized Hough transform can be employed in
applications where a simple analytic description of the feature(s) is not
possible. Due to the computational complexity of the generalized Hough
algorithm, we restrict the main focus of this discussion to the classical
Hough transform. Despite its domain restrictions, the classical Hough
transform (hereafter referred to without the "classical" prefix) retains
many applications, as most manufactured parts (and many anatomical parts
investigated in medical imagery) contain feature boundaries that can be
described by regular curves. The main advantage of the Hough transform
technique is that it is tolerant of gaps in feature boundary descriptions
and is relatively unaffected by image noise.
How It Works
The Hough technique is particularly useful for computing a global
description of a feature(s) (where the number of solution classes need not
be known a priori) from (possibly noisy) local measurements. The motivating
idea behind the Hough technique for line detection is that each input
measurement (e.g. a coordinate point) indicates its contribution to a
globally consistent solution (e.g. the physical line that gave rise to that
image point).
As a simple example, consider the common problem of fitting a set of line
segments to a set of discrete image points (e.g. pixel locations output by
an edge detector). Figure 1 shows some possible solutions to this problem.
Here the lack of a priori knowledge about the number of desired line
segments (and the ambiguity about what constitutes a line segment) renders
the problem under-constrained.
Figure 1. a) Coordinate points. b) and c) Possible straight-line fittings.
We can analytically describe a line segment in a number of forms. However, a
convenient equation for describing a set of lines uses the parametric or
normal notation:

x cos(theta) + y sin(theta) = rho
where rho is the length of a normal from the origin to the line and theta is
the orientation of this normal with respect to the X-axis. (See Figure 2.)
For any point (x, y) on this line, rho and theta are constant.
Figure 2 Parametric description of a straight line.
In an image analysis context, the coordinates of the edge-segment points
(x_i, y_i) in the image are known and therefore serve as constants in the
parametric line equation, while rho and theta are the unknown variables we
seek. If we plot the possible (rho, theta) values defined by each
(x_i, y_i), points in cartesian image space map to curves (i.e. sinusoids)
in the polar Hough parameter space. This point-to-curve transformation is
the Hough transformation for straight lines. When viewed in Hough parameter
space, points that are collinear in the cartesian image space
become readily apparent, as they yield curves that intersect at a common
(rho, theta) point.
The transform is implemented by quantizing the Hough parameter space into
finite intervals, or accumulator cells. As the algorithm runs, each
(x_i, y_i) is transformed into a discretized (rho, theta) curve and the
accumulator cells that lie along this curve are incremented. Resulting peaks
in the accumulator array represent strong evidence that a corresponding
straight line exists in the image.
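The quantized accumulator can be sketched in a few lines of Python. The resolutions and ranges here are illustrative parameters, not values from the text:

```python
import math
import numpy as np

def hough_lines(points, theta_steps=180, rho_res=1.0, rho_max=200.0):
    # Accumulator over the normal form x*cos(theta) + y*sin(theta) = rho.
    # Each point votes along its sinusoid; collinear points pile their
    # votes into a common (rho, theta) cell.
    thetas = np.linspace(0.0, math.pi, theta_steps, endpoint=False)
    n_rho = int(2 * rho_max / rho_res) + 1
    acc = np.zeros((n_rho, theta_steps), dtype=np.int32)
    cols = np.arange(theta_steps)
    for x, y in points:
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        idx = np.round((rhos + rho_max) / rho_res).astype(int)
        ok = (idx >= 0) & (idx < n_rho)
        acc[idx[ok], cols[ok]] += 1
    return acc, thetas
```

For instance, ten points on the line y = x all vote into the cell at theta = 135 degrees, rho = 0, producing a peak of height ten there.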
We can use this same procedure to detect other features with analytical
descriptions. For instance, in the case of circles, the parametric equation
is

(x - a)^2 + (y - b)^2 = r^2

where a and b are the coordinates of the center of the circle and r is the
radius. In this case, the computational complexity of the algorithm begins
to increase, as we now have three coordinates in the parameter space and a
3-D accumulator. (In general, the computation and the size of the
accumulator array increase polynomially with the number of parameters. Thus,
the basic Hough technique described here is only practical for simple
curves.)
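A brute-force sketch of the circle case shows concretely why the 3-D accumulator is costly; the centre grid and radius range are illustrative:

```python
import math
import numpy as np

def hough_circles(points, r_min, r_max, size):
    # 3-D accumulator over (a, b, r) for (x - a)^2 + (y - b)^2 = r^2.
    # Every edge point votes for every centre/radius combination it is
    # consistent with, so both time and memory grow with each parameter.
    acc = np.zeros((size, size, r_max - r_min + 1), dtype=np.int32)
    for x, y in points:
        for a in range(size):
            for b in range(size):
                r = round(math.hypot(x - a, y - b))
                if r_min <= r <= r_max:
                    acc[a, b, r - r_min] += 1
    return acc
```

Six points lying on a circle of radius 5 centred at (10, 10) all vote into the cell (10, 10, r = 5), which becomes the global peak.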
Guidelines for Use
The Hough transform can be used to identify the parameter(s) of the curve
that best fits a set of given edge points. This edge description is commonly
obtained from a feature-detecting operator such as the Roberts Cross, Sobel,
or Canny edge detector, and may be noisy, i.e. it may contain multiple edge
fragments corresponding to a single whole feature. Furthermore, as the
output of an edge detector defines only
where features are in an image, the work of the Hough transform is to
determine both what the features are (i.e. to detect the feature(s) for
which it has a parametric (or other) description) and how many of them exist
in the image.
In order to illustrate the Hough transform in detail, we begin with a simple
image of two occluding rectangles.
The Canny edge detector can produce a set of boundary descriptions for this
part. Here we see the overall boundaries in the image, but this result tells
us nothing about the identity (and quantity) of the feature(s) within this
boundary description. In this case, we can use the Hough (line-detecting)
transform to detect the eight separate straight-line segments of this image
and thereby identify the true geometric structure of the subject.
If we use these edge/boundary points as input to the Hough transform, a
curve is generated in polar (rho, theta) space for each edge point in
cartesian space. The accumulator array can then be viewed as an intensity
image.
Histogram equalizing this image allows us to see the patterns of information
contained in the low-intensity pixel values.
Note that, although rho and theta are notionally polar coordinates, the
accumulator space is plotted rectangularly, with theta as the abscissa and
rho as the ordinate. Note also that the accumulator space wraps around at
the vertical edge of the image, so that, in fact, there are only 8 real
peaks.
Curves generated by collinear points in the gradient image intersect in
peaks in the Hough transform space. These intersection points characterize
the straight-line segments of the original image. There are a number of
methods one might employ to extract these bright points, or local maxima,
from the accumulator array. For example, a simple method involves
thresholding and then applying some thinning to the isolated clusters of
bright spots in the accumulator array image. Here we use a relative
threshold to extract the unique (rho, theta) points corresponding to each of
the straight-line edges in the original image. (In other words, we take only
those local maxima in the accumulator array whose values are equal to or
greater than some fixed percentage of the global maximum value.)
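The relative-threshold rule can be sketched as follows. The 8-neighbourhood local-maximum test here is a simple stand-in assumption, not the page's exact thinning method:

```python
import numpy as np

def relative_threshold_peaks(acc, fraction=0.4):
    # Keep only local maxima whose votes reach a fixed fraction of the
    # global maximum: the "relative threshold" extraction of bright
    # points from the accumulator array.
    acc = np.asarray(acc)
    cutoff = fraction * acc.max()
    peaks = []
    h, w = acc.shape
    for r in range(h):
        for c in range(w):
            v = acc[r, c]
            if v < cutoff or v == 0:
                continue
            # 8-neighbourhood local-maximum test.
            win = acc[max(0, r - 1):r + 2, max(0, c - 1):c + 2]
            if v >= win.max():
                peaks.append((r, c))
    return peaks
```

Raising the fraction keeps only the strongest lines; lowering it admits weaker (and possibly spurious) peaks, which is exactly the trade-off discussed in the fog example below.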
Mapping back from Hough transform space (i.e. de-Houghing) into cartesian
space yields a set of line descriptions of the image subject. By overlaying
this image on an inverted version of the original, we can confirm that the
Hough transform found the 8 true sides of the two rectangles and thus
revealed the underlying geometry of the occluded scene.
Note that the accuracy of alignment of detected and original image lines, which is
obviously not perfect in this simple example, is determined by the quantization of
the accumulator array. (Also note that many of the image edges have several
detected lines. This arises from having several nearby Hough-space peaks with
similar line parameter values. Techniques exist for controlling this effect, but were
not used here to illustrate the output of the standard Hough transform.)
Note also that the lines generated by the Hough transform are infinite in length. If
we wish to identify the actual line segments which generated the transform
parameters, further image analysis is required in order to see which portions of these
infinitely long lines actually have points on them.
To illustrate the Hough technique's robustness to noise, the Canny edge
description was corrupted by 1% salt-and-pepper noise before Hough
transforming it.
De-Houghing this result (and overlaying it on the original) again yields the
expected line descriptions. (As in the above case, the relative threshold is
40%.)
The sensitivity of the Hough transform to gaps in the feature boundary can
be investigated by transforming an image whose boundaries have been edited
using a paint program, and then de-Houghing it (using a relative threshold
of 40%).
In this case, because the accumulator space did not receive as many entries as in
previous examples, only 7 peaks were found, but these are all structurally relevant
lines.
We will now show some examples with natural imagery. In the first case, we have a city scene where the buildings are partially obscured by fog.
If we want to find the true edges of the buildings, an edge detector (e.g. Canny) cannot recover this information very well.
However, the Hough transform can detect some of the straight lines representing building edges within the obscured region, as the histogram-equalized accumulator space representation of the original image shows.
If we set the relative threshold to 70%, only a few of the long edges are detected in the de-Houghed image, and there is a lot of duplication where many lines or edge fragments are nearly collinear. Applying a more generous relative threshold, i.e. 50%, yields more of the expected lines, but at the expense of many spurious lines arising from the many collinear edge fragments.
Our final example comes from a remote sensing application. Here we would like to detect the streets in an image of a reasonably rectangular city sector. We can edge detect the image using the Canny edge detector.
However, street information is not available as output of the edge detector alone. The de-Houghed image shows that the Hough line detector is able to recover some of this information. Because the contrast in the original image is poor, only a limited set of features (i.e. streets) is identified.
Common Variants
Generalized Hough Transform
The generalized Hough transform is used when the shape of the feature that we wish to isolate does not have a simple analytic equation describing its boundary. In this case, instead of using a parametric equation of the curve, we use a look-up table to define the relationship between the boundary positions and orientations and the Hough parameters. (The look-up table values must be computed during a preliminary phase using a prototype shape.)
For example, suppose that we know the shape and orientation of the desired feature. (See Figure 3.) We can specify an arbitrary reference point within the feature, with respect to which the shape (i.e. the distance and angle of normal lines drawn from the boundary to this reference point) of the feature is defined. Our look-up table (i.e. R-table) will consist of these distance and direction pairs, indexed by the orientation of the boundary.
Figure 3 Description of R-table components.
The Hough transform space is now defined in terms of the possible positions of the shape in the image, i.e. the possible ranges of the reference point (x_c, y_c). In other words, the transformation is defined by:

x_c = x + r.cos α
y_c = y + r.sin α

(The r and α values are derived from the R-table for particular known boundary orientations.) If the orientation of the desired feature is unknown, this procedure is complicated by the fact that we must extend the accumulator by incorporating an extra parameter to account for changes in orientation.
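Under the assumptions above, the R-table construction and the voting step can be sketched roughly as follows. This is a minimal illustration, not the document's own code; the orientation quantization, the (dx, dy) offset representation and the dense accumulator layout are all assumptions made for the example:

```python
import numpy as np
from collections import defaultdict

def build_r_table(boundary, reference):
    """R-table: for each boundary point, store the offset from the point
    to the reference, indexed by (quantized) boundary orientation phi."""
    table = defaultdict(list)
    for (x, y, phi) in boundary:            # phi: boundary orientation, radians
        key = round(phi, 2)                 # crude orientation quantization
        table[key].append((reference[0] - x, reference[1] - y))
    return table

def vote(edge_points, table, shape):
    """Each edge point votes for every reference position consistent with
    its orientation; peaks in the accumulator mark likely shape positions."""
    acc = np.zeros(shape, dtype=int)
    for (x, y, phi) in edge_points:
        for dx, dy in table.get(round(phi, 2), []):
            xc, yc = x + dx, y + dy
            if 0 <= xc < shape[0] and 0 <= yc < shape[1]:
                acc[xc, yc] += 1
    return acc
```

If the same shape appears translated in the edge data, every edge point votes for the shifted reference point, which therefore accumulates one vote per boundary point.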
Exercises
1. Find the Hough line transform of the objects shown in Figure 4.
Figure 4 Features to input to the Hough transform line detector.
2. Starting from the basic image, create a series of images with which you can investigate the ability of the Hough line detector to extract occluded features. For example, begin using translation and image addition to create an image containing the original image overlapped by a translated copy of that image. Next, use edge detection to obtain a boundary description of your subject. Finally, apply the Hough algorithm to recover the geometries of the occluded features.
3. Investigate the robustness of the Hough algorithm to image noise. Starting
from an edge detected version of the basic image
try the following: a) Generate a series of boundary descriptions of the image using different levels of Gaussian noise. How noisy (i.e. broken) does the edge description have to be before Hough is unable to detect the original geometric structure of the scene? b) Corrode the boundary descriptions with different levels of salt and pepper noise. At what point does the combination of broken edges and added intensity spikes render the Hough line detector useless?
4. Try the Hough transform line detector on the example images. Experiment with the Hough circle detector on the example images.
5. One way of reducing the computation required to perform the Hough
transform is to make use of gradient information which is often available as
output from an edge detector. In the case of the Hough circle detector, the
edge gradient tells us in which direction a circle must lie from a given edge
coordinate point. (See Figure 5.)
Figure 5 Hough circle detection with gradient information.
a) Describe how you would modify the 3-D circle detector accumulator array in order to take this information into account. b) To this algorithm we may want to add gradient magnitude information. Suggest how to introduce weighted incrementing of the accumulator.
6. The Hough transform can be seen as an efficient implementation of a generalized matched filter strategy. In other words, if we created a template composed of a circle of 1's (at a fixed radius R) and 0's everywhere else in the image, then we could convolve it with the gradient image to yield an accumulator array-like description of all the circles of radius R in the image. Show formally that the basic Hough transform (i.e. the algorithm with no use of gradient direction information) is equivalent to template matching.
7. Explain how to use the generalized Hough transform to detect octagons.
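As a hint for exercise 6, the equivalence can be sketched by writing the circle accumulator as a correlation. For an edge map E and a fixed radius R, the vote count at a candidate centre (a, b) is

\[
A(a,b) \;=\; \sum_{u,v} E(a+u,\,b+v)\,T(u,v),
\qquad
T(u,v) \;=\; \begin{cases} 1 & u^2+v^2 = R^2 \\ 0 & \text{otherwise,} \end{cases}
\]

which is exactly the correlation of the edge image with a ring template of radius R, i.e. matched filtering with that template.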
Hough transform
The Hough transform is a feature extraction technique used in digital image processing. The classical transform identifies lines in the image, but it has been
extended to identifying positions of arbitrary shapes. The transform universally used
today was invented by Richard Duda and Peter Hart in 1972, who called it a
"generalized Hough transform" after the related patent of Paul Hough. The
transform was popularized in the computer vision community by Dana H. Ballard
through a 1981 journal article titled "Generalizing the Hough transform to detect
arbitrary shapes".
Theory
To extract features from digital images, it is useful to be able to find simple shapes - straight lines, circles, ellipses and the like - in images. In order to achieve this goal, one must be able to detect a group of pixels that lie on a straight line or a smooth curve. That is what the Hough transform is supposed to do.
The simplest case of the Hough transform is the linear Hough transform. To illustrate the idea, let's start with a straight line. In the image space, the straight line can be described as y = mx + b and is plotted for each pair of values (x, y). However, the characteristic of that straight line is not x or y, but its slope m and intercept b. Based on that fact, the straight line y = mx + b can be represented as a point (b, m) in the parameter space (the b vs. m graph).
Using the slope-intercept parameters can make the application complicated, since both parameters are unbounded: as lines become more and more vertical, the magnitudes of m and b grow towards infinity. For computational purposes, therefore, it is better to parameterize the lines in the Hough transform with two other parameters, commonly called r and θ (theta). The parameter r represents the smallest distance between the line and the origin, while θ is the angle of the vector from the origin to this closest point. Using this parametrization, the equation of the line can be written as:

r = x.cos θ + y.sin θ
It is therefore possible to associate with each line of the image a pair (r, θ), which is unique if θ ∈ [0, π) and r ∈ R, or if θ ∈ [0, 2π) and r ≥ 0. The (r, θ) plane is sometimes referred to as Hough space. This representation makes the Hough transform conceptually very close to the so-called Radon transform.
It is well known that an infinite number of lines can go through a single point of the plane. If that point has coordinates (x0, y0) in the image plane, all the lines that go through it obey the following equation:

r(θ) = x0.cos θ + y0.sin θ

This corresponds to a sinusoidal curve in the (r, θ) plane, which is unique to that point. If the curves corresponding to two points are superimposed, the location (in the Hough space) where they cross corresponds to lines (in the original image space) that pass through both points. More generally, a set of points that form a straight line will produce sinusoids which cross at the parameters for that line. Thus, the problem of detecting collinear points can be converted to the problem of finding concurrent curves.
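The claim that collinear points trace sinusoids through a common (r, θ) can be checked numerically. A small illustrative sketch, using three points on the line y = 2 (for which r = 2 at θ = π/2):

```python
import numpy as np

def hough_curve(x0, y0, thetas):
    """The sinusoid r(theta) = x0*cos(theta) + y0*sin(theta) traced in
    Hough space by the single image point (x0, y0)."""
    return x0 * np.cos(thetas) + y0 * np.sin(thetas)

thetas = np.linspace(0.0, np.pi, 181)
# three collinear points on the line y = 2
curves = [hough_curve(x, 2.0, thetas) for x in (0.0, 1.0, 3.0)]
i = int(np.argmin(np.abs(thetas - np.pi / 2)))  # index of theta = pi/2
```

At θ = π/2 all three sinusoids pass through r = 2, the parameters of the common line.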
Implementation
The Hough transform algorithm uses an array, called the accumulator, to detect the existence of a line y = mx + b. The dimension of the accumulator is equal to the number of unknown parameters of the Hough transform problem. For example, the linear Hough transform problem has two unknown parameters: m and b. The two dimensions of the accumulator array would correspond to quantized values for m and b. For each pixel and its neighborhood, the Hough transform algorithm determines if there is enough evidence of an edge at that pixel. If so, it will calculate the parameters of that line, then look for the accumulator bin that the parameters fall into, and increment the value of that bin. By finding the bins with the highest values, the most likely lines can be extracted, and their (approximate) geometric definitions read off. The simplest way of finding these peaks is by applying some form of threshold, but different techniques may yield better results in different circumstances - determining which lines are found as well as how many. Since the lines returned do not contain any length information, it is often necessary to find which parts of the image match up with which lines.
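The accumulator procedure described above can be sketched in a few lines. This is an illustrative NumPy version using the (r, θ) parametrization, not a production implementation:

```python
import numpy as np

def hough_lines(edge_img, n_theta=180):
    """Standard Hough line transform: every edge pixel votes for all
    quantized (r, theta) pairs satisfying r = x*cos(theta) + y*sin(theta)."""
    h, w = edge_img.shape
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    r_max = int(np.ceil(np.hypot(h, w)))
    rs = np.arange(-r_max, r_max + 1)          # quantized r axis
    acc = np.zeros((rs.size, n_theta), dtype=int)
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    ys, xs = np.nonzero(edge_img)              # edge pixel coordinates
    for x, y in zip(xs, ys):
        # one vote per quantized theta, at the corresponding rounded r
        r = np.round(x * cos_t + y * sin_t).astype(int)
        acc[r + r_max, np.arange(n_theta)] += 1
    return acc, rs, thetas
```

For a small image containing a single horizontal line of edge pixels, the bin at (r, θ) matching that line collects one vote per pixel.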
Example
Consider three data points, shown here as black dots.
For each data point, a number of lines are plotted
going through it, all at different angles. These are
shown here as solid lines.
For each solid line a line is plotted which is
perpendicular to it and which intersects the origin.
These are shown as dashed lines.
The length and angle of each dashed line is measured.
In the diagram above, the results are shown in tables.
This is repeated for each data point.
A graph of length against angle, known as a Hough
space graph, is then created.
The point where the lines intersect gives a distance and angle. This distance and angle indicate the line which passes through the points being tested. In the graph shown the lines intersect at the purple point; this corresponds to the solid purple line in the diagrams above, which passes through the three points.
The following is a different example showing the results of a Hough transform on a
raster image containing two thick lines.
The results of this transform were stored in a matrix. Each cell value represents the number of curves through the corresponding point. Higher cell values are rendered brighter. The two distinctly bright spots are the intersections of the curves of the two lines. From these spots' positions, the angle and distance from the image center of the two lines in the input image can be determined.
Variations and extensions
Using the gradient direction to reduce the number of votes
An improvement suggested by O'Gorman and Clowes can be used to detect lines if one takes into account that the local gradient of the image intensity will necessarily be orthogonal to the edge. Since edge detection generally involves computing the intensity gradient magnitude, the gradient direction is often found as a side effect. If a given point of coordinates (x, y) happens to indeed be on a line, then the local
direction of the gradient gives the θ parameter corresponding to said line, and the r parameter is then immediately obtained. In fact, the real gradient direction is only estimated with a given amount of accuracy (approximately ±20°), which means that the sinusoid must be traced within ±20° of the estimated angle. This however reduces the computation time and has the interesting effect of reducing the number of useless votes, thus enhancing the visibility of the spikes corresponding to real lines in the image.
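A rough sketch of this gradient-restricted voting follows. It is illustrative only; the ±20° window and the sparse dictionary accumulator are assumptions made for the example:

```python
import numpy as np

def hough_with_gradient(points, grad_dirs, n_theta=180, window_deg=20):
    """Each edge point votes only for theta values within +/- window_deg
    of its gradient direction (the normal of the candidate line), instead
    of voting over the whole theta range."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    half = np.deg2rad(window_deg)
    votes = {}                      # sparse accumulator: (r, theta_index) -> count
    for (x, y), g in zip(points, grad_dirs):
        for ti in np.nonzero(np.abs(thetas - g) <= half)[0]:
            r = int(round(x * np.cos(thetas[ti]) + y * np.sin(thetas[ti])))
            votes[(r, ti)] = votes.get((r, ti), 0) + 1
    return votes, thetas
```

Each point now casts roughly 40 votes instead of 180, yet the bin of the true line still receives one vote per point.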
Hough transform of curves, and Generalised Hough transform
Although the version of the transform described above applies only to finding
straight lines, a similar transform can be used for finding any shape which can be
represented by a set of parameters. A circle, for instance, can be transformed into a
set of three parameters, representing its center and radius, so that the Hough space
becomes three dimensional. Arbitrary ellipses and curves can also be found this
way, as can any shape easily expressed as a set of parameters. For more complicated
shapes, the Generalised Hough transform is used, which allows a feature to vote for
a particular position, orientation and/or scaling of the shape using a predefined look-
up table.
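The three-parameter circle case mentioned above can be sketched with a 3-D accumulator over centre (a, b) and radius R. This illustrative version votes by rounding the distance from each candidate centre to each edge point, which is a simple (if brute-force) way to realize the same accumulation:

```python
import numpy as np

def hough_circles(edge_points, shape, radii):
    """3-D accumulator over circle centre (a, b) and radius R: each edge
    point adds a vote to every centre whose rounded distance equals R."""
    acc = np.zeros((shape[0], shape[1], len(radii)), dtype=int)
    aa, bb = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]),
                         indexing="ij")
    for x, y in edge_points:
        # rounded distance from every candidate centre to this edge point
        d = np.round(np.hypot(aa - x, bb - y)).astype(int)
        for ri, R in enumerate(radii):
            acc[:, :, ri] += (d == R)
    return acc
```

For edge points sampled from a circle, the accumulator peaks near the true centre in the slice of the true radius.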
Using weighted features
One common variation is to search the accumulator in stages: finding the bins with the highest count in one stage can be used to constrain the range of values searched in the next.
Limitations
The Hough Transform is only efficient if a high number of votes fall in the right bin,
so that the bin can be easily detected amid the background noise. This means that
the bin must not be too small, or else some votes will fall in the neighboring bins,
thus reducing the visibility of the main bin.
Also, when the number of parameters is large (that is, when we are using the Generalised Hough Transform with typically more than three parameters), the average number of votes cast in a single bin is very low, and the bins corresponding to a real figure in the image do not necessarily appear to have a much higher number of votes than their neighbors. Thus, the Generalised Hough Transform must be used with great care to detect anything other than lines or circles.
Finally, much of the efficiency of the Hough Transform is dependent on the quality
of the input data: the edges must be detected well for the Hough Transform to be
efficient. Use of the Hough Transform on noisy images is a very delicate matter and
generally, a denoising stage must be used before. In the case where the image is
corrupted by speckle, as is the case in radar images, the Radon transform is
sometimes preferred to detect lines, since it has the nice effect of attenuating the
noise through summation.
What is eye tracking? (References)
Is there an easier way for the disabled to communicate? How does a 6-month-old
baby perceive the world? Where is the most effective ad space on a website?
Eye tracking can be used to find answers to questions like these, as well as many others, by measuring a person's point of gaze (i.e. where they are looking) and determining eye/head position.
The origins of eye tracking are over a century old, but in the last 5 years large
technological advances have opened up new possibilities. Modern day eye tracking
can be used not only in a laboratory, but in homes, schools, and businesses where it
aids in research and analysis and is used for interacting with computers as well as
with friends and family.
Simple Idea, Complex Math
Eye tracking works by reflecting invisible infrared light onto an eye, recording the reflection pattern with a sensor system, and then calculating the exact point of gaze using a geometrical model. Once the point of gaze is determined, it can be visualized and shown on a computer monitor. The point of gaze can also be used to control and interface with different machines. This technique is referred to as eye control.
Improving the experience
The main challenges of eye tracking are not only in developing the right algorithms
and sensor solutions, which are a prerequisite for a high level of accuracy, but also
in the way users interact with a specific eye tracking device. Eye trackers should be
able to perform with all types of eyes and account for such things as glasses, contact
lenses, head movement and light conditions. Users should also be able to save
personal settings and even look away from the eye tracker without needing to
recalibrate.
Until recently, different types of eyes required different methods of eye tracking.
Dark pupil tracking worked better for people with dark eyes and bright pupil
tracking worked better for children and people with blue eyes. Recently, both of
these techniques have been combined to eliminate the need for two separate eye
trackers.
Another important aspect in eye tracking is the "track box". This is the imaginary box in which a user can move his/her head and still be tracked by the device. With a larger track box, the user will have more freedom of movement and experience greater comfort.
Multiple Applications
With the right idea there is no limit to the applications of eye tracking. Currently, some of the major uses for analysis are academic research (e.g. cognitive science, psychology and medical research) and market research and usability studies, such as evaluations of advertising or package design and of software or web usability.
Eye tracking techniques can also be used for interaction - people can control a computer and make things happen just by looking at it. Eye control can be used as the sole interaction technique or combined with keyboard, mouse, physical buttons and voice.
Eye control is used in communication devices for disabled persons and in various
industrial and medical applications.
Future Value
The crude, complex, and highly intrusive eye tracking techniques of the past have
been replaced by refined and user-friendly methods that are producing valuable
results today and paving the way for the future. Eye tracking and eye control have a
limitless future. Areas like personal computing, the automotive industry, medical
research, and education will soon be utilizing eye tracking in ways never thought
possible.
Eye Tracking technology
Tobii's eye tracking technology utilizes advanced image processing of a person's face, eyes and reflections in the eyes of near-infrared reference lights to accurately estimate:
the 3D position in space of each eye
the precise target toward which each eye's gaze is directed
Key advantages
Tobii has taken eye tracking technology a significant step forward through a number of key innovations that enable large market applications. Key advantages of Tobii's eye tracking technology are:
Fully automatic eye tracking
High tracking accuracy
Ability to track nearly all people
Completely non-intrusive
Good tolerance of head-motion
Patented techniques
Compared to other technologies, a number of innovations have been made to overcome traditional problems associated with eye tracking, such as cumbersome equipment, poor tracking precision and limited tolerance to head motion. Some of the key aspects of Tobii's technology include:
Patented techniques to use fixed wide field of view optics in combination
with high resolution sensors
Patented techniques for accurate estimation of the 3D position in space for
both eyes
Sophisticated image processing and patented control logic to allow for 100% automatic tracking and high tracking ability; tracks almost everyone, even those with glasses
Advanced algorithms to compensate for head motion without loss in accuracy
Unique techniques to enable long-lasting calibrations
Application technology
Tobii conducts research and development into eye tracking applications. We have
developed an extensive toolbox of software that allows us to rapidly create eye
control applications and eye gaze analysis applications.
Tobii's eye-based interaction technology includes the Tobii eye control engine, a powerful ActiveX-based API for the rapid creation of eye control applications in the Windows environment. This allows our customers and partners to quickly develop and customize applications to utilize eye gaze as a modality in computer interfaces. This is not yet on the market, but is available to key partners on a project basis.