7/28/2019 _DeBoisset_report.pdf
1/121
HAND GESTURE RECOGNITION
Gradient orientation histograms and
eigenvectors methods
Bertrand de BOISSET
FRAUNHOFER INSTITUT
INSTITUT GRAPHISCHE DATENVERARBEITUNG
Fraunhoferstraße 5
D-64283 Darmstadt
Supervisor:
Didier Stricker
Examiner:
Didier Stricker
Declaration
I hereby declare that this dissertation and the work described in it are my own, except where otherwise stated, and were done only with the indicated sources. All parts inferred from the sources are marked as such. It has not been submitted before for any degree or examination at any other university.
DARMSTADT, June 15th 2006
Ehrenwörtliche Erklärung
Hiermit versichere ich, die vorliegende Diplomarbeit ohne Hilfe Dritter und nur mit den angegebenen Quellen und Hilfsmitteln angefertigt zu haben. Alle Stellen, die aus den Quellen entnommen wurden, sind als solche kenntlich gemacht worden. Diese Arbeit hat in gleicher oder ähnlicher Form noch keiner Prüfungsbehörde vorgelegen.
DARMSTADT, 15. Juni 2006
Déclaration
Je déclare que le rapport réalisé ainsi que le travail décrit dans ce document est un travail personnel, sauf contre-indication, réalisé avec l'aide des sources citées dans la bibliographie. Toutes les parties qui sont reprises sont indiquées en tant que telles. Ce projet n'a jamais été présenté auparavant pour aucun examen dans aucune autre université.
DARMSTADT, 15 Juin 2006
Abstract
The aim of this work is to implement different methods for hand gesture recognition. The main parts of the work were:
First, the analysis of the different ways to realize gesture recognition.
Then, the implementation of the gradient histogram recognition. This method consists in calculating the gradients of a picture and then constructing histograms of the gradient orientations.
We also took a closer look at the algebraic analysis of an image, by searching for the principal components that define a set of pictures (eigenvectors in the space of the data set). This second method is called PCA (Principal Component Analysis).
Then, to finish the project, we had to analyze the different methods implemented by performing different tests. After that, we could define the best and worst points of each method. We also realized a small application to illustrate our work.
Acknowledgments
I would like to thank my supervisor, Alain Pagani, for his enthusiasm, help and guidance throughout this project. I would also like to thank Didier Stricker, who supervised my work during this period. And I will not forget:
F. Merienne, C. Pere, M. Moll, H. Wuest, F. Vial... They all helped me to finish this project in time and gave me pieces of advice when I needed them.
All the members of the Department for Virtual and Augmented Reality (A4) of the Fraunhofer IGD, for providing an interesting and stimulating working environment.
Contents
1 Project Aims
2 Theory and backgrounds
2.1 The database
2.2 The simple subtraction method
2.3 The gradient based method
2.3.1 The goal of this method
2.3.2 Main steps of the method
2.4 The Principal Component Analysis -PCA- method
2.4.1 The goal of this method
2.4.2 Mathematical backgrounds
2.4.3 Main steps of the method
3 Implementation and explanation
3.1 Simple subtraction method
3.1.1 Realization of the method
3.1.2 Conclusion on the method
3.2 Histograms of oriented gradients method
3.2.1 Step 1: Gradient magnitude calculation
3.2.2 Step 2: Gradient orientation calculation and magnitude threshold
3.2.3 Step 3: Gaussian filter operator
3.2.4 Step 4: Euclidean distance comparison
3.2.5 Step 5: Establish a comparison matrix
3.2.6 Problems encountered
3.2.7 Conclusion on the method
3.3 PCA or Eigenfaces method
3.3.1 Step 1: Realize the database
3.3.2 Step 2: Subtract the mean
3.3.3 Step 3: Calculate the covariance matrix
3.3.4 Step 4: Eigenvectors and eigenvalues calculation, and choice of the good eigenvectors
3.3.5 Step 5: Realize the new data set and compare
3.3.6 Conclusion on this method
4 Tests, results and analysis
4.1 The application: Rock Paper Scissors Game!
4.2 Tests and choice of the parameters
4.2.1 Choice of the size of the derivative filter and the number of boxes for the gradients method
4.2.2 Choice of the number of pictures and the size of images for the data set for both methods
4.3 Last tests to explain the efficiency of each method
4.3.1 First tests: Recognition percentage of each method in general
4.3.2 Second tests: Recognition percentage of each method in different conditions
5 Conclusion: advantages and drawbacks
A Tables of general tests
B Tables of specific tests
C Script of the game
List of figures
Bibliography
Chapter 1
Project Aims
We can define the goal with a simple question: how could we command different applications with a single hand gesture? The aim of my final year project is to answer this question by studying different methods that allow hand gesture recognition. Moreover, the recognition has to be done with one camera and in real time, so that you can operate as fast as you want.
To begin, we had the idea to realize a simple subtraction between two images, pixel per pixel, to compare them. We will see the results of that in the second chapter.
Then we studied a method that uses gradients. The aim is to build the orientation histograms of the different pictures and to compare them. We will take a closer look at this method in Chapter 3.
After that, we implemented a method called PCA (Principal Component Analysis) or Eigenface. The goal is to calculate, find and study the eigenvectors of the different pictures and then to express each image with its principal components (eigenvectors). The difficult part was to find a way to compare the images through their expression with the eigenvectors (as is done in the Eigenface face recognition method).
Last, we created a small application to illustrate the different working methods.
Chapter 2
Theory and backgrounds
Before explaining the theory of the different methods, we will first state their main idea.
In fact, we realized a database of different hand gestures and labeled all the data set pictures, so that each picture is classified. Then the aim is to compare an unknown image with the images of the database and identify its label by taking back the label of the nearest image.
Therefore, we will see in the first part how we chose our database and how we defined it.
2.1 The database
In this section, we will take a closer look at the database.
At the beginning, we had two main questions about its creation:
Which hand gestures should we choose?
How many pictures of each gesture should we take?
These questions could be answered by two different kinds of database for the same chosen gestures:
Take lots of pictures of the different hand gestures to realize a huge database, so that the recognition will be better (and it is a way to reduce the limits of the different methods: we will have many more chances to find an image in the database that looks like the gesture to analyze). The problem is that it will take longer to look for the pictures in the database during the comparison.
Take few pictures of the different gestures to realize the database quickly. Then it will be easier for the user to create the database, and it will be quicker to look for a picture in the database (during the comparison). The problem is that the recognition will be harder if the gestures are similar.
Therefore, during the project, we began by taking 5 positions of the hand (1, 2, 3, 4 and 5 fingers) and lots of pictures in the database. We had good results, but the calculation time was huge. Therefore, we decided to change the database creation by taking a minimum of pictures of new hand gestures (3 positions that were really different: scissors, paper and rock).
We will study the returned results later, and we will see in detail why we changed the positions and the database. What is important now is to understand with which kind of database we realized this work.
2.2 The simple subtraction method
The aim of this method was to try a simple way to compare images, and then to explain and justify why we had to implement other methods. We will not take a deeper look at this method here. The theory is really simple: subtract the different images pixel per pixel, then compare the results and show the closest one.
We will see the different tests in the last chapter (4) and a short summary of this method in the next chapter about implementation (3).
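The idea can be sketched in a few lines. This is a hypothetical illustration in Python rather than the MATLAB used in the project, with images represented as plain lists of grey-level rows:

```python
def image_distance(img_a, img_b):
    """Sum of absolute pixel-per-pixel differences between two
    same-sized grey-level images (lists of rows of ints)."""
    return sum(abs(a - b)
               for row_a, row_b in zip(img_a, img_b)
               for a, b in zip(row_a, row_b))

def classify(unknown, database):
    """Return the label of the database image closest to `unknown`.
    `database` is a list of (label, image) pairs."""
    return min(database, key=lambda entry: image_distance(unknown, entry[1]))[0]

# Toy 2x2 "images": the unknown picture is nearest to the "rock" sample.
db = [("rock", [[10, 10], [10, 10]]),
      ("paper", [[200, 200], [200, 200]])]
print(classify([[12, 9], [11, 10]], db))  # -> rock
```

The function and data names here are invented for the sketch; the real images are of course much larger, which is exactly why this naive comparison becomes fragile, as the tests in Chapter 4 show.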
2.3 The Gradient based method
In order to study hand gesture recognition, we will study the theory of the gradient orientation histogram method. In this section, we will take a closer look at the aim of this method and the main steps to implement it.
2.3.1 The goal of this method
First of all, the aim of this method is to recognize different hand gestures (without sensors). These hand gestures must be clearly identified in order to command any kind of application.
The theory of this method is to study the gradients in the image and to analyze them to realize an orientation histogram. The goal is then to compare the histograms and return the label of the nearest image.
2.3.2 Main steps of the method
In order to implement this method, we began by reading different articles on the subject, which are really interesting and useful to know the directions to follow: [25] and [24].
We can then split the method up into its main parts:
First of all, we had to implement the gradient magnitude calculation. The aim is to define where the biggest gradient magnitudes are in the picture. Then it will be easy to apply a threshold on the gradients in order to keep the really interesting ones and to cut all the background noise. To realize this part, the theory is to calculate the magnitude with the formula:
magnitude = √(dx² + dy²)
Therefore, we have to calculate the derivatives of the image in x and y to obtain the magnitude. We will have to choose a size for the derivative filter (in any case, we will choose a circular derivative filter).
Then we implemented the gradient orientation calculation. The goal is to realize a histogram cut into 36 bins (one every 10 degrees) or more (we will study this influence later in Chapter 4). To realize this histogram, we will have to calculate the gradient orientation defined by the formula:
orientation = arctan(dy/dx)
Therefore, with this formula, it will be possible to know the orientation of the gradients in the image. We can see that for both the magnitude and the orientation we will need the derivatives of the image.
With this histogram, we then have a vector of gradient orientations, which defines the picture quite well. So this second step is the part that will allow us to compare the images with one another. It is a way to define the form with an appropriate precision.
Also, we had to realize a Gaussian filter to blur the image and have a homogeneous picture. It will permit better results in the gradient magnitude and orientation calculation. The goal of this filter is to erase the background defects: for this method, it is really important to have a uniform background to avoid noise. To make the background more uniform and to erase white pixels, we realized this filter.
We created a gradient magnitude threshold, which has to erase the lower-level gradients in order to keep the really interesting ones. That will cut all the noise
and regularize the background. This part is complementary with the Gaussian filter: the Gaussian filter will blur the big defects (but they will still be there), and the threshold will cut the lowest magnitudes. Then the noise will be quite well cut.
Then the next step was to calculate the Euclidean distance between the vectors of the different analyzed images. This part is made to compare the different pictures by comparing their histograms. This is the final step: with it, we are able to recognize the different gestures.
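The steps above can be sketched in Python. This is a simplified illustration, not the project's MATLAB implementation: the Gaussian pre-filter is omitted, and the bin count and threshold values are arbitrary placeholders.

```python
import math

def orientation_histogram(img, bins=36, threshold=10.0):
    """Histogram of gradient orientations for a grey-level image
    (a list of rows of numbers), one bin per 360/bins degrees.
    Gradients below `threshold` in magnitude are cut as noise."""
    hist = [0.0] * bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            dx = img[y][x + 1] - img[y][x - 1]   # central derivative in x
            dy = img[y + 1][x] - img[y - 1][x]   # central derivative in y
            magnitude = math.sqrt(dx * dx + dy * dy)
            if magnitude < threshold:
                continue                          # magnitude threshold
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            hist[int(angle // (360.0 / bins)) % bins] += 1
    return hist

def euclidean_distance(h1, h2):
    """Distance between two histograms; the smallest distance over
    the database gives the recognized label."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

# A vertical edge: only horizontal gradients, so orientation 0 degrees.
edge = [[0, 0, 100, 100]] * 4
hist = orientation_histogram(edge)
print(hist[0], sum(hist))  # -> 4.0 4.0
```

Note that `atan2(dy, dx)` is used instead of a bare `arctan(dy/dx)` so that the full 360-degree orientation is recovered and division by zero is avoided when dx = 0.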
To conclude, we can say that this method does not require special mathematical background. Therefore, once we understood the main way to realize it, we just had to implement it (Chapter 3).
2.4 The Principal Component Analysis -PCA- method
In this section, we will still study hand gesture recognition, but we will need some mathematical background to understand what we did. This method is called PCA or Eigenfaces.
So we will take a deeper look at the mathematical background, the aim of this method, and the principal parts of its realization.
2.4.1 The goal of this method
The Principal Component Analysis (PCA) will also be used for our gesture recognition. It is a useful statistical technique that has found application in different fields (such as face recognition and image compression), and it is a common technique for finding patterns in data of high dimension. Before giving a description of this method, we will first introduce the mathematical concepts that will be used in PCA. Here we will speak about standard deviation, covariance, eigenvectors and eigenvalues. This background knowledge is meant to make the PCA section easier to understand, but can be skipped if the concepts are already known. There are examples all the way through this kind of lesson to illustrate the concepts explained.
2.4.2 Mathematical Backgrounds
This section will attempt to give the elementary mathematical background required to understand the Principal Component Analysis. We will try to realize a kind of summary of the principal knowledge used in the PCA method. Each part is independent of the others. We can notice that the goal of this is to understand the principal
lines of the method, and especially to understand why this method is used and what the returned results signify. We will not use all the background knowledge described here, but the different sections will provide the grounding in the main skills required.
Therefore, we will first take a quick look at statistics, especially at the spread of data and the distribution measurements. The other section is on matrix algebra and looks at eigenvectors and eigenvalues (important properties of matrices that are fundamental to PCA).
Statistics
What we will see about statistics is how to analyze a big set of data and how to find and understand the relationships between the elements of the data set. In this section, we will take a look at the measurements we can perform on a data set and what they tell us about the data.
Standard deviation First of all, we will look more closely at what the standard deviation is. In statistics, we generally use samples of a population to realize the measurements. The results returned on this sample give an overview of the most likely results we could have if we made the same test on the entire population. Therefore, we just extend the sample results to the entire population. To explain it clearly, we will create a data set and assume that it is just a sample of a larger data set (it is not used in our project, but it will help us to understand the concept easily).
Here is an example set:
X = [1 2 4 6 12 15 25 45 68 67 65 98]
For the notation, we will use the symbol X to refer to the entire sample and the symbol Xn to indicate a specific datum of the sample. Therefore, X3 refers to the 3rd number in X (notice that X1 is the first datum, not X0). With this kind of sample, we can realize many calculations that give us information about the set. For example, we can first calculate the mean of the set. As it is really simple, we will just give its formula without describing it further.
X̄ = ( Σ_{i=1..n} X_i ) / n
It is important to note that we will call X̄ the mean of the set X. The mean of the data set will not give us many indications, apart from the middle point.
For example, we can have the same mean for two really different data sets. Therefore, we will see below what is important to better define the data sets:
[0 8 12 20] and [8 9 11 12]
Here, what is really different between the two sets is the standard deviation. This is a way to measure the spread of the data in a set. Here is the definition of the standard deviation: it is the square root of the sum of the squared distances from the mean of the set to each point, divided by n − 1, where n is the number of points in the set. Here is the formula:
s = √( Σ_{i=1..n} (X_i − X̄)² / (n − 1) )
where s is the usual symbol for the standard deviation of a sample.
We can wonder why we divide by n − 1 and not by n. We will not give explanations of that here, because it would be too long and it is not important for our project. What is important to remember is that when we use a sample of a population and want an approximate result for the entire population, we have to use n − 1; but if we calculate the standard deviation on the entire population directly, we have to use n instead of n − 1. We can find further information on the web site http://mathcentral.uregina.ca/RR/database/RR.09.95/weston2.html
This page explains a bit more about the standard deviation and about the difference between the denominators. It also gives interesting experiments which describe well the difference between using samples or the whole population, and therefore the choice of denominator.
We will draw tables of the standard deviation calculation for the 2 sets written above.
Set 1:
X      (X − X̄)   (X − X̄)²
0      −10        100
8      −2         4
12     2          4
20     10         100
Total                   208
Divided by (n − 1)      69.333
Square root             8.3266
Set 2:
X      (X − X̄)   (X − X̄)²
8      −2         4
9      −1         1
11     1          1
12     2          4
Total                   10
Divided by (n − 1)      3.333
Square root             1.8257
As expected, the first set has a much bigger standard deviation than the second one. Indeed, the first data set has really spread-out data, unlike the second one.
We can also quickly look at another set, which has a standard deviation of zero:
[10 10 10 10]
Here, the standard deviation is equal to zero, although the mean is still 10. This is because all the points are the same, so the data are not spread out. None of them deviate from the mean.
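These calculations can be reproduced in a few lines of Python (a sketch using the two example sets from the text, not code from the project):

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sample_std(xs):
    """Sample standard deviation: square root of the sum of squared
    deviations from the mean, divided by n - 1."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

set1 = [0, 8, 12, 20]
set2 = [8, 9, 11, 12]
print(mean(set1), mean(set2))        # -> 10.0 10.0 (same middle point)
print(round(sample_std(set1), 4))    # -> 8.3267 (the spread-out set)
print(round(sample_std(set2), 4))    # -> 1.8257 (the tight set)
print(sample_std([10, 10, 10, 10]))  # -> 0.0 (no deviation at all)
```

The two sets share the same mean, so only the standard deviation distinguishes them, which is exactly the point of the tables above.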
Variance Variance is another measure of the spread of data in a set. In fact, it is quite the same as the standard deviation.
We can take a look at the formula:
s² = Σ_{i=1..n} (X_i − X̄)² / (n − 1)
We can notice that this is just the square of the standard deviation (that is why the symbol s² is used). Usually, we use the symbol s² for the variance of a sample. The variance is just another way of measuring the spread of data in a sample. We can say that the variance is less used than the standard deviation; in fact, the variance will be useful for the next section, which is about the covariance.
Covariance The covariance differs from the two measurements explained in the previous sections in one principal way: the covariance is a 2-dimensional measurement. The covariance is really important knowledge for the PCA method, because we will
need this calculation later.
The calculation of the standard deviation or of the variance is useful in the case of a one-dimensional data set, like the set of the marks obtained by all the ENSAM students for their FYP (Final Year Project). But for the PCA method, which deals with more dimensions, we will need the covariance and not just the variance.
The covariance allows us to see if there is any relationship between the different dimensions of the data set. For example, we could realize a 2-dimensional set of the marks obtained by the ENSAM students and their ages. Then we could see if the age has an effect on the mark received by the student. It is exactly the kind of test that we could perform with the covariance (we can already imagine where we want to go with that in our project: see whether our different pictures are related or not).
The covariance formula is really close to the variance formula. We can write the variance formula like this, to better understand the covariance formula:
var(X) = Σ_{i=1..n} (X_i − X̄)(X_i − X̄) / (n − 1)
Now we can take a look at the covariance formula:
cov(X, Y) = Σ_{i=1..n} (X_i − X̄)(Y_i − Ȳ) / (n − 1)
We can notice that if we calculate the covariance between a dimension and itself, we get the variance: in fact, we just replace the second part of the formula with the second dimension to analyze to obtain the covariance formula!
We can also say that it is possible to calculate the covariance between more than two dimensions. For three dimensions, for example, we will calculate the 9 pairwise covariances between dimensions (2 by 2) and then create a matrix (called the covariance matrix), which will be 3 × 3 in the case of three dimensions. In fact, the diagonal will hold the variance of each dimension and the other terms will be the covariances between dimensions (for example, line 2, column 1 will be the covariance between the y and the x dimensions). By the way, we can notice that the covariance is commutative (we can swap the two dimensions without changing the result); therefore, the covariance matrix will be symmetrical.
So we can get lots of really important information with the covariance calculation. In any case, it is important to notice that the value returned is not as important as its sign.
Indeed, if the result is positive, it means that the two dimensions increase together
(for our example on the ENSAM students, marks received and age, this would mean that the mark increases when the age increases).
If the result is negative, it means that when one dimension increases, the other decreases.
In the last case, the result returned is null. That just means that our 2 dimensions do not have any kind of relation between them: they are independent.
Therefore, the covariance calculation can bring us really important indications on the set of data we are studying. With it, we can then represent the covariance between 2 dimensions in a graph to get an idea of the relation that exists between them.
Of course, it will not be possible to represent the covariance when our data set has more than 3 dimensions. Although the covariance can only be calculated between two dimensions, and it is not possible to represent the relationship between the data when we have more than 3 dimensions, the covariance is often used for big data sets with many dimensions. Indeed, we can calculate the relationship between the dimensions and have some exploitable results. Moreover, it would be pretty hard to visualize the relationships between dimensions in a huge data set with many dimensions without the calculation of the covariance. Therefore, the calculation of the covariance brings us lots of help to see the relationships between dimensions in a data set like the one we have in our project.
The covariance matrix Recall that the covariance is always measured between 2 dimensions. If we have a data set with more than 2 dimensions, there is more than one covariance measurement that can be calculated. For example, from a 3-dimensional data set (dimensions x, y, z) we could calculate cov(x, z), cov(y, z)... In fact, for an n-dimensional data set, we can calculate n! / (2 (n − 2)!) different covariance values. The remaining entries, on the diagonal, will be the variances.
A useful way to get all the possible covariance values between all the different dimensions is to calculate them all and put them in a matrix. Let's have a quick overview of the definition of the covariance matrix for a set of data with n dimensions:
C(n×n) = (c_i,j), where c_i,j = cov(Dim_i, Dim_j)
where C(n×n) is an n by n matrix (n rows and n columns), and Dim_x is the xth dimension.
We can notice that the covariance matrix will be square in any case, and that each
part of the matrix is the result of a covariance calculation between two dimensions (except for the diagonal, as said before).
For example, we will build the covariance matrix for a 3-dimensional data set, using the usual dimensions x, y and z. Then, as the matrix is square, we will have the values below:

C = ( cov(x, x)  cov(x, y)  cov(x, z) )
    ( cov(y, x)  cov(y, y)  cov(y, z) )
    ( cov(z, x)  cov(z, y)  cov(z, z) )

As said earlier, the matrix will be symmetrical, and the diagonal will hold the variances. Therefore, we can say that the matrix will have this form:

C = ( var(x)     cov(x, y)  cov(x, z) )
    ( cov(x, y)  var(y)     cov(y, z) )
    ( cov(x, z)  cov(y, z)  var(z)    )

Therefore, we only have 6 of the 9 terms to calculate.
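As a small Python sketch of these formulas (the age/mark numbers below are made up for illustration, echoing the ENSAM example; the project itself works in MATLAB):

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance between two dimensions; cov(x, x) is the variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def covariance_matrix(dims):
    """n-by-n symmetric matrix of pairwise covariances: the diagonal
    holds the variances, so only 6 of the 9 entries of a 3x3 matrix
    actually need to be computed."""
    return [[cov(a, b) for b in dims] for a in dims]

# Hypothetical 2-dimensional data: age and mark increase together,
# so their covariance is positive.
age  = [20, 21, 22, 23]
mark = [12, 14, 13, 17]
C = covariance_matrix([age, mark])
print(C[0][1] > 0)           # -> True (positive relationship)
print(C[0][1] == C[1][0])    # -> True (the matrix is symmetrical)
```

The symmetry check illustrates the commutativity of the covariance noted above; the sketch computes all n² entries for simplicity rather than exploiting it.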
Matrix Algebra
This section is made to provide the background for the matrix algebra required in PCA. We will especially take a closer look at the eigenvectors and eigenvalues of a given matrix.
Let's see an example:

( 2  3 ) ( 3 )  =  ( 12 )  =  4 ( 3 )
( 2  1 ) ( 2 )     ( 8  )       ( 2 )

Here, 4 is an eigenvalue of the matrix.
Eigenvectors First of all, we will give the Wikipedia definition of an eigenvector: in linear algebra, the eigenvectors (from the German "eigen" meaning inherent, characteristic) of a linear operator are non-zero vectors which, when operated on by the operator, result in a scalar multiple of themselves. The scalar is then called the eigenvalue associated with the eigenvector.
As we can see in the example above, the multiplication between the matrix and the vector returns exactly 4 times the original vector. We have here an example of an
eigenvector. We will try to explain this example to better understand the eigenvectors.
The vector is a 2-dimensional one: the vector (3, 2) represents an arrow going from the origin (0, 0) to the point (3, 2). The matrix can be imagined as a transformation matrix: if we multiply this matrix with a vector, the result returned is another, transformed vector. If this transformed vector is just a multiplication of the original vector by a scalar, then the vector is an eigenvector and the scalar is the eigenvalue associated with the eigenvector.
Now we will see the different properties of these eigenvectors:
First of all, we can only find eigenvectors for square matrices. We can also say that not every square matrix has eigenvectors. In the case they have, they cannot have more eigenvectors than their dimension (for a 3 × 3 matrix, the maximum number of eigenvectors is 3).
You can multiply an eigenvector by a scalar and it will still be an eigenvector (because we just change its length and not its direction).
For a symmetric matrix (such as the covariance matrix used in PCA), the eigenvectors are orthogonal to one another, no matter the number of dimensions.
Most of the time, the returned eigenvectors are normalized (norm = 1); they are then easier to exploit.
We can find further information on eigenvectors on the web site:
http://www.mathphysics.com/calc/eigen.html .
Eigenvalues Each eigenvector is associated with an eigenvalue. The eigenvalue gives some information about the importance of the eigenvector. The eigenvalues are really important in the PCA method, because they permit us to realize a threshold to filter out the non-significant eigenvectors, so that we can keep just the principal ones.
MATLAB returns the eigenvalues and the eigenvectors of the covariance matrix without any problem.
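We can verify the numerical example above in a few lines of Python (a sketch; the thesis itself relies on MATLAB for the eigen-decomposition):

```python
import math

def mat_vec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

# The transformation matrix and the candidate eigenvector from the text.
M = [[2, 3],
     [2, 1]]
v = [3, 2]

result = mat_vec(M, v)
print(result)              # -> [12, 8]
print([4 * x for x in v])  # -> [12, 8]: M*v = 4*v, so v is an
                           #    eigenvector with eigenvalue 4

# For a 2x2 matrix, the eigenvalues solve the characteristic polynomial
# lambda^2 - trace*lambda + det = 0.
trace = M[0][0] + M[1][1]                     # 2 + 1 = 3
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]   # 2 - 6 = -4
disc = math.sqrt(trace ** 2 - 4 * det)
eigenvalues = sorted([(trace - disc) / 2, (trace + disc) / 2])
print(eigenvalues)         # -> [-1.0, 4.0]
```

The closed-form characteristic polynomial works only for 2 × 2 matrices; for the large covariance matrices of the PCA method, a numerical routine such as MATLAB's eigen-decomposition is used instead.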
2.4.3 Main steps of the method
Finally we arrive at the Principal Component Analysis (PCA), the interesting part of our project. We can first answer a question: what is it exactly? We can answer that it is an algebraic way to compare images by compressing the set of data and highlighting the principal components of the set.
The main advantage of PCA is that once we have found the principal components of the set, which express the data pretty well, we can recover the original data (images in our case) with a low loss, even if the compression is really high!
In this section, we will try to explain how we went through the problems of realizing this method for gesture recognition. Therefore, we will describe the work made step by step, to understand each part of it.
We can then split up the method into its main parts:
First of all, we had to create the data set. Indeed, we had to take some pictures of the hand to build the database for the PCA recognition. The aim is to choose a good number of pictures and a good resolution for them, in order to have the best recognition with the smallest database. Then the aim is to make the database. To create it, the theory is to transform each picture into a simple vector, which has a dimension equal to the number of pixels. Then we create a matrix where each line is an image vector... The result for 12 pictures and a 640 × 480 definition would be a 12 × 307200 matrix.
Then the next step is to subtract the mean from each of the data dimensions. The mean subtracted is simply the average across each dimension. For example, for three dimensions x, y and z, we have to subtract x̄ from x, ȳ from y and z̄ from z. The aim is to center our set in the space of all the dimensions (we will see further explanations of the different spaces used later, but what is important to remember here is that we have to subtract the mean to center our set of data).
Step three is to calculate the covariance matrix of the database. This is quite
difficult in our case, because the data set is really huge! So we found a method
to simplify this calculation, which we explain now:
Indeed, we cannot calculate the covariance matrix of the first matrix directly, because it
would be too huge. So we had to find a way to obtain the principal eigenvectors
without calculating the big covariance matrix.
We found the solution in a paper written by M. Turk and A. Pentland [23].
CHAPTER 2. THEORY AND BACKGROUNDS 19
The method consists in choosing a new covariance matrix.
Indeed, we will call A the matrix of all the images with the mean subtracted; it is
convenient here to store the image-vectors as columns, so that A has size 307200 x 12.
Our training set of images is B1, B2, B3, ..., B12, each of dimensions l x c, and M is
the average of the whole set of pictures. As seen earlier, we transform each image into
a vector of l x c dimensions. So, we can say that each picture is a point in an
(l x c)-dimensional space, and our 12 images represent 12 points in this space. But,
as we centered the set (by subtracting the mean), the pictures are not so far from each
other in this space (because they are quite similar after all). Therefore, it is
possible to express our data set with fewer dimensions.
Our covariance matrix for A will be called C, and C is defined by:

C = A A^T

The eigenvectors and eigenvalues of C are then the principal components of our data
set. But as explained before, we cannot calculate C.
The idea is that when we have only 12 points in a huge space, the number of meaningful
eigenvectors is far smaller than the dimension: it is the number of points minus 1.
So in our case, we can say that we will have 11 meaningful eigenvectors. The remaining
eigenvectors will have eigenvalues around zero.
Fortunately, it is much easier to calculate the eigenvectors of a 12 x 12 matrix than
of a 307200 x 307200 one!
We name the eigenvectors of the matrix A^T A v_i, and its eigenvalues k_i.
We can then write:

A^T A v_i = k_i v_i

Then we multiply both sides by A:

A A^T (A v_i) = k_i (A v_i)

We can see that the A v_i are eigenvectors of C = A A^T. So we construct a new matrix
L = A^T A, and we find the eigenvectors v_i of L.
These vectors determine linear combinations of the 12 training set images that form
the eigenpictures of our set.
So, with this subtlety, the matrix to diagonalize is only 12 x 12 instead of
307200 x 307200! The calculation is much faster, and the eigenvectors returned are the
principal ones.
Then, we calculate the eigenvectors and the eigenvalues of this covariance matrix.
This gives us the principal orientations of the data; MATLAB does it easily.
After that, we have to choose the good components and form the feature vector.
This is the principal step: we have to choose the principal (most important)
eigenvectors, with which we can express our data with the lowest information loss.
We also have to choose the number of eigenvectors so as to minimize calculation
time while keeping the best recognition. Here, the theory says that we will
normally have 11 meaningful eigenvectors.
Last, the final step is to build a new data set (which we will call the eigenset). It
is then possible to write the last script, which compares the different pictures
and ranks them by resemblance. To compare the different pictures, we
express each image of the data set in terms of these principal eigenvectors. The
last thing to do is to compare them, by calculating the Euclidean distance between
the coefficients in front of each eigenvector.
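The whole pipeline just described can be sketched in a few lines. The thesis works in MATLAB; the following is only an illustrative NumPy sketch of the Turk and Pentland trick, using a tiny random stand-in (300 "pixels", 12 "images") instead of the real 307200-pixel pictures:

```python
import numpy as np

# Toy stand-in for the data set: 12 "images" of 300 "pixels" each
# (the real set is 12 images of 307200 pixels).
rng = np.random.default_rng(0)
n_pixels, n_images = 300, 12
B = rng.random((n_pixels, n_images))   # each column is one flattened image

# Step 2: subtract the per-pixel mean to center the set.
A = B - B.mean(axis=1, keepdims=True)

# Step 3: the big covariance C = A A^T is n_pixels x n_pixels, so we
# diagonalize the small L = A^T A (n_images x n_images) instead.
L = A.T @ A
eigvals, V = np.linalg.eigh(L)         # ascending eigenvalues, columns v_i

# u_i = A v_i are eigenvectors of C with the same eigenvalues: the
# eigenpictures of the set.
U = A @ V

# Sanity check against the big covariance (feasible only at this toy size).
C = A @ A.T
assert np.allclose(C @ U[:, -1], eigvals[-1] * U[:, -1])
```

Note that the smallest eigenvalue comes out at (numerically) zero, which matches the remark above that only the number of points minus one eigenvectors are meaningful.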
To conclude, we can say that this method requires more mathematical background. Once
the theory is well understood, we can implement this method in MATLAB too. In the next
chapter, we will take a closer look at the implementation of the different methods.
Chapter 3
Implementation and explanation
3.1 Simple subtraction method
3.1.1 Realization of the method
We will not take much time explaining how we realized this comparison, because it
is really simple to create and the results are really bad. What is important to
notice here is:
Before doing the subtraction, we applied some adjustments on the contrast first, and
then we applied a blurring filter to erase the background imperfections.
We performed some tests on this method, which can be seen in the last chapter (4).
Figure 4.14 shows the efficiency of this method.
This method confirms the idea that we should implement other methods, because
the results returned are really bad; we can say that it does not work properly.
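For completeness, the idea of the method can be sketched in a few lines. This is an illustrative NumPy sketch, not the thesis script (the contrast adjustment and blurring filter mentioned above are omitted):

```python
import numpy as np

def subtraction_score(a, b):
    """Sum of absolute pixel differences: 0 means identical images."""
    return np.abs(a.astype(float) - b.astype(float)).sum()

# Two toy "images": identical images score 0, different ones score high.
a = np.zeros((4, 4))
b = np.ones((4, 4))
score_same = subtraction_score(a, a)
score_diff = subtraction_score(a, b)
```

Such a score is extremely sensitive to lighting and hand position, which is consistent with the bad results reported here.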
3.1.2 Conclusion on the method
In any case, it is a good thing to know the results of this method. It was our
first idea, and it confirmed that we had to look further into image analysis by
implementing other methods.
3.2 Histograms of oriented gradients method
In this section, we will explain how we implemented this method and the problems
encountered. We will try to understand each part of the method, and why it works or
does not work in the different cases.
3.2.1 Step 1: Gradient magnitude calculation
After a first look at the MATLAB software, we wrote the first script, which calculates
the magnitude of each gradient in the image.
Magnitude gradient definition:
If dx and dy are the outputs of the x and y derivative operators, then the gradient
magnitude is calculated by:

mg = sqrt(dx^2 + dy^2) (3.1)
Therefore, in order to calculate the gradient magnitude, we first had to calculate the
derivatives dx and dy of the image.
The script of the derivative operator was found on the internet, but we can look at
what it looks like to understand the following steps:
X-derivative operator script:
Explanation of how to use the script:

function d = xDeriv(im, xRadius, yRadius, shape)
% XDERIV  Returns the X-derivative of image IM.
%   D = XDERIV(IM, XRADIUS, YRADIUS, SHAPE)
%   IM      - Input image.
%   XRADIUS - Half the width of the vicinity in which the derivative
%             is calculated.
%   YRADIUS - Half the height of the vicinity in which the derivative
%             is calculated (default: equal to XRADIUS).
%   SHAPE   - Either of:
%             'full'  - (default) returns the full 2-D convolution;
%             'same'  - returns the central part of the convolution,
%                       the same size as A;
%             'valid' - returns only those parts of the convolution that
%                       are computed without the zero-padded edges;
%                       size(C) = [ma-mb+1, na-nb+1] when size(A) > size(B).
%
% Written by Ariel Tankus, 19.9.96.
This script therefore calculates the derivative matrix, given the image, the xRadius,
the yRadius and the desired output shape.
So, with an analogous script for the y-derivative, we could compute the gradient
magnitude really fast; we just had to keep an eye on the speed of the running process.
We also implemented the gradient magnitude script as described below:
Gradient magnitude operator script:
function [mag, dx, dy] = grad(im, xRadius, yRadius, shape)
% GRAD  Returns the gradient magnitude of the given image.
%   [MAG, DX, DY] = GRAD(IM, XRADIUS, YRADIUS, SHAPE)
%   IM      - Image.
%   XRADIUS - Half width of derivation vicinity.
%   YRADIUS - Half height of derivation vicinity.
%   SHAPE   - Either of 'same', 'valid', 'full'. See xDeriv.
%   MAG     - Gradient magnitude.
%   DX      - X-derivative (optional).
%   DY      - Y-derivative (optional).

The MAG output is:

MAG = [min(min(sqrt(dx^2 + dy^2))), max(max(sqrt(dx^2 + dy^2)))] (3.2)
The MAG output returns a two-dimensional vector holding the minimal and maximal
terms of the gradient magnitude matrix; the aim is to lighten the calculation with a
smaller output.
The DX and DY outputs return two matrices, both the size of the image matrix.
All we needed as outputs were the minimum and the maximum of the gradient
magnitude, in order to apply an efficient threshold (cutting the lowest gradient
magnitudes). But we will come back to this later (3.4).
Then, having the minimum and maximum gradient magnitudes of the picture, we
could move on to the second part: the gradient orientation calculation.
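As an aside, this step can also be sketched outside MATLAB. The following illustrative NumPy version uses np.gradient as a stand-in for the xDeriv/yDeriv operators (an assumption; it is not the thesis's exact derivative vicinity):

```python
import numpy as np

def gradient_magnitude(im):
    """Per-pixel gradient magnitude mg = sqrt(dx^2 + dy^2), as in eq. (3.1)."""
    dy, dx = np.gradient(im.astype(float))   # central-difference derivatives
    return np.sqrt(dx ** 2 + dy ** 2)

# A vertical step edge: flat regions give zero magnitude, the edge a peak.
im = np.zeros((8, 8))
im[:, 4:] = 1.0
mag = gradient_magnitude(im)
mag_range = (mag.min(), mag.max())           # what the MAG output keeps
```

Keeping only the minimum and maximum, as the grad script does, is enough for the thresholding that follows.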
3.2.2 Step 2: Gradient orientation calculation and magnitude threshold
Once the gradient magnitude calculation was done, we implemented the second script,
which calculates the orientation of each sufficiently strong gradient in the
image.
Gradient orientation definition:
If dx and dy are the outputs of the x and y derivative operators, then the gradient
direction is calculated by:

dir = arctan(dy/dx) (3.3)

Therefore, in order to calculate the gradient direction, we just had to use the
derivatives dx and dy of the image, which we had already calculated for the magnitude.
Then, after applying a threshold on the gradient magnitude, we sorted all the
measurements into a 36-dimensional vector: 36 bins of 10 degrees each. Once the
vector was built, we plotted it in polar and cartesian coordinates, just to get an
overview of the orientations.
We will now describe the script written to realize this implementation:
Gradient orientation operator script:
function Z = grador2(im, xRadius, yRadius, shape)

This function returns the 36-dimensional gradient orientation vector of the given
image.

%   [ORI, DX, DY] = GRADORIENTATION(IM, XRADIUS, YRADIUS, SHAPE)
%   IM      - Image.
%   XRADIUS - Half width of derivation vicinity.
%   YRADIUS - Half height of derivation vicinity.
%   SHAPE   - Either of 'same', 'valid', 'full'. See xDeriv.
%   ORI     - Gradient orientation matrix (contains all the gradient directions).

[gm, dx, dy] = grad(im, xRadius, yRadius, shape);

Here we call the gradient magnitude operator. It returns the X-derivative, the
Y-derivative and the 2-dimensional magnitude vector (holding the maximum and
minimum of all the gradients).
gm = ((gm(1) + gm(2)) * 0.1) + gm(1) (3.4)

This is the threshold value. It is relative to the image, fixed at 10 percent of the
scale between the minimum and the maximum gradient magnitude.
Then, we defined that when fewer than 3 inputs are given, the shape is taken as
'same' and the yRadius is set equal to the xRadius.
The next step was to create a 36-dimensional vector full of zeros; we then increment
the corresponding bin each time an orientation is found. We had to take care that
the arctan function only returns values between -Pi/2 and Pi/2, so the direction must
be disambiguated from the signs of dx and dy.
After that, we had to create the gradient direction matrix, with the gradient
direction of every pixel. We had to take the threshold into consideration, to keep
only the orientations of the main gradient magnitudes.
We will see in section 3.2.6 that we have some problems with the borders of the
pictures. Therefore, we have to take this problem into consideration and cut
the border of the new image. The cut is relative to the input and avoids the high
gradient magnitudes that are computed on the borders of the pictures.
So, after computing the gradient orientation vector of each image, it becomes
possible to compare the different images.
Then, to inspect the histograms, we display both the cartesian and the polar
representation of the orientation vector. It is good to have the two histograms, to
see accurately where the orientation peaks are.
So, we have seen how to calculate the gradient orientation vector of a picture. With
the gradient orientation vectors of the data set, it is possible to compare the
images with each other and thus to sort them.
We can take a look at the shape of the histograms in figure 3.1.

Figure 3.1: Representation of the orientation histograms for each new position. On
this graph, each histogram is drawn under its picture.
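To make the step concrete, here is an illustrative NumPy sketch of the 36-bin histogram with the relative magnitude threshold of eq. (3.4). It mirrors the idea of grador2, not its exact code, and uses arctan2 to sidestep the arctan range issue mentioned above:

```python
import numpy as np

def orientation_histogram(im, n_bins=36):
    dy, dx = np.gradient(im.astype(float))
    mag = np.sqrt(dx ** 2 + dy ** 2)
    # Threshold relative to the magnitude range, as in eq. (3.4).
    thr = (mag.min() + mag.max()) * 0.1 + mag.min()
    ang = np.arctan2(dy, dx)                 # full -pi..pi range
    keep = mag > thr
    # 10-degree bins over the full circle.
    bins = ((ang[keep] + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    return np.bincount(bins, minlength=n_bins)

im = np.zeros((16, 16))
im[:, 8:] = 1.0                  # vertical edge: all gradients point along +x
hist = orientation_histogram(im)
```

For this synthetic vertical edge, all the surviving gradients fall into a single bin, which is exactly the kind of sharp peak the histograms of figure 3.1 show for clean pictures.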
3.2.3 Step 3: Gaussian filter operator
After having lots of noise problems in our orientation vectors, described in
paragraph 3.2.6, we decided to implement and apply a gaussian filter to the picture.
We will now detail how we realized a gaussian filter in MATLAB, in order to blur the
image and erase unwanted high-level contrasts. Note that a gaussian filter function
also exists in the Image Processing Toolbox; our own script was useful at the
beginning of the project (we did not have this toolbox yet), but afterwards we used
the built-in MATLAB function.
Gaussian filter operator script:
This script simply returns the filtered version of the given image.
The aim of this script is to weight each pixel as a function of its neighbours: we
put a weight on each pixel around the one to be blurred. After that, a white pixel
on a black background becomes dark gray. We decided to make a circular filter three
pixels wide, balanced with the weights shown in figure 3.2.
Figure 3.2:
To tune the filter, we can change its values. With these values, however, the
picture is well blurred, and a large part of the background noise is erased without
the filter being too strong.
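The weighting idea can be sketched as follows. This illustrative NumPy version uses a small 3 x 3 kernel with assumed standard binomial weights, not the thesis's three-pixel circular filter of figure 3.2:

```python
import numpy as np

def blur(im):
    # Normalized 3x3 Gaussian-like weights (binomial approximation).
    k = np.array([[1., 2., 1.],
                  [2., 4., 2.],
                  [1., 2., 1.]])
    k /= k.sum()
    pad = np.pad(im.astype(float), 1, mode="edge")
    out = np.empty_like(im, dtype=float)
    for i in range(im.shape[0]):
        for j in range(im.shape[1]):
            # Each output pixel is the weighted average of its neighbourhood.
            out[i, j] = (pad[i:i + 3, j:j + 3] * k).sum()
    return out

# A single white pixel on black spreads out and darkens, as described above.
im = np.zeros((5, 5))
im[2, 2] = 1.0
blurred = blur(im)
```

Because the kernel is normalized, the total brightness is preserved while isolated high-contrast pixels (i.e. background noise) are flattened out.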
3.2.4 Step 4: Euclidian distance comparison
In this section, we will see how we realized the comparison between images, in order
to obtain a realistic and efficient gesture recognition.
We used the Euclidean distance between the gradient orientation vectors
(36-dimensional vectors) to calculate the difference between two pictures.
The Euclidean distance comparison:
In order to obtain an efficient comparison between images, we decided to calculate
the Euclidean norm between the orientation vector of the picture to be recognized
and those of the data set pictures (we computed the Euclidean distance between the
gradient orientation vector of the analyzed picture and that of each database
picture, then sorted the results and selected the four smallest). Therefore, we
implemented a script that performs this calculation between all the gradient
orientation vectors.
Here is the script we made:
The vector-vector comparison script:
function disteuclid = fini2(b, Im2, xRadius, yRadius, shape)

Returns the 4 nearest database pictures of the analyzed image.

First, we created a 25-dimensional Euclidean distance vector, where each dimension is
the result of the comparison with one database picture.
Then, we used the gradient orientation script to calculate the orientation vector
of the picture to analyze.
After that, we returned the indices of the 4 smallest terms of the 25-dimensional
Euclidean distance vector. With these indices, it is easy to find the corresponding
images, take back their labels and display them.
The script worked really well, but the execution time was far too high: it needed
180 seconds to return the 4 nearest pictures. So, the new goal was to reduce
this time considerably, in order to get a quick answer for a given picture. We must
not forget that the final aim is a real-time application!
The idea was to write a script that creates a matrix (MATLAB works faster with
matrices) of all the database gradient orientation vectors and saves it to a file.
It is then easy to compare the gradient orientation vector of the analyzed picture
with each line of this database matrix, which we will call MATDIST (see C). That
removes the calculation time spent on the database pictures: we no longer need to
process every database picture each time.
We can now see how we made the code to create the MATDIST:
The database matrix operator:
function matdist = matdist(Im2, xRadius, yRadius, shape)

This function returns the matrix of all the gradient orientation vectors of the
pictures stored in the database.
We decided to use the tic MATLAB function to launch a chronometer; the goal is to
know the calculation time for the matrix creation (we took 26 images for the data
set). It is a good way to measure the efficiency of the algorithm. Moreover, we can
then easily measure the time gained with the different changes we made.
We just had to add the matrix creation in the loop (26 lines for the 26 images
and 36 columns for the 36 orientation bins). This new matrix is simply the
combination of all the orientation vectors: each line is the gradient orientation
vector of one database picture.
Once the matrix is created, we save it in a specified folder so that we can load
it whenever we want.
Executing this script to create the database matrix takes around 70 seconds, and it
only needs to be run once. With it, we gained the time we wanted. Now, to recognize
a picture, we run another script that compares the analyzed image with this matrix.
We can now explain and comment the new script encoded:
The vector-matrix comparison operator:
function disteuclid = fini3(b, xRadius, yRadius, shape)

This script returns a window displaying the 4 nearest images, with the Euclidean
distance between each of them and the compared one.
First of all, we load MATDIST.
Then we just have to calculate the orientation vector of the image to analyze, and
compare it with each line of the database matrix.
After that, we sort the distances and take back the database pictures with their
labels: the class is recognized.
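The core of this comparison can be sketched like so (illustrative NumPy only, with a random stand-in for the saved 26 x 36 MATDIST matrix):

```python
import numpy as np

# Stand-in for the saved database matrix: 26 pictures x 36 orientation bins.
rng = np.random.default_rng(1)
matdist = rng.random((26, 36))

# Query with a picture already in the database (row 7): the script should
# return it first, with a Euclidean distance of zero.
query = matdist[7].copy()
dists = np.linalg.norm(matdist - query, axis=1)
nearest4 = np.argsort(dists)[:4]
```

The indices in nearest4 then map back to the labeled database pictures, exactly as described for fini3.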
With this script, the time needed to compare the images is about 3 seconds: we
gained around 177 seconds of execution time (a comparison needed 180 seconds
before)! To immediately test our results, we just had to take a database picture
and compare it with the database: if everything works, the same image should be
returned first, with a Euclidean distance equal to zero. This is indeed the result
we get whenever we use a database picture; it verifies that for two identical
pictures, the script returns the logical result.
We can see this in picture 3.3. There, the image returned as nearest is the input
image itself. This is a way to check the algorithm: here it works correctly, and the
Euclidean distance gives true results. Now, we have to check that the method is also
good at recognizing a picture from similar pictures in the database.
Now we can look at the 4 nearest pictures returned for the 1 position, in
figure 3.4. There, the first two pictures returned show the same gesture,
but the third one does not. Therefore, we had to try other hand gestures and observe
the results. We will test the algorithm further in chapter 4.
3.2.5 Step 5: Establish a comparison matrix
In this part, we will go further than just writing the script. We will try to
understand why it sometimes does not work as expected, and what kind of solutions we
could bring to get better results.
As we saw above in figure 3.4, the results are not always as good as expected.
For example, if we ask for the recognition of another, more complicated hand
gesture, we get the answer shown in figure 3.5.
The results expected are clearly not the results given. This problem
Figure 3.3: Result of the first answer of the vector-matrix comparison script for a
database picture. We can check that the first image has a Euclidean distance equal to
zero.
comes from the quality and size of the database, or from the different positions we
took. We have 26 pictures in the database, all of them different, in order to see
what problems we could have.
We also varied the orientation of the hand and the spacing of the fingers during the
shooting, in order to recognize more positions.
What we can notice is that the results are much better for position 1. This is
simply because the spacing does not influence the results; only the orientation of
the fingers really acts upon them. That is why we get better results with the
position 1 picture.

Figure 3.4: Results of the vector-matrix comparison script for a database picture (1).
Therefore, one solution would be to widen the database: take many pictures of
each hand position and then apply the script again. The new problem to expect is the
processing time, which would become excessive.
The second solution is to change our gesture positions and choose new ones that are
really different. We will try this possibility later.
To decide which way to go, we tried to identify the problem clearly. Therefore, we
decided to build a matrix showing on a gray scale whether the different database
pictures are close or not (all the pictures were taken under the same lighting), in
terms of the Euclidean distance between their gradient orientation vectors. Black
means that the pictures are really close, and white that they are really different.
To build this matrix, we just had to take the database matrix already computed and
compare each line with the others. MATLAB then displays the new matrix on a gray
scale to show the results.
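In NumPy terms, this gray matrix is just the pairwise distance matrix of the database rows (illustrative sketch only, with a random 5-picture stand-in as in figure 3.6):

```python
import numpy as np

# Stand-in for 5 database orientation vectors (5 pictures x 36 bins).
rng = np.random.default_rng(2)
matdist = rng.random((5, 36))

# Pairwise Euclidean distances; the thesis displays this on a gray scale.
gray = np.linalg.norm(matdist[:, None, :] - matdist[None, :, :], axis=2)
```

By construction, the matrix is symmetric and its diagonal is zero, which is the black diagonal expected below.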
We can see the image of this matrix in figure 3.6.
Figure 3.5: Results of the vector-matrix comparison script for a database picture
(position: 5).
We can see the database used for this gray matrix in picture 3.7.
The best case is white everywhere except black on the diagonal. Indeed, white means
that our gestures are far from each other, and black means it is the same picture.
The diagonal is therefore black in any case, because it is the comparison between
two identical pictures.
To verify that our gestures are well separated between classes and consistent within
a class, we can plot this gray matrix: black within a class and white between
classes means that our gestures are perfectly chosen.
Here, we can see that our gestures are too close: there is clearly too much dark gray
in the matrix. This shows that the real problem is our choice of gesture positions.
Indeed, the positions are too close to each other, so the recognition is too hard to
realize.
Therefore, we decided to reduce the number of positions and to take just three
really different ones: rock, paper and scissors. This also allows us to build an
application (the well-known little game) as a concrete demonstration of the
comparison.
Figure 3.6: Output of the graymat function for a 5-picture database (1 image per
class). We can see that the diagonal is black, which shows that the matrix is well
calculated: on the diagonal is the Euclidean distance between a vector and itself.
We can see the gray matrix of our new gestures in figure 3.8.
Observing this new gray matrix, we can confirm that the new gestures chosen are much
better than the previous ones: white between the different gestures (meaning the
positions are far from each other) and black on the diagonal, as expected. We
will see in the results (chapter 4) that the recognition also works far better.
3.2.6 Problems encountered
During the realization of the different steps, we came across several problems, which
we will explain in this part. We will not list all the problems we had, only those on
which we lost some time.
Figure 3.7:
First problem:
The first problem we had was about the different classes in MATLAB. Indeed, MATLAB
automatically defines the classes of its terms, and every function used depends on
the class of each element involved in the process.
When we loaded the image (MATLAB imread function), the class of the resulting matrix
was a color uint8 (three matrices), and to calculate the gradient we needed a gray,
double-class equivalent matrix. Therefore, we had to write a small script converting
a uint8 MATLAB class into a double one (a direct function exists in the Image
Processing Toolbox, but as said before we did not have it at the beginning of the
period, which is why we implemented this small script).
Here is the script we wrote to make this transformation:
Then we had to transform it to gray scale. We used these coefficients to get a
well gray-scaled image, where A is the color image and A1 the gray result:

A1 = (A(:,:,1))*0.3 + (A(:,:,2))*0.59 + (A(:,:,3))*0.11;

So we changed the three RGB matrices into one gray equivalent matrix. The
coefficients are chosen to respect the different contrasts.
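In NumPy, the same conversion (the 0.3/0.59/0.11 weights are the thesis's own) looks like this illustrative sketch:

```python
import numpy as np

def to_gray(rgb):
    """uint8 RGB image -> double gray image, 0.3 R + 0.59 G + 0.11 B."""
    a = rgb.astype(float)
    return 0.3 * a[:, :, 0] + 0.59 * a[:, :, 1] + 0.11 * a[:, :, 2]

# A 2x2 test image with one white pixel: white maps to 255.0, black to 0.0.
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (255, 255, 255)
gray = to_gray(img)
```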
Figure 3.8: Output of the graymat function for our new gesture database. We can
see that the diagonal is black and that the other cells are much whiter (1 image per
class).
After that, we could easily calculate the gradient of the image, but with some loss
(we transformed a color picture into a gray equivalent picture, three matrices to
one). Later, we bought the Image Processing Toolbox, so we could simply use the
built-in MATLAB function.
Second problem:
The second problem was the image border. When we calculated the gradient magnitude
of a picture, the borders were included in the calculation with very high values,
due to the XRadius and YRadius vicinities. We can see this clearly in picture 3.9,
where we calculated the gradient magnitude of a simple form (a white triangle on a
black background).

Figure 3.9: Gradient magnitude of a triangle, with noise on the borders

In this figure (3.9), we can really see the border noise in the picture. The
gradient magnitude operation treats the borders as a part of the image, which brings
lots of problems later, in the second step: the gradient orientation calculation
(3.2.2).
Therefore, we decided to cut the borders in the gradient magnitude calculation;
otherwise, all our histograms would look similar.
Third problem:
When the background is not completely black and dark, we have problems with
reflections and contrasts in the gradient magnitude calculation. Indeed, lots of
noise becomes part of the gradient orientation vector. To avoid this noise, we can
apply a gaussian filter to the image: it blurs and softens the contrasts, leaving
less high-magnitude gradient noise. A complementary way to avoid this kind of noise
is to take the picture on a really black background. We can see in the pictures
below (3.10 and 3.11) the differences between the gradient magnitudes of each image
and the different histograms returned.
We can also look at the gradient magnitude images of these two pictures in figure
3.11.
Figure 3.10: 1-finger picture with black and gray backgrounds. We can see the noise
of the gray background (a), which brings lots of problems in the gradient
calculation. With a good black background (b), the problem is really simplified.
Figure 3.11: Gradients of the 1-finger picture with black and gray backgrounds. We
can clearly see the white reflections of the gray background after the zoom (a),
compared to the black background (b), even though an important gaussian filter was
applied. The histogram is deteriorated as well.
We can notice that with a good black background we have no trouble: we cut all the
background noise. We will see later how to resolve this recognition problem.
Now we will compare the two histograms on the figure 3.12.
Figure 3.12: Histograms of oriented gradients of the 1-finger picture with black and
gray backgrounds. On these two graphs, we can notice that for the black background
(b) the histogram is much more accurate than for the gray one (a); it is therefore
easier to treat. We need precise histograms to have a good gesture recognition.

To conclude with these few pictures, we can say that a homogeneous black background
makes the work easier: we get much more precise histograms, and the recognition
results are far better.
3.2.7 Conclusion on the method
After implementing this method, we understand much more about images and what
is scientifically behind an image. We can see that the results obtained are good,
though we might have thought they would be better. By implementing the next method,
we will surely get new ideas to make this method more efficient. We will then test
the method in chapter 4.
3.3 PCA or Eigenfaces method
We will now explain each part and detail the way we made the different algorithms.
3.3.1 Step 1: Realize the database
First of all, we had to choose how to build our database and what kind of database
would be best for recognition. We chose to take the minimum number of pictures that
still gives the best recognition.
It is important to notice that for the eigenface method, we work with the entire
pictures at the beginning; then we reduce the data (our aim is to express the data
set with fewer factors). Therefore, we must take care with the data set, to keep the
calculation time down. There are two parameters to consider when building the
database:

The number of pictures.

The size of each picture, which determines part of the size of the first matrix to
reduce.
Both are really important. Indeed, when we create the first matrix to reduce, its
size is the number of pictures by the number of pixels.
Therefore, if there are too many pictures, or too many pixels in each picture, the
calculation time grows fast!
For example: 10 images with a definition of 640 x 480 give a matrix of size
10 x 307200, and 10 images with a definition of 1280 x 960 give a matrix of size
10 x 1228800.
So, as you can see, this can easily and quickly become a really huge matrix, and the
calculation time strongly depends on it. Moreover, we must not forget that MATLAB
cannot manage such big matrices either.
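The bookkeeping above amounts to flattening each picture into one row of the data matrix; a quick NumPy check of the sizes (illustrative only):

```python
import numpy as np

h, w, n_images = 480, 640, 10
images = np.zeros((n_images, h, w))        # 10 pictures of 640 x 480
data = images.reshape(n_images, h * w)     # one flattened image per row

# data is 10 x 307200, exactly the size quoted in the text.
```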
Therefore, the question of the database is really important, because it determines
the efficiency of the method (and its calculation time).
At the beginning, we could not know how many pictures to take or what size to
choose. So we decided to make a database of 12 pictures (4 of each position) with
a definition of 640 x 480.
To determine what kind of database would give the best recognition, we ran some
efficiency tests (after implementing the method) with different numbers and sizes of
input pictures.
3.3.2 Step 2: Subtract the mean
The next step is to calculate the mean along each dimension. It is a fast step: we
just take the first matrix of all the images, ask MATLAB to calculate the mean, and
subtract it from the first matrix. There is not much to say about this step, as it
is really trivial; but we must not forget that it is really important for centering
the data set pictures in the space.
3.3.3 Step 3: Calculate the covariance matrix
This step was a bit more difficult than the first two (we had to understand the theory well
to carry out the calculation precisely).
But once we understood the subtlety described in chapter 2, the calculation
became fast and easy to implement. Figure 3.14 shows the different eigenpictures
obtained from this covariance matrix.
Figure 3.14: Example of the eigenpictures of the data set used for the PCA recognition method
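The subtlety in question is presumably the small-matrix trick used in eigenface methods (Turk and Pentland): instead of forming the huge pixels-by-pixels covariance, one works with the small N-by-N matrix of image inner products, whose eigenvectors map back to pixel space. A hedged Python/NumPy sketch, not the report's actual MATLAB code:

```python
import numpy as np

def small_covariance(Xc):
    # N x N surrogate covariance for N centered images (rows of Xc).
    # Its eigenvectors, left-multiplied by Xc.T, give the eigenpictures,
    # without ever building a pixels x pixels matrix.
    n = Xc.shape[0]
    return (Xc @ Xc.T) / n

rng = np.random.default_rng(0)
Xc = rng.normal(size=(3, 100))   # 3 tiny "images" of 100 pixels
Xc -= Xc.mean(axis=0)            # center first, as in step 2
C_small = small_covariance(Xc)   # 3 x 3 instead of 100 x 100
```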
3.3.4 Step 4: Eigenvector and eigenvalue calculation, and choice
of the best eigenvectors
In this step, we take a closer look at the calculation of the eigenvectors and eigenvalues
of the covariance matrix, and at how to choose the best ones.
Indeed, it is really important to choose the right eigenvectors to express the data set in
the best basis: the number of eigenvectors chosen directly influences the results we get.
The magnitude of each eigenvalue determines how important the corresponding eigenvector
is in the expression of the data set in the new space.
Therefore, we planned to apply a threshold on the eigenvalues to keep only the most
important eigenvectors; choosing an efficient threshold is essential for good results. At the
beginning, we decided to keep the first 11 eigenvectors (as described in the theory, 11 for
12 database pictures). We then planned tests to find the best selection, i.e. a threshold
value that gives good results while efficiently decreasing the calculation time.
In the end, however, we decided to keep just three images in the data set (after the tests
performed in chapter 4), so we kept all the eigenvectors, since three eigenvectors is a
really small number in any case.
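The selection step can be sketched as follows (a Python/NumPy stand-in for the MATLAB code; function and variable names are ours):

```python
import numpy as np

def top_eigenvectors(C, k):
    # Return the k eigenvalues/eigenvectors of symmetric C with the
    # largest eigenvalues (the most important directions of the data).
    vals, vecs = np.linalg.eigh(C)       # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]       # re-sort descending
    return vals[order][:k], vecs[:, order][:, :k]

C = np.array([[2.0, 0.0],
              [0.0, 5.0]])
vals, vecs = top_eigenvectors(C, 1)      # keeps the eigenvalue-5 direction
```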
3.3.5 Step 5: Realize the new Data set and compare
In this step, we build the new data set by saving the new matrix of eigenpictures and
expressing each image of the database in terms of the principal eigenvectors (we just
compute a scalar product between each kept eigenvector and the image). We then save,
for each database image, the coefficient in front of each eigenvector; there are as many
coefficients as eigenvectors.
In the end, this expresses each image in terms of the calculated eigenvectors. The image
to analyze is then expressed with these eigenvectors too. With these coefficients, we can
compare the images with each other by comparing their coefficients (we compute the
Euclidean distance between the coefficient vectors of each pair of images). The results
returned are quite good, as we can see in chapter 4.
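The projection and comparison described above can be sketched like this (Python/NumPy; the toy basis and names are ours, not the report's code):

```python
import numpy as np

def project(image_vec, eigvecs):
    # One scalar product per kept eigenvector (columns of eigvecs).
    return eigvecs.T @ image_vec

def nearest_index(query_coeffs, db_coeffs):
    # Index of the database image whose coefficients are closest (Euclidean).
    dists = np.linalg.norm(db_coeffs - query_coeffs, axis=1)
    return int(np.argmin(dists))

eigvecs = np.eye(4)[:, :2]               # toy basis: 2 eigenvectors, 4-pixel space
db_images = [np.array([1.0, 0.0, 0.0, 0.0]),
             np.array([0.0, 1.0, 0.0, 0.0])]
db_coeffs = np.array([project(v, eigvecs) for v in db_images])
query = project(np.array([0.9, 0.1, 0.0, 0.0]), eigvecs)
best = nearest_index(query, db_coeffs)   # 0: closest to the first image
```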
3.3.6 Conclusion on this method
After having implemented this method, we understand how strongly a huge set of images
can be compressed with low loss. Moreover, we have seen another view of image analysis,
and it is really interesting to compare the two kinds of methods; that is what we will do in
the next chapter (4): Tests, results and analysis. The results returned are quite good,
but we can easily imagine that this method would work better with centered images,
because the position of the gesture within the picture matters a lot. We can also understand
that the second name of this method, Eigenface, is no coincidence: it should work better
with faces than with hands, because it is easier to center a face in the picture (by centering
the mouth and the eyes).
Chapter 4
Tests, results and analysis
In this chapter, we explain the different tests made and the results returned. It serves as
a kind of tutorial for each method and should help people choose one method or the other
depending on the application they want to create. We also explain the drawbacks of each
method and the technical reasons for these drawbacks.
In a first part, we present the application realized and give its complete script, and we
explain how to use the application. After that, we present the different choices made for
each method and justify them with tests. Then, we make a simple comparison of the
different methods and draw graphs of the results of each method under different conditions.
4.1 The application: Rock Paper Scissors Game!
After having implemented the different methods to realize the gesture recognition, we
decided to implement a small application which would use the different methods.
The first idea that came to our mind was to implement a simple game that everybody knows:
the Rock-Paper-Scissors game! Indeed, it was the most fun way to test our gesture recognition scripts.
Moreover, as everybody knows the game, it is really easy and comfortable for other people
to test the scripts and the recognition level of each method. The application has a GUI
for a better interface with the player. We tried to make an easy-to-use application,
with very few steps needed to build the database or to play the recognition game.
The GUI is shown in figure 4.1.
CHAPTER 4. TESTS, RESULTS AND ANALYSIS 46
Figure 4.1: Photo of the application realized
We can see in this screenshot the GUI of the application and an overview of the
different options offered by the game.
Next, we show a photo of the environment we set up (PC and web camera)
to take good pictures for a better analysis. Moreover, it is important to know which
environment we used to take these pictures, because it directly influences the results
returned. The black background, for example, is really important. The environment
looks quite basic; indeed, it would be pretty easy for someone to build their own working
space and use this script.
We will then explain each button and what the application can do.
But first, the working space is shown in figure 4.2:
Figure 4.2: Photo of the working space realized for the gesture recognition application
We can see in this photo the working space set up for the gesture recognition. We
used a Philips camera on a tripod. We cut a wooden board and painted it black to
get a better background. Paint is surely not the best way to obtain a uniform background,
but it was what we had: even though we chose a matt paint, the different lighting setups
reflect directly on the paint and therefore influence the results. That is why we apply
some processing to the pictures before analysis. The best background would be a textile,
because it is much more matt.
So far, everything is easy to set up. What is a bit trickier is to have a camera
recognized by MATLAB; we were lucky and had one.
Now we can look at the application in detail in figure 4.3. We are going to explain
each function and why we implemented it that way.
Figure 4.3: Explanation of the application realized
So, we will now explain the different buttons and their purpose more precisely:
The Start/Stop button: As indicated, it starts or stops the camera. The aim is to save memory by stopping the camera when the application is not in use.
It also shows the preview of your gestures. The Start button must be
pressed before starting any comparison.
The preview window: This window is used to preview your gesture.
You just have to click on the Start button to see the preview.
The nearest database picture window: This small window indicates
which image of the data set was recognized. It is really useful when you have several
pictures of each gesture: you will know which picture was recognized, which
helps you understand how the recognition works.
The text field: It displays instructions to the user, so it is
really useful during data set creation. It tells you which gesture to make,
how long to wait, or even whether you won or lost.
The confirmation-for-recognition buttons: These buttons are especially made for testing.
After having launched a method and seen the results, you can click on Yes or
No to say whether your gesture was recognized. The application then
automatically counts how many gestures were recognized or not, which is really
convenient for long test series.
The player's and computer's position windows: These two windows show the
pictures analyzed for recognition: the gesture you just made and the computer's
random gesture. They give a quick overview of the results and make the game
more attractive.
The text field for the score: Here, you can see the score between the user and the
computer, and the user can also read the gesture recognized. When "Scissors
against Rock" is written, it means that the application recognized a scissors position
for the user, and that the computer's (random) gesture is a rock. The first
gesture written is the position of the nearest data set picture recognized.
The Reset Score button: It simply resets the score.
The Load Eigenface Matrix button: To avoid making you wait for these huge
matrices to load when the application starts, in case you only want to use the gradient
histogram method, you can load them whenever you want. Just remember
that the Eigenface method will not work before you have loaded these matrices.
This makes the GUI accessible more quickly.
The Game Database Creation button: It creates a new database. It takes
pictures of the user and calculates all the matrices automatically. The aim of
this button is to build a new set of pictures (database) for each user; the results
are better when each user makes their own set.
The Compare with Eigenface button: It simply launches the script that analyzes
the new picture with the Eigenface method. You cannot use it before loading
the Eigenface matrices. Once the matrices are loaded, the method runs really fast.
The Compare with Gradients button: It launches the gradient histogram method.
This method takes a bit longer than the Eigenface one, but you do not have to wait
for any matrix loading: you can use this method directly once the GUI is open.
The Compare with simple sub button: This button launches the first method
implemented: the simple subtraction one. This method is in the game just to show that its recognition is poor. You can test it to get an overview of the results.
So, we have explained the different buttons of the application and what they do.
We have also seen the working space set up for this project. Now, it is important to
see which results we obtained for each method.
If you are interested in the script itself, take a look at Appendix C.
4.2 Test and choices of the parameters
In this section, we take a closer look at the different tests made to explain the choices
in the different methods' scripts.
During this project, we had to make multiple choices that influenced the results. We
ran tests to validate these choices, so that we are sure the decisions we took favor
better results.
For both methods, we produced diagrams to present the results.
4.2.1 Choice of the size of the derivative filter and the number of bins
for the gradients method
It is important to note that all the pictures had the same size before any comparison:
every picture was 640 × 480. We fixed the image size because it influences the test
results (the size of the derivative filter is in pixels, so a circle of 3 pixels on a 50 × 30
image does not have the same effect as the same circle on a 640 × 480 picture).
In this part, we will see how we chose the size of the derivative filter for the gradient
method. We will also see how we chose the number of bins (or boxes) used to count
the different orientations in the histogram. Note that, in all our cases,
we chose a circular derivative filter.
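The gradient orientation histogram whose parameters are discussed here can be sketched as follows (a Python/NumPy stand-in for the MATLAB implementation; we use simple finite differences instead of the circular derivative filter of the report):

```python
import numpy as np

def orientation_histogram(image, n_bins=36):
    # Histogram of gradient orientations, normalized to unit length so
    # that pictures of different contrast can be compared.
    gy, gx = np.gradient(image.astype(float))
    angles = np.arctan2(gy, gx)                      # orientations in [-pi, pi]
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    hist = hist.astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# A horizontal intensity ramp: every gradient points the same way,
# so a single bin dominates the histogram.
ramp = np.tile(np.arange(8.0), (8, 1))
h = orientation_histogram(ramp, n_bins=36)
```

Changing `n_bins` between 18, 36 and 72 reproduces the trade-off discussed below: too few bins merge distinct orientations, too many spread them out.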
Before looking at the different graphs, we should say that we ran these tests with
different positions. Indeed, we made these tests at the beginning of the project, and at
that time, the positions to recognize were the five positions of the hand from 1 to
5. It is a good thing that the tests were made on these positions, because they give
more information: the positions are more precise and more difficult to recognize,
so the number of bins and the size of the derivative filter have more impact.
We can first look at the graph of the Euclidean distance between gestures of the same
class (1 to 5) in figure 4.4, as a function of the number of bins and the derivative filter size.
Figure 4.4: Graph of the Euclidean distance between the "1" gestures themselves. This
graph plots the Euclidean distance on the y-axis and the number of bins combined with
the size of the derivative filter on the x-axis (from 18 bins filter 3, then 18 bins filter 6,
18 bins filter 12, 36 bins filter 3, ... up to 72 bins filter 12).
As we can see in this figure, the lowest Euclidean distance is obtained with 36 bins. If
we then average the different intra-class distances, we can see that the best choice is a
derivative filter of size 6, to obtain a minimum Euclidean distance.
But to confirm that, we have to look at figures 4.5 and 4.6, which show the distances
of the other positions between themselves.
Figure 4.5: Graphs (a) and (b) of the Euclidean distance between the "2" gestures and
between the "3" gestures themselves. These two graphs plot the Euclidean distance on
the y-axis and the number of bins with the size of the derivative filter on the x-axis, as
in the first graph.
Figure 4.6: Graphs (a) and (b) of the Euclidean distance between the "4" and the "5"
gestures themselves. Same layout as the first graphs.
With these graphs, we can definitely say that, to get the lowest Euclidean distance
between positions of the same class, we have to choose 36 bins, and that is what we
chose for the application. Now, we observe the graphs of the Euclidean distance
between one class and another. Figure 4.7 shows the "1" class against the other classes.
Here, what is important to notice is not the highest Euclidean distance, but the level
Figure 4.7: Graph of the Euclidean distance between the "1" class and the other
classes. This graph plots the Euclidean distance on the y-axis and the other classes on
the x-axis. Each curve corresponds to a number of bins and a derivative filter size.
of the Euclidean distance within a class (intra-class) in comparison with the Euclidean
distance between the classes.
We will look at just one graph, because all the graphs look the same; we chose the
first one (class "1" against the others).
We can see that between the "1" position and the "2" position, the mean Euclidean
distance is around 0.4 (all distances are normalized, so the maximum distance between
two classes is 2, reached when the two normalized vectors are opposite).
We can also see that within the "1" class (intra-class), the Euclidean distance is also
around 0.4. Moreover, for the other positions, the distances between the different
classes (inter-class) and the intra-class distances are similar.
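The bound mentioned in parentheses can be checked with a quick numerical sketch (Python/NumPy, independent of the report's data): for unit-normalized vectors, the Euclidean distance reaches 2 exactly when the vectors are opposite.

```python
import numpy as np

def unit(v):
    # Normalize a vector to unit Euclidean length.
    return v / np.linalg.norm(v)

u = unit(np.array([3.0, 4.0]))
d_opposite = np.linalg.norm(u - (-u))   # opposite unit vectors: distance 2
d_same = np.linalg.norm(u - u)          # identical vectors: distance 0
```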
What does this mean?
It means that our set of pictures is too homogeneous: our images are definitely too close,
because the distances between the classes are the same as the distances within the classes.
Therefore, we changed the data set and took other pictures that are further apart between
them when we change the class. We found that the positions Rock, Paper and Scissors
corresponded to what we wanted, and that they would also make a good application.
In any case, we can still say that 36 bins is the best choice, because it gives better
results in every case. We can see that with 18 bins the different histograms are too close,
because too many orientations fall into the same bin, and that with 72 bins the different
orientations are too spread out. So these orientation histograms are not good: in one
case we get big peaks, and in the other a histogram that is too flat.
Now, to confirm that a circular derivative filter of size 6 is the best, we made other tests