7/28/2019 _DeBoisset_report.pdf
1/121
HAND GESTURE RECOGNITION
Gradient orientation histograms and
eigenvectors methods
Bertrand de BOISSET
FRAUNHOFER INSTITUT
INSTITUT GRAPHISCHE DATENVERARBEITUNG
Fraunhoferstraße 5
D-64283 Darmstadt
Supervisor:
Didier Stricker
Examiner:
Didier Stricker
Declaration
I hereby declare that this dissertation and the work described in it are my own, except where otherwise stated, and were done only with the indicated sources. All parts inferred from the sources are marked as such. It has not been submitted before for any degree or examination at any other university.
DARMSTADT, June 15th 2006
Ehrenwörtliche Erklärung
Hiermit versichere ich, die vorliegende Diplomarbeit ohne Hilfe Dritter und nur mit den angegebenen Quellen und Hilfsmitteln angefertigt zu haben. Alle Stellen, die aus den Quellen entnommen wurden, sind als solche kenntlich gemacht worden. Diese Arbeit hat in gleicher oder ähnlicher Form noch keiner Prüfungsbehörde vorgelegen.
DARMSTADT, 15. Juni 2006
Déclaration
Je déclare que le rapport réalisé ainsi que le travail décrit dans ce document est un travail personnel, sauf contre-indication, réalisé avec l'aide des sources citées dans la bibliographie. Toutes les parties qui sont reprises sont indiquées en tant que telles. Ce projet n'a jamais été présenté auparavant pour aucun examen dans aucune autre université.
DARMSTADT, 15 Juin 2006
Abstract
The aim of this work is to implement different methods for hand gesture recognition. The main parts of the work were:
First, the analysis of the different ways to realize gesture recognition.
Then, the implementation of the gradient histogram recognition. This method consists in calculating the gradients of a picture and then constructing histograms of the gradient orientations.
We also took a closer look at the algebraic analysis of an image, by searching for the principal components that define a set of pictures (eigenvectors in the space of the data set). This second method is called PCA (Principal Component Analysis).
Then, to finish the project, we had to analyze the different methods implemented by performing different tests. After that, we could define the best and worst points of each method. We also realized a small application to illustrate our work.
Acknowledgments
I would like to thank my supervisor, Alain Pagani, for his enthusiasm, help and guidance throughout this project. I would also like to thank Didier Stricker, who supervised my work during this period. And I will not forget:
F. Merienne, C. Pere, M. Moll, H. Wuest, F. Vial... They all helped me to finish this project in time and gave me pieces of advice when I needed them.
All the members of the Department for Virtual and Augmented Reality (A4) of the Fraunhofer IGD, for providing an interesting and stimulating working environment.
Contents
1 Project Aims
2 Theory and backgrounds
2.1 The database
2.2 The simple subtraction method
2.3 The gradient based method
2.3.1 The goal of this method
2.3.2 Main steps of the method
2.4 The Principal Component Analysis -PCA- method
2.4.1 The goal of this method
2.4.2 Mathematical backgrounds
2.4.3 Main steps of the method
3 Implementation and explanation
3.1 Simple subtraction method
3.1.1 Realization of the method
3.1.2 Conclusion on the method
3.2 Histograms of oriented gradients method
3.2.1 Step 1: Gradient magnitude calculation
3.2.2 Step 2: Gradient orientation calculation and magnitude threshold
3.2.3 Step 3: Gaussian filter operator
3.2.4 Step 4: Euclidean distance comparison
3.2.5 Step 5: Establish a comparison matrix
3.2.6 Problems encountered
3.2.7 Conclusion on the method
3.3 PCA or Eigenfaces method
3.3.1 Step 1: Realize the database
3.3.2 Step 2: Subtract the mean
3.3.3 Step 3: Calculate the covariance matrix
3.3.4 Step 4: Eigenvectors and eigenvalues calculation, and choice of the good eigenvectors
3.3.5 Step 5: Realize the new data set and compare
3.3.6 Conclusion on this method
4 Tests, results and analysis
4.1 The application: Rock Paper Scissors Game!
4.2 Tests and choice of the parameters
4.2.1 Choice of the size of the derivative filter and the number of boxes for the gradients method
4.2.2 Choice of the number of pictures and the size of images for the data set for both methods
4.3 Last tests to explain the efficiency of each method
4.3.1 First tests: Recognition percentage of each method in general
4.3.2 Second tests: Recognition percentage of each method in different conditions
5 Conclusion: advantages and drawbacks
A Tables of general tests
B Tables of specific tests
C Script of the game
List of figures
Bibliography
Chapter 1
Project Aims
We can define the goal with a simple question: how could we command different applications with a single hand gesture? The aim of my final year project is to answer this question by studying different methods that allow hand gesture recognition. Moreover, the recognition has to be done with one camera and in real time, so that you can operate as fast as you want.
To begin, we had the idea to realize a simple subtraction between two images, pixel per pixel, to compare them. We will see the results of that in the second chapter.
Then we studied a method that uses gradients. The aim is to build the orientation histograms of the different pictures and to compare them. We will take a closer look at this method in Chapter 3.
After that, we implemented a method called PCA (Principal Component Analysis) or Eigenface. The goal is to calculate, find and study the eigenvectors of the different pictures and then to express each image with its principal components (eigenvectors). The difficult part was to find a way to compare the images through their expression with the eigenvectors (as is done in the Eigenface face recognition method).
Last, we created a small application to illustrate the different working methods.
Chapter 2
Theory and backgrounds
Before explaining the theory of the different methods, we will first state their main idea.
In fact, we realized a database of different hand gestures and labeled all the data set pictures, so that each picture is classified. Then the aim is to compare an unknown image with the images of the database and identify its label by taking back the label of the nearest image.
Therefore, we will see in the first part how we chose our database and how we defined it.
2.1 The database
In this section, we will take a closer look at the database.
At the beginning, we had two main questions about its creation:
Which hand gestures should we choose?
How many pictures of each gesture should we take?
These questions could be answered by two different kinds of database for the same chosen gestures:
Take lots of pictures of the different hand gestures to realize a huge database, so that the recognition will be better (and it is a way to reduce the limits of the different methods: we will have many more chances to find an image in the database that looks like the gesture to analyze). The problem is that it will take longer to look for the pictures in the database during the comparison.
Take few pictures of the different gestures to realize the database quickly. Then it will be easier for the user to create the database, and it will be quicker to look for a picture in the database (during the comparison). The problem is that the recognition will be harder if the gestures are similar.
Therefore, during the project, we began by taking 5 positions of the hand (1, 2, 3, 4 and 5 fingers) and lots of pictures in the database. We had good results, but the calculation time was huge. Therefore, we decided to change the database creation by taking a minimum of pictures of new hand gestures (3 positions that were really different: scissors, paper and rock).
We will study the returned results later, and we will see in detail why we changed the positions and the database. What is important now is to understand with which kind of database we realized this work.
2.2 The simple subtraction method
The aim of this method was to try a simple way to compare images, and then to explain and justify why we had to implement other methods. We will not take a deeper look at this method here. The theory is really simple: subtract the different images pixel per pixel, then compare the results and show the closest one.
We will see the different tests in the last chapter (4) and a short summary of this method in the next chapter about implementation (3).
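The idea can be sketched in a few lines. This is a hypothetical illustration in Python rather than the MATLAB used in the project, with images represented as plain lists of grey-level rows:

```python
def image_distance(img_a, img_b):
    """Sum of absolute pixel-per-pixel differences between two
    same-sized grey-level images (lists of rows of ints)."""
    return sum(abs(a - b)
               for row_a, row_b in zip(img_a, img_b)
               for a, b in zip(row_a, row_b))

def classify(unknown, database):
    """Return the label of the database image closest to `unknown`.
    `database` is a list of (label, image) pairs."""
    return min(database, key=lambda entry: image_distance(unknown, entry[1]))[0]

# Toy 2x2 "images": the unknown picture is nearest to the "rock" sample.
db = [("rock", [[10, 10], [10, 10]]),
      ("paper", [[200, 200], [200, 200]])]
print(classify([[12, 9], [11, 10]], db))  # -> rock
```

The function and data names here are invented for the sketch; the real images are of course much larger, which is exactly why this naive comparison becomes fragile, as the tests in Chapter 4 show.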
2.3 The Gradient based method
In order to study hand gesture recognition, we will study the theory of the gradient orientation histogram method. In this section, we will take a closer look at the aim of this method and the main steps to implement it.
2.3.1 The goal of this method
First of all, the aim of this method is to recognize different hand gestures (without sensors). These hand gestures must be clearly identified in order to command any kind of application.
The theory of this method is to study the gradients in the image and to analyze them to realize an orientation histogram. The goal is then to compare the histograms and return the label of the nearest image.
2.3.2 Main steps of the method
In order to implement this method, we began by reading different articles on the subject, which are really interesting and useful to know the directions to follow: [25] and [24].
We can then split the method up into its main parts:
First of all, we had to implement the gradient magnitude calculation. The aim is to define where the biggest gradient magnitudes are in the picture. Then it will be easy to apply a threshold on the gradients in order to keep the really interesting ones and to cut all the background noise. To realize this part, the theory is to calculate the magnitude with the formula:
magnitude = √(dx² + dy²)
Therefore, we have to calculate the derivatives of the image in x and y to obtain the magnitude. We will have to choose a size for the derivative filter (in any case, we will choose a circular derivative filter).
Then we implemented the gradient orientation calculation. The goal is to realize a histogram cut into 36 bins (one every 10 degrees) or more (we will study this influence later in Chapter 4). To realize this histogram, we will have to calculate the gradient orientation defined by the formula:
orientation = arctan(dy/dx)
Therefore, with this formula, it will be possible to know the orientation of the gradients in the image. We can see that for both the magnitude and the orientation we will need the derivatives of the image.
With this histogram, we then have a vector of gradient orientations, which defines the picture quite well. So this second step is the part that will allow us to compare the images with one another. It is a way to define the form with an appropriate precision.
Also, we had to realize a Gaussian filter to blur the image and have a homogeneous picture. It will permit better results in the gradient magnitude and orientation calculation. The goal of this filter is to erase the background defects: for this method, it is really important to have a uniform background to avoid noise. To make the background more uniform and to erase white pixels, we realized this filter.
We created a gradient magnitude threshold, which has to erase the lower-level gradients in order to keep the really interesting ones. That will cut all the noise
and regularize the background. This part is complementary with the Gaussian filter: the Gaussian filter will blur the big defects (but they will still be there), and the threshold will cut the lowest magnitudes. Then the noise will be quite well cut.
Then the next step was to calculate the Euclidean distance between the vectors of the different analyzed images. This part is made to compare the different pictures by comparing their histograms. This is the final step: with it, we are able to recognize the different gestures.
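The steps above can be sketched in Python. This is a simplified illustration, not the project's MATLAB implementation: the Gaussian pre-filter is omitted, and the bin count and threshold values are arbitrary placeholders.

```python
import math

def orientation_histogram(img, bins=36, threshold=10.0):
    """Histogram of gradient orientations for a grey-level image
    (a list of rows of numbers), one bin per 360/bins degrees.
    Gradients below `threshold` in magnitude are cut as noise."""
    hist = [0.0] * bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            dx = img[y][x + 1] - img[y][x - 1]   # central derivative in x
            dy = img[y + 1][x] - img[y - 1][x]   # central derivative in y
            magnitude = math.sqrt(dx * dx + dy * dy)
            if magnitude < threshold:
                continue                          # magnitude threshold
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            hist[int(angle // (360.0 / bins)) % bins] += 1
    return hist

def euclidean_distance(h1, h2):
    """Distance between two histograms; the smallest distance over
    the database gives the recognized label."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

# A vertical edge: only horizontal gradients, so orientation 0 degrees.
edge = [[0, 0, 100, 100]] * 4
hist = orientation_histogram(edge)
print(hist[0], sum(hist))  # -> 4.0 4.0
```

Note that `atan2(dy, dx)` is used instead of a bare `arctan(dy/dx)` so that the full 360-degree orientation is recovered and division by zero is avoided when dx = 0.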
To conclude, we can say that this method does not require special mathematical background. Therefore, once we understood the main way to realize it, we just had to implement it (Chapter 3).
2.4 The Principal Component Analysis -PCA- method
In this section, we will still study hand gesture recognition, but we will need some mathematical background to understand what we did. This method is called PCA or Eigenfaces.
So we will take a deeper look at the mathematical background, the aim of this method, and the principal parts of its realization.
2.4.1 The goal of this method
The Principal Component Analysis (PCA) will also be used for our gesture recognition. It is a useful statistical technique that has found application in different fields (such as face recognition and image compression), and it is a common technique for finding patterns in data of high dimension. Before giving a description of this method, we will first introduce the mathematical concepts that will be used in PCA. Here we will speak about standard deviation, covariance, eigenvectors and eigenvalues. This background knowledge is meant to make the PCA section easier to understand, but can be skipped if the concepts are already known. There are examples all the way through this kind of lesson to illustrate the concepts explained.
2.4.2 Mathematical Backgrounds
This section will attempt to give the elementary mathematical background required to understand the Principal Component Analysis. We will try to realize a kind of summary of the principal knowledge used in the PCA method. Each part is independent of the others. We can notice that the goal of this is to understand the principal
lines of the method, and especially to understand why this method is used and what the returned results signify. We will not use all the background knowledge described here, but the different sections will provide the grounding in the main skills required.
Therefore, we will first take a quick look at statistics, especially at the spread of data and the distribution measurements. The other section is on matrix algebra and looks at eigenvectors and eigenvalues (important properties of matrices that are fundamental to PCA).
Statistics
What we will see about statistics is how to analyze a big set of data and how to find and understand the relationships between the elements of the data set. In this section, we will take a look at the measurements we can perform on a data set and what they tell us about the data.
Standard deviation First of all, we will look more closely at what the standard deviation is. In statistics, we generally use samples of a population to realize the measurements. The results returned on this sample give an overview of the most likely results we could have if we made the same test on the entire population. Therefore, we just extend the sample results to the entire population. To explain it clearly, we will create a data set and assume that it is just a sample of a larger data set (it is not used in our project, but it will help us to understand the concept easily).
Here is an example set:
X = [1 2 4 6 12 15 25 45 68 67 65 98]
For the notation, we will use the symbol X to refer to the entire sample and the symbol Xn to indicate a specific datum of the sample. Therefore, X3 refers to the 3rd number in X (notice that X1 is the first datum, not X0). With this kind of sample, we can realize many calculations that give us information about the set. For example, we can first calculate the mean of the set. As it is really simple, we will just give its formula without describing it further.
X̄ = ( Σ_{i=1..n} X_i ) / n
It is important to note that we will call X̄ the mean of the set X. The mean of the data set will not give us many indications, apart from the middle point.
For example, we can have the same mean for two really different data sets. Therefore, we will see below what is important to better define the data sets:
[0 8 12 20] and [8 9 11 12]
Here, what is really different between the two sets is the standard deviation. This is a way to measure the spread of the data in a set. Here is the definition of the standard deviation: it is the square root of the sum of the squared distances from the mean of the set to each point, divided by n − 1, where n is the number of points in the set. Here is the formula:
s = √( Σ_{i=1..n} (X_i − X̄)² / (n − 1) )
where s is the usual symbol for the standard deviation of a sample.
We can wonder why we divide by n − 1 and not by n. We will not give explanations of that here, because it would be too long and it is not important for our project. What is important to remember is that when we use a sample of a population and want an approximate result for the entire population, we have to use n − 1; but if we calculate the standard deviation on the entire population directly, we have to use n instead of n − 1. We can find further information on the web site http://mathcentral.uregina.ca/RR/database/RR.09.95/weston2.html
This page explains a bit more about the standard deviation and about the difference between the denominators. It also gives interesting experiments which describe well the difference between using samples or the whole population, and therefore the choice of denominator.
We will draw tables of the standard deviation calculation for the 2 sets written above.
Set 1:
X      (X − X̄)   (X − X̄)²
0      −10        100
8      −2         4
12     2          4
20     10         100
Total                   208
Divided by (n − 1)      69.333
Square root             8.3266
Set 2:
X      (X − X̄)   (X − X̄)²
8      −2         4
9      −1         1
11     1          1
12     2          4
Total                   10
Divided by (n − 1)      3.333
Square root             1.8257
As expected, the first set has a much bigger standard deviation than the second one. Indeed, the first data set has really spread-out data, unlike the second one.
We can also quickly look at another set, which has a standard deviation of zero:
[10 10 10 10]
Here, the standard deviation is equal to zero, although the mean is still 10. This is because all the points are the same, so the data are not spread out. None of them deviate from the mean.
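These calculations can be reproduced in a few lines of Python (a sketch using the two example sets from the text, not code from the project):

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sample_std(xs):
    """Sample standard deviation: square root of the sum of squared
    deviations from the mean, divided by n - 1."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

set1 = [0, 8, 12, 20]
set2 = [8, 9, 11, 12]
print(mean(set1), mean(set2))        # -> 10.0 10.0 (same middle point)
print(round(sample_std(set1), 4))    # -> 8.3267 (the spread-out set)
print(round(sample_std(set2), 4))    # -> 1.8257 (the tight set)
print(sample_std([10, 10, 10, 10]))  # -> 0.0 (no deviation at all)
```

The two sets share the same mean, so only the standard deviation distinguishes them, which is exactly the point of the tables above.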
Variance Variance is another measure of the spread of data in a set. In fact, it is quite the same as the standard deviation.
We can take a look at the formula:
s² = Σ_{i=1..n} (X_i − X̄)² / (n − 1)
We can notice that this is just the square of the standard deviation (that is why the symbol s² is used). Usually, we use the symbol s² for the variance of a sample. The variance is just another way of measuring the spread of data in a sample. We can say that the variance is less used than the standard deviation; in fact, the variance will be useful for the next section, which is about the covariance.
Covariance The covariance differs from the two measurements explained in the previous sections in one principal way: the covariance is a 2-dimensional measurement. The covariance is really important knowledge for the PCA method, because we will
need this calculation later.
The calculation of the standard deviation or of the variance is useful in the case of a one-dimensional data set, like the set of the marks obtained by all the ENSAM students for their FYP (Final Year Project). But for the PCA method, which deals with more dimensions, we will need the covariance and not just the variance.
The covariance allows us to see if there is any relationship between the different dimensions of the data set. For example, we could realize a 2-dimensional set of the marks obtained by the ENSAM students and their ages. Then we could see if the age has an effect on the mark received by the student. It is exactly the kind of test that we could perform with the covariance (we can already imagine where we want to go with that in our project: see whether our different pictures are related or not).
The covariance formula is really close to the variance formula. We can write the variance formula like this, to better understand the covariance formula:
var(X) = Σ_{i=1..n} (X_i − X̄)(X_i − X̄) / (n − 1)
Now we can take a look at the covariance formula:
cov(X, Y) = Σ_{i=1..n} (X_i − X̄)(Y_i − Ȳ) / (n − 1)
We can notice that if we calculate the covariance between a dimension and itself, we get the variance: in fact, we just replace the second part of the formula with the second dimension to analyze to obtain the covariance formula!
We can also say that it is possible to calculate the covariance between more than two dimensions. For three dimensions, for example, we will calculate the 9 pairwise covariances between dimensions (2 by 2) and then create a matrix (called the covariance matrix), which will be 3 × 3 in the case of three dimensions. In fact, the diagonal will hold the variance of each dimension and the other terms will be the covariances between dimensions (for example, line 2, column 1 will be the covariance between the y and the x dimensions). By the way, we can notice that the covariance is commutative (we can swap the two dimensions without changing the result); therefore, the covariance matrix will be symmetrical.
So we can get lots of really important information with the covariance calculation. In any case, it is important to notice that the value returned is not as important as its sign.
Indeed, if the result is positive, it means that the two dimensions increase together
(for our example on the ENSAM students, marks received and age, this would mean that the mark increases when the age increases).
If the result is negative, it means that when one dimension increases, the other decreases.
In the last case, the result returned is null. That just means that our 2 dimensions do not have any kind of relation between them: they are independent.
Therefore, the covariance calculation can bring us really important indications on the set of data we are studying. With it, we can then represent the covariance between 2 dimensions in a graph to get an idea of the relation that exists between them.
Of course, it will not be possible to represent the covariance when our data set has more than 3 dimensions. Although the covariance can only be calculated between two dimensions, and it is not possible to represent the relationship between the data when we have more than 3 dimensions, the covariance is often used for big data sets with many dimensions. Indeed, we can calculate the relationship between the dimensions and have some exploitable results. Moreover, it would be pretty hard to visualize the relationships between dimensions in a huge data set with many dimensions without the calculation of the covariance. Therefore, the calculation of the covariance brings us lots of help to see the relationships between dimensions in a data set like the one we have in our project.
The covariance matrix Recall that the covariance is always measured between 2 dimensions. If we have a data set with more than 2 dimensions, there is more than one covariance measurement that can be calculated. For example, from a 3-dimensional data set (dimensions x, y, z) we could calculate cov(x, z), cov(y, z)... In fact, for an n-dimensional data set, we can calculate n! / (2 (n − 2)!) different covariance values. The remaining entries, on the diagonal, will be the variances.
A useful way to get all the possible covariance values between all the different dimensions is to calculate them all and put them in a matrix. Let's have a quick overview of the definition of the covariance matrix for a set of data with n dimensions:
C(n×n) = (c_i,j), where c_i,j = cov(Dim_i, Dim_j)
where C(n×n) is an n by n matrix (n rows and n columns), and Dim_x is the xth dimension.
We can notice that the covariance matrix will be square in any case, and that each
part of the matrix is the result of a covariance calculation between two dimensions (except for the diagonal, as said before).
For example, we will build the covariance matrix for a 3-dimensional data set, using the usual dimensions x, y and z. Then, as the matrix is square, we will have the values below:

C = ( cov(x, x)  cov(x, y)  cov(x, z) )
    ( cov(y, x)  cov(y, y)  cov(y, z) )
    ( cov(z, x)  cov(z, y)  cov(z, z) )

As said earlier, the matrix will be symmetrical, and the diagonal will hold the variances. Therefore, we can say that the matrix will have this form:

C = ( var(x)     cov(x, y)  cov(x, z) )
    ( cov(x, y)  var(y)     cov(y, z) )
    ( cov(x, z)  cov(y, z)  var(z)    )

Therefore, we only have 6 of the 9 terms to calculate.
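As a small Python sketch of these formulas (the age/mark numbers below are made up for illustration, echoing the ENSAM example; the project itself works in MATLAB):

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance between two dimensions; cov(x, x) is the variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def covariance_matrix(dims):
    """n-by-n symmetric matrix of pairwise covariances: the diagonal
    holds the variances, so only 6 of the 9 entries of a 3x3 matrix
    actually need to be computed."""
    return [[cov(a, b) for b in dims] for a in dims]

# Hypothetical 2-dimensional data: age and mark increase together,
# so their covariance is positive.
age  = [20, 21, 22, 23]
mark = [12, 14, 13, 17]
C = covariance_matrix([age, mark])
print(C[0][1] > 0)           # -> True (positive relationship)
print(C[0][1] == C[1][0])    # -> True (the matrix is symmetrical)
```

The symmetry check illustrates the commutativity of the covariance noted above; the sketch computes all n² entries for simplicity rather than exploiting it.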
Matrix Algebra
This section is made to provide the background for the matrix algebra required in PCA. We will especially take a closer look at the eigenvectors and eigenvalues of a given matrix.
Let's see an example:

( 2  3 ) ( 3 )  =  ( 12 )  =  4 ( 3 )
( 2  1 ) ( 2 )     ( 8  )       ( 2 )

Here, 4 is an eigenvalue of the matrix.
Eigenvectors First of all, we will give the Wikipedia definition of an eigenvector: in linear algebra, the eigenvectors (from the German "eigen" meaning inherent, characteristic) of a linear operator are non-zero vectors which, when operated on by the operator, result in a scalar multiple of themselves. The scalar is then called the eigenvalue associated with the eigenvector.
As we can see in the example above, the multiplication between the matrix and the vector returns exactly 4 times the original vector. We have here an example of an
eigenvector. We will try to explain this example to better understand the eigenvectors.
The vector is a 2-dimensional one: the vector (3, 2) represents an arrow going from the origin (0, 0) to the point (3, 2). The matrix can be imagined as a transformation matrix: if we multiply this matrix with a vector, the result returned is another, transformed vector. If this transformed vector is just a multiplication of the original vector by a scalar, then the vector is an eigenvector and the scalar is the eigenvalue associated with the eigenvector.
Now we will see the different properties of these eigenvectors:
First of all, we can only find eigenvectors for square matrices. We can also say that not every square matrix has eigenvectors. In the case they have, they cannot have more eigenvectors than their dimension (for a 3 × 3 matrix, the maximum number of eigenvectors is 3).
You can multiply an eigenvector by a scalar and it will still be an eigenvector (because we just change its length and not its direction).
For a symmetric matrix (such as the covariance matrix used in PCA), the eigenvectors are orthogonal to one another, no matter the number of dimensions.
Most of the time, the returned eigenvectors are normalized (norm = 1); they are then easier to exploit.
We can find further information on eigenvectors on the web site:
http://www.mathphysics.com/calc/eigen.html .
Eigenvalues Each eigenvector is associated with an eigenvalue. The eigenvalue gives some information about the importance of the eigenvector. The eigenvalues are really important in the PCA method, because they permit us to realize a threshold to filter out the non-significant eigenvectors, so that we can keep just the principal ones.
MATLAB returns the eigenvalues and the eigenvectors of the covariance matrix without any problem.
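We can verify the numerical example above in a few lines of Python (a sketch; the thesis itself relies on MATLAB for the eigen-decomposition):

```python
import math

def mat_vec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

# The transformation matrix and the candidate eigenvector from the text.
M = [[2, 3],
     [2, 1]]
v = [3, 2]

result = mat_vec(M, v)
print(result)              # -> [12, 8]
print([4 * x for x in v])  # -> [12, 8]: M*v = 4*v, so v is an
                           #    eigenvector with eigenvalue 4

# For a 2x2 matrix, the eigenvalues solve the characteristic polynomial
# lambda^2 - trace*lambda + det = 0.
trace = M[0][0] + M[1][1]                     # 2 + 1 = 3
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]   # 2 - 6 = -4
disc = math.sqrt(trace ** 2 - 4 * det)
eigenvalues = sorted([(trace - disc) / 2, (trace + disc) / 2])
print(eigenvalues)         # -> [-1.0, 4.0]
```

The closed-form characteristic polynomial works only for 2 × 2 matrices; for the large covariance matrices of the PCA method, a numerical routine such as MATLAB's eigen-decomposition is used instead.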
2.4.3 Main steps of the method
Finally we arrive at the Principal Component Analysis (PCA), the interesting part of our project. We can first answer a question: what is it exactly? We can answer that it is an algebraic way to compare images by compressing the set of data and highlighting the principal components of the set.
The main advantage of PCA is that once we have found the principal components of the set, which express the data pretty well, we can recover the original data (images in our case) with a low loss, even if the compression is really high!
In this section, we will try to explain how we went through the problems of realizing this method for gesture recognition. Therefore, we will describe the work made step by step, to understand each part of it.
We can then split up the method into its main parts:
First of all, we had to create the data set. Indeed, we had to take some pictures of the hand to build the database for the PCA recognition. The aim is to choose a good number of pictures and a good resolution for them, in order to have the best recognition with the smallest database. Then the aim is to make the database. To create it, the theory is to transform each picture into a simple vector, which has a dimension equal to the number of pixels. Then we create a matrix where each line is an image vector... The result for 12 pictures and a 640 × 480 definition would be a 12 × 307200 matrix.
Then the next step is to subtract the mean from each of the data dimensions. The mean subtracted is simply the average across each dimension. For example, for three dimensions x, y and z, we have to subtract x̄ from x, ȳ from y and z̄ from z. The aim is to center our set in the space of all the dimensions (we will see further explanations of the different spaces used later, but what is important to remember here is that we have to subtract the mean to center our set of data).
Step three is to calculate the covariance matrix of the database. This is quite
difficult in our case, because the data set is really huge! So we found a method
to simplify this calculation, which we explain now:
Indeed, we cannot calculate the covariance matrix of the first matrix directly, because it
would be too huge. So we had to find a way to obtain the principal eigenvectors
without calculating the big covariance matrix.
We found the solution in a paper written by M. Turk and A. Pentland [23].
CHAPTER 2. THEORY AND BACKGROUNDS 19
The method consists in choosing a new covariance matrix.
Indeed, we will call A the matrix of all the images with the mean subtracted; it is
convenient here to store the image-vectors as columns, so that A has size 307200 x 12.
Our training set of images is B1, B2, B3, ..., B12, each of dimensions l x c, and M is
the average of the whole set of pictures. As seen earlier, we transform each image into
a vector of l x c dimensions. So, we can say that each picture is a point in an
(l x c)-dimensional space, and our 12 images represent 12 points in this space. But,
as we centered the set (by subtracting the mean), the pictures are not so far from each
other in this space (because they are quite similar after all). Therefore, it is
possible to express our data set with fewer dimensions.
Our covariance matrix for A will be called C, and C is defined by:

C = A A^T

The eigenvectors and eigenvalues of C are then the principal components of our data
set. But as explained before, we cannot calculate C.
The idea is that when we have only 12 points in a huge space, the number of meaningful
eigenvectors is far smaller than the dimension: it is the number of points minus 1.
So in our case, we can say that we will have 11 meaningful eigenvectors. The remaining
eigenvectors will have eigenvalues around zero.
Fortunately, it is much easier to calculate the eigenvectors of a 12 x 12 matrix than
of a 307200 x 307200 one!
We name the eigenvectors of the matrix A^T A v_i, and its eigenvalues k_i.
We can then write:

A^T A v_i = k_i v_i

Then we multiply both sides by A:

A A^T (A v_i) = k_i (A v_i)

We can see that the A v_i are eigenvectors of C = A A^T. So we construct a new matrix
L = A^T A, and we find the eigenvectors v_i of L.
These vectors determine linear combinations of the 12 training set images that form
the eigenpictures of our set.
So, with this subtlety, the matrix to diagonalize is only 12 x 12 instead of
307200 x 307200! The calculation is much faster, and the eigenvectors returned are the
principal ones.
Then, we calculate the eigenvectors and the eigenvalues of this covariance matrix.
This gives us the principal orientations of the data; MATLAB does it easily.
After that, we have to choose the good components and form the feature vector.
This is the principal step: we have to choose the principal (most important)
eigenvectors, with which we can express our data with the lowest information loss.
We also have to choose the number of eigenvectors so as to minimize calculation
time while keeping the best recognition. Here, the theory says that we will
normally have 11 meaningful eigenvectors.
Last, the final step is to build a new data set (which we will call the eigenset). It
is then possible to write the last script, which compares the different pictures
and ranks them by resemblance. To compare the different pictures, we
express each image of the data set in terms of these principal eigenvectors. The
last thing to do is to compare them, by calculating the Euclidean distance between
the coefficients in front of each eigenvector.
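The whole pipeline just described can be sketched in a few lines. The thesis works in MATLAB; the following is only an illustrative NumPy sketch of the Turk and Pentland trick, using a tiny random stand-in (300 "pixels", 12 "images") instead of the real 307200-pixel pictures:

```python
import numpy as np

# Toy stand-in for the data set: 12 "images" of 300 "pixels" each
# (the real set is 12 images of 307200 pixels).
rng = np.random.default_rng(0)
n_pixels, n_images = 300, 12
B = rng.random((n_pixels, n_images))   # each column is one flattened image

# Step 2: subtract the per-pixel mean to center the set.
A = B - B.mean(axis=1, keepdims=True)

# Step 3: the big covariance C = A A^T is n_pixels x n_pixels, so we
# diagonalize the small L = A^T A (n_images x n_images) instead.
L = A.T @ A
eigvals, V = np.linalg.eigh(L)         # ascending eigenvalues, columns v_i

# u_i = A v_i are eigenvectors of C with the same eigenvalues: the
# eigenpictures of the set.
U = A @ V

# Sanity check against the big covariance (feasible only at this toy size).
C = A @ A.T
assert np.allclose(C @ U[:, -1], eigvals[-1] * U[:, -1])
```

Note that the smallest eigenvalue comes out at (numerically) zero, which matches the remark above that only the number of points minus one eigenvectors are meaningful.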
To conclude, we can say that this method requires more mathematical background. Once
the theory is well understood, we can implement this method in MATLAB too. In the next
chapter, we will take a closer look at the implementation of the different methods.
Chapter 3
Implementation and explanation
3.1 Simple subtraction method
3.1.1 Realization of the method
We will not take much time explaining how we realized this comparison, because it
is really simple to create and the results are really bad. What is important to
notice here is:
Before doing the subtraction, we applied some adjustments on the contrast first, and
then we applied a blurring filter to erase the background imperfections.
We performed some tests on this method, which can be seen in the last chapter (4).
Figure 4.14 shows the efficiency of this method.
This method confirms the idea that we should implement other methods, because
the results returned are really bad; we can say that it does not work properly.
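For completeness, the idea of the method can be sketched in a few lines. This is an illustrative NumPy sketch, not the thesis script (the contrast adjustment and blurring filter mentioned above are omitted):

```python
import numpy as np

def subtraction_score(a, b):
    """Sum of absolute pixel differences: 0 means identical images."""
    return np.abs(a.astype(float) - b.astype(float)).sum()

# Two toy "images": identical images score 0, different ones score high.
a = np.zeros((4, 4))
b = np.ones((4, 4))
score_same = subtraction_score(a, a)
score_diff = subtraction_score(a, b)
```

Such a score is extremely sensitive to lighting and hand position, which is consistent with the bad results reported here.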
3.1.2 Conclusion on the method
In any case, it is a good thing to know the results of this method. It was our
first idea, and it confirmed that we had to look further into image analysis by
implementing other methods.
3.2 Histograms of oriented gradients method
In this section, we will explain how we implemented this method and the problems
encountered. We will try to understand each part of the method, and why it works or
does not work in the different cases.
3.2.1 Step 1: Gradient magnitude calculation
After a first look at the MATLAB software, we wrote the first script, which calculates
the magnitude of each gradient in the image.
Magnitude gradient definition:
If dx and dy are the outputs of the x and y derivative operators, then the gradient
magnitude is calculated by:

mg = sqrt(dx^2 + dy^2) (3.1)
Therefore, in order to calculate the gradient magnitude, we first had to calculate the
derivatives dx and dy of the image.
The script of the derivative operator was found on the internet, but we can look at
what it looks like to understand the following steps:
X-derivative operator script:
Explanation of how to use the script:

function d = xDeriv(im, xRadius, yRadius, shape)
% XDERIV  Returns the X-derivative of image IM.
%   D = XDERIV(IM, XRADIUS, YRADIUS, SHAPE)
%   IM      - Input image.
%   XRADIUS - Half the width of the vicinity in which the derivative
%             is calculated.
%   YRADIUS - Half the height of the vicinity in which the derivative
%             is calculated (default: equal to XRADIUS).
%   SHAPE   - Either of:
%             'full'  - (default) returns the full 2-D convolution;
%             'same'  - returns the central part of the convolution,
%                       the same size as A;
%             'valid' - returns only those parts of the convolution that
%                       are computed without the zero-padded edges;
%                       size(C) = [ma-mb+1, na-nb+1] when size(A) > size(B).
%
% Written by Ariel Tankus, 19.9.96.
This script therefore calculates the derivative matrix, given the image, the xRadius,
the yRadius and the desired output shape.
So, with an analogous script for the y-derivative, we could compute the gradient
magnitude really fast; we just had to keep an eye on the speed of the running process.
We also implemented the gradient magnitude script as described below:
Gradient magnitude operator script:
function [mag, dx, dy] = grad(im, xRadius, yRadius, shape)
% GRAD  Returns the gradient magnitude of the given image.
%   [MAG, DX, DY] = GRAD(IM, XRADIUS, YRADIUS, SHAPE)
%   IM      - Image.
%   XRADIUS - Half width of derivation vicinity.
%   YRADIUS - Half height of derivation vicinity.
%   SHAPE   - Either of 'same', 'valid', 'full'. See xDeriv.
%   MAG     - Gradient magnitude.
%   DX      - X-derivative (optional).
%   DY      - Y-derivative (optional).

The MAG output is:

MAG = [min(min(sqrt(dx^2 + dy^2))), max(max(sqrt(dx^2 + dy^2)))] (3.2)
The MAG output returns a two-dimensional vector holding the minimal and maximal
terms of the gradient magnitude matrix; the aim is to lighten the calculation with a
smaller output.
The DX and DY outputs return two matrices, both the size of the image matrix.
All we needed as outputs were the minimum and the maximum of the gradient
magnitude, in order to apply an efficient threshold (cutting the lowest gradient
magnitudes). But we will come back to this later (3.4).
Then, having the minimum and maximum gradient magnitudes of the picture, we
could move on to the second part: the gradient orientation calculation.
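As an aside, this step can also be sketched outside MATLAB. The following illustrative NumPy version uses np.gradient as a stand-in for the xDeriv/yDeriv operators (an assumption; it is not the thesis's exact derivative vicinity):

```python
import numpy as np

def gradient_magnitude(im):
    """Per-pixel gradient magnitude mg = sqrt(dx^2 + dy^2), as in eq. (3.1)."""
    dy, dx = np.gradient(im.astype(float))   # central-difference derivatives
    return np.sqrt(dx ** 2 + dy ** 2)

# A vertical step edge: flat regions give zero magnitude, the edge a peak.
im = np.zeros((8, 8))
im[:, 4:] = 1.0
mag = gradient_magnitude(im)
mag_range = (mag.min(), mag.max())           # what the MAG output keeps
```

Keeping only the minimum and maximum, as the grad script does, is enough for the thresholding that follows.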
3.2.2 Step 2: Gradient orientation calculation and magnitude threshold
Once the gradient magnitude calculation was done, we implemented the second script,
which calculates the orientation of each sufficiently strong gradient in the
image.
Gradient orientation definition:
If dx and dy are the outputs of the x and y derivative operators, then the gradient
direction is calculated by:

dir = arctan(dy/dx) (3.3)

Therefore, in order to calculate the gradient direction, we just had to use the
derivatives dx and dy of the image, which we had already calculated for the magnitude.
Then, after applying a threshold on the gradient magnitude, we sorted all the
measurements into a 36-dimensional vector: 36 bins of 10 degrees each. Once the
vector was built, we plotted it in polar and cartesian coordinates, just to get an
overview of the orientations.
We will now describe the script written to realize this implementation:
Gradient orientation operator script:
function Z = grador2(im, xRadius, yRadius, shape)

This function returns the 36-dimensional gradient orientation vector of the given
image.

%   [ORI, DX, DY] = GRADORIENTATION(IM, XRADIUS, YRADIUS, SHAPE)
%   IM      - Image.
%   XRADIUS - Half width of derivation vicinity.
%   YRADIUS - Half height of derivation vicinity.
%   SHAPE   - Either of 'same', 'valid', 'full'. See xDeriv.
%   ORI     - Gradient orientation matrix (contains all the gradient directions).

[gm, dx, dy] = grad(im, xRadius, yRadius, shape);

Here we call the gradient magnitude operator. It returns the X-derivative, the
Y-derivative and the 2-dimensional magnitude vector (holding the maximum and
minimum of all the gradients).
gm = ((gm(1) + gm(2)) * 0.1) + gm(1) (3.4)

This is the threshold value. It is relative to the image, fixed at 10 percent of the
scale between the minimum and the maximum gradient magnitude.
Then, we defined that when fewer than 3 inputs are given, the shape is taken as
'same' and the yRadius is set equal to the xRadius.
The next step was to create a 36-dimensional vector full of zeros; we then increment
the corresponding bin each time an orientation is found. We had to take care that
the arctan function only returns values between -Pi/2 and Pi/2, so the direction must
be disambiguated from the signs of dx and dy.
After that, we had to create the gradient direction matrix, with the gradient
direction of every pixel. We had to take the threshold into consideration, to keep
only the orientations of the main gradient magnitudes.
We will see in section 3.2.6 that we have some problems with the borders of the
pictures. Therefore, we have to take this problem into consideration and cut
the border of the new image. The cut is relative to the input and avoids the high
gradient magnitudes that are computed on the borders of the pictures.
So, after computing the gradient orientation vector of each image, it becomes
possible to compare the different images.
Then, to inspect the histograms, we display both the cartesian and the polar
representation of the orientation vector. It is good to have the two histograms, to
see accurately where the orientation peaks are.
So, we have seen how to calculate the gradient orientation vector of a picture. With
the gradient orientation vectors of the data set, it is possible to compare the
images with each other and thus to sort them.
We can take a look at the shape of the histograms in figure 3.1.

Figure 3.1: Representation of the orientation histograms for each new position. On
this graph, each histogram is drawn under its picture.
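To make the step concrete, here is an illustrative NumPy sketch of the 36-bin histogram with the relative magnitude threshold of eq. (3.4). It mirrors the idea of grador2, not its exact code, and uses arctan2 to sidestep the arctan range issue mentioned above:

```python
import numpy as np

def orientation_histogram(im, n_bins=36):
    dy, dx = np.gradient(im.astype(float))
    mag = np.sqrt(dx ** 2 + dy ** 2)
    # Threshold relative to the magnitude range, as in eq. (3.4).
    thr = (mag.min() + mag.max()) * 0.1 + mag.min()
    ang = np.arctan2(dy, dx)                 # full -pi..pi range
    keep = mag > thr
    # 10-degree bins over the full circle.
    bins = ((ang[keep] + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    return np.bincount(bins, minlength=n_bins)

im = np.zeros((16, 16))
im[:, 8:] = 1.0                  # vertical edge: all gradients point along +x
hist = orientation_histogram(im)
```

For this synthetic vertical edge, all the surviving gradients fall into a single bin, which is exactly the kind of sharp peak the histograms of figure 3.1 show for clean pictures.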
3.2.3 Step 3: Gaussian filter operator
After having lots of noise problems in our orientation vectors, described in
paragraph 3.2.6, we decided to implement and apply a gaussian filter to the picture.
We will now detail how we realized a gaussian filter in MATLAB, in order to blur the
image and erase unwanted high-level contrasts. Note that a gaussian filter function
also exists in the Image Processing Toolbox; our own script was useful at the
beginning of the project (we did not have this toolbox yet), but afterwards we used
the built-in MATLAB function.
Gaussian filter operator script:
This script simply returns the filtered version of the given image.
The aim of this script is to weight each pixel as a function of its neighbours: we
put a weight on each pixel around the one to be blurred. After that, a white pixel
on a black background becomes dark gray. We decided to make a circular filter three
pixels wide, balanced with the weights shown in figure 3.2.
Figure 3.2:
To tune the filter, we can change its values. With these values, however, the
picture is well blurred, and a large part of the background noise is erased without
the filter being too strong.
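The weighting idea can be sketched as follows. This illustrative NumPy version uses a small 3 x 3 kernel with assumed standard binomial weights, not the thesis's three-pixel circular filter of figure 3.2:

```python
import numpy as np

def blur(im):
    # Normalized 3x3 Gaussian-like weights (binomial approximation).
    k = np.array([[1., 2., 1.],
                  [2., 4., 2.],
                  [1., 2., 1.]])
    k /= k.sum()
    pad = np.pad(im.astype(float), 1, mode="edge")
    out = np.empty_like(im, dtype=float)
    for i in range(im.shape[0]):
        for j in range(im.shape[1]):
            # Each output pixel is the weighted average of its neighbourhood.
            out[i, j] = (pad[i:i + 3, j:j + 3] * k).sum()
    return out

# A single white pixel on black spreads out and darkens, as described above.
im = np.zeros((5, 5))
im[2, 2] = 1.0
blurred = blur(im)
```

Because the kernel is normalized, the total brightness is preserved while isolated high-contrast pixels (i.e. background noise) are flattened out.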
3.2.4 Step 4: Euclidian distance comparison
In this section, we will see how we realized the comparison between images, in order
to obtain a realistic and efficient gesture recognition.
We used the Euclidean distance between the gradient orientation vectors
(36-dimensional vectors) to calculate the difference between two pictures.
The Euclidean distance comparison:
In order to obtain an efficient comparison between images, we decided to calculate
the Euclidean norm between the orientation vector of the picture to be recognized
and those of the data set pictures (we computed the Euclidean distance between the
gradient orientation vector of the analyzed picture and that of each database
picture, then sorted the results and selected the four smallest). Therefore, we
implemented a script that performs this calculation between all the gradient
orientation vectors.
Here is the script we made:
The vector-vector comparison script:
function disteuclid = fini2(b, Im2, xRadius, yRadius, shape)

Returns the 4 nearest database pictures of the analyzed image.

First, we created a 25-dimensional Euclidean distance vector, where each dimension is
the result of the comparison with one database picture.
Then, we used the gradient orientation script to calculate the orientation vector
of the picture to analyze.
After that, we returned the indices of the 4 smallest terms of the 25-dimensional
Euclidean distance vector. With these indices, it is easy to find the corresponding
images, take back their labels and display them.
The script worked really well, but the execution time was far too high: it needed
180 seconds to return the 4 nearest pictures. So, the new goal was to reduce
this time considerably, in order to get a quick answer for a given picture. We must
not forget that the final aim is a real-time application!
The idea was to write a script that creates a matrix (MATLAB works faster with
matrices) of all the database gradient orientation vectors and saves it to a file.
It is then easy to compare the gradient orientation vector of the analyzed picture
with each line of this database matrix, which we will call MATDIST (see C). That
removes the calculation time spent on the database pictures: we no longer need to
process every database picture each time.
We can now see how we made the code to create the MATDIST:
The database matrix operator:
function matdist = matdist(Im2, xRadius, yRadius, shape)

This function returns the matrix of all the gradient orientation vectors of the
pictures stored in the database.
We decided to use the tic MATLAB function to launch a chronometer; the goal is to
know the calculation time for the matrix creation (we took 26 images for the data
set). It is a good way to measure the efficiency of the algorithm. Moreover, we can
then easily measure the time gained with the different changes we made.
We just had to add the matrix creation in the loop (26 lines for the 26 images
and 36 columns for the 36 orientation bins). This new matrix is simply the
combination of all the orientation vectors: each line is the gradient orientation
vector of one database picture.
Once the matrix is created, we save it in a specified folder so that we can load
it whenever we want.
Executing this script to create the database matrix takes around 70 seconds, and it
only needs to be run once. With it, we gained the time we wanted. Now, to recognize
a picture, we run another script that compares the analyzed image with this matrix.
We can now explain and comment the new script encoded:
The vector-matrix comparison operator:
function disteuclid = fini3(b, xRadius, yRadius, shape)

This script returns a window displaying the 4 nearest images, with the Euclidean
distance between each of them and the compared one.
First of all, we load MATDIST.
Then we just have to calculate the orientation vector of the image to analyze, and
compare it with each line of the database matrix.
After that, we sort the distances and take back the database pictures with their
labels: the class is recognized.
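The core of this comparison can be sketched like so (illustrative NumPy only, with a random stand-in for the saved 26 x 36 MATDIST matrix):

```python
import numpy as np

# Stand-in for the saved database matrix: 26 pictures x 36 orientation bins.
rng = np.random.default_rng(1)
matdist = rng.random((26, 36))

# Query with a picture already in the database (row 7): the script should
# return it first, with a Euclidean distance of zero.
query = matdist[7].copy()
dists = np.linalg.norm(matdist - query, axis=1)
nearest4 = np.argsort(dists)[:4]
```

The indices in nearest4 then map back to the labeled database pictures, exactly as described for fini3.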
With this script, the time needed to compare the images is about 3 seconds: we
gained around 177 seconds of execution time (a comparison needed 180 seconds
before)! To immediately test our results, we just had to take a database picture
and compare it with the database: if everything works, the same image should be
returned first, with a Euclidean distance equal to zero. This is indeed the result
we get whenever we use a database picture; it verifies that for two identical
pictures, the script returns the logical result.
We can see this in picture 3.3. There, the image returned as nearest is the input
image itself. This is a way to check the algorithm: here it works correctly, and the
Euclidean distance gives true results. Now, we have to check that the method is also
good at recognizing a picture from similar pictures in the database.
Now we can look at the 4 nearest pictures returned for the 1 position, in
figure 3.4. There, the first two pictures returned show the same gesture,
but the third one does not. Therefore, we had to try other hand gestures and observe
the results. We will test the algorithm further in chapter 4.
3.2.5 Step 5: Establish a comparison matrix
In this part, we will go further than just writing the script. We will try to
understand why it sometimes does not work as expected, and what kind of solutions we
could bring to get better results.
As we saw above in figure 3.4, the results are not always as good as expected.
For example, if we ask for the recognition of another, more complicated hand
gesture, we get the answer shown in figure 3.5.
The results expected are clearly not the results given. This problem
Figure 3.3: Result of the first answer of the vector-matrix comparison script for a
database picture. We can check that the first image has a Euclidean distance equal to
zero.
comes from the quality and size of the database, or from the different positions we
took. We have 26 pictures in the database, all of them different, in order to see
what problems we could have.
We also varied the orientation of the hand and the spacing of the fingers during the
shooting, in order to recognize more positions.
What we can notice is that the results are much better for position 1. This is
simply because the spacing does not influence the results; only the orientation of
the fingers really acts upon them. That is why we get better results with the
position 1 picture.

Figure 3.4: Results of the vector-matrix comparison script for a database picture (1).
Therefore, one solution would be to widen the database: take many pictures of
each hand position and then apply the script again. The new problem to expect is the
processing time, which would become excessive.
The second solution is to change our gesture positions and choose new ones that are
really different. We will try this possibility later.
To decide which way to go, we tried to identify the problem clearly. Therefore, we
decided to build a matrix showing on a gray scale whether the different database
pictures are close or not (all the pictures were taken under the same lighting), in
terms of the Euclidean distance between their gradient orientation vectors. Black
means that the pictures are really close, and white that they are really different.
To build this matrix, we just had to take the database matrix already computed and
compare each line with the others. MATLAB then displays the new matrix on a gray
scale to show the results.
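In NumPy terms, this gray matrix is just the pairwise distance matrix of the database rows (illustrative sketch only, with a random 5-picture stand-in as in figure 3.6):

```python
import numpy as np

# Stand-in for 5 database orientation vectors (5 pictures x 36 bins).
rng = np.random.default_rng(2)
matdist = rng.random((5, 36))

# Pairwise Euclidean distances; the thesis displays this on a gray scale.
gray = np.linalg.norm(matdist[:, None, :] - matdist[None, :, :], axis=2)
```

By construction, the matrix is symmetric and its diagonal is zero, which is the black diagonal expected below.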
We can see the image of this matrix in figure 3.6.
Figure 3.5: Results of the vector-matrix comparison script for a database picture
(position: 5).
We can see the database used for this gray matrix in picture 3.7.
The best case is white everywhere except black on the diagonal. Indeed, white means
that our gestures are far from each other, and black means it is the same picture.
The diagonal is therefore black in any case, because it is the comparison between
two identical pictures.
To verify that our gestures are well separated between classes and consistent within
a class, we can plot this gray matrix: black within a class and white between
classes means that our gestures are perfectly chosen.
Here, we can see that our gestures are too close: there is clearly too much dark gray
in the matrix. This shows that the real problem is our choice of gesture positions.
Indeed, the positions are too close to each other, so the recognition is too hard to
realize.
Therefore, we decided to reduce the number of positions and to take just three
really different ones: rock, paper and scissors. This also allows us to build an
application (the well-known little game) as a concrete demonstration of the
comparison.
Figure 3.6: Output of the graymat function for a 5-picture database (1 image per
class). We can see that the diagonal is black, which shows that the matrix is well
calculated: on the diagonal is the Euclidean distance between a vector and itself.
We can see the gray matrix of our new gestures in figure 3.8.
Observing this new gray matrix, we can confirm that the new gestures chosen are much
better than the previous ones: white between the different gestures (meaning the
positions are far from each other) and black on the diagonal, as expected. We
will see in the results (chapter 4) that the recognition also works far better.
3.2.6 Problems encountered
During the realization of the different steps, we came across several problems, which
we will explain in this part. We will not list all the problems we had, only those on
which we lost some time.
Figure 3.7:
First problem:
The first problem we had was about the different classes in MATLAB. Indeed, MATLAB
automatically defines the classes of its terms, and every function used depends on
the class of each element involved in the process.
When we loaded the image (MATLAB imread function), the class of the resulting matrix
was a color uint8 (three matrices), and to calculate the gradient we needed a gray,
double-class equivalent matrix. Therefore, we had to write a small script converting
a uint8 MATLAB class into a double one (a direct function exists in the Image
Processing Toolbox, but as said before we did not have it at the beginning of the
period, which is why we implemented this small script).
Here is the script we wrote to make this transformation:
Then we had to transform it to gray scale. We used these coefficients to get a
well gray-scaled image, where A is the color image and A1 the gray result:

A1 = (A(:,:,1))*0.3 + (A(:,:,2))*0.59 + (A(:,:,3))*0.11;

So we changed the three RGB matrices into one gray equivalent matrix. The
coefficients are chosen to respect the different contrasts.
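In NumPy, the same conversion (the 0.3/0.59/0.11 weights are the thesis's own) looks like this illustrative sketch:

```python
import numpy as np

def to_gray(rgb):
    """uint8 RGB image -> double gray image, 0.3 R + 0.59 G + 0.11 B."""
    a = rgb.astype(float)
    return 0.3 * a[:, :, 0] + 0.59 * a[:, :, 1] + 0.11 * a[:, :, 2]

# A 2x2 test image with one white pixel: white maps to 255.0, black to 0.0.
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (255, 255, 255)
gray = to_gray(img)
```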
Figure 3.8: Output of the graymat function for our new gesture database. We can
see that the diagonal is black and that the other cells are much whiter (1 image per
class).
After that, we could easily calculate the gradient of the image, but with some loss
(we transformed a color picture into a gray equivalent picture, three matrices to
one). Later, we bought the Image Processing Toolbox, so we could simply use the
built-in MATLAB function.
Second problem:
The second problem was the image border. When we calculated the gradient magnitude
of a picture, the borders were included in the calculation with very high values,
due to the XRadius and YRadius vicinities. We can see this clearly in picture 3.9,
where we calculated the gradient magnitude of a simple form (a white triangle on a
black background).

Figure 3.9: Gradient magnitude of a triangle, with noise on the borders

In this figure (3.9), we can really see the border noise in the picture. The
gradient magnitude operation treats the borders as a part of the image, which brings
lots of problems later, in the second step: the gradient orientation calculation
(3.2.2).
Therefore, we decided to cut the borders in the gradient magnitude calculation;
otherwise, all our histograms would look similar.
Third problem:
When the background is not completely black and dark, we have problems with
reflections and contrasts in the gradient magnitude calculation. Indeed, lots of
noise becomes part of the gradient orientation vector. To avoid this noise, we can
apply a gaussian filter to the image: it blurs and softens the contrasts, leaving
less high-magnitude gradient noise. A complementary way to avoid this kind of noise
is to take the picture on a really black background. We can see in the pictures
below (3.10 and 3.11) the differences between the gradient magnitudes of each image
and the different histograms returned.
We can also look at the gradient magnitude images of these two pictures in figure
3.11.
Figure 3.10: 1-finger picture with black and gray backgrounds. We can see the noise
of the gray background (a), which brings lots of problems in the gradient
calculation. With a good black background (b), the problem is really simplified.
Figure 3.11: Gradients of the 1-finger picture with black and gray backgrounds. We
can clearly see the white reflections of the gray background after the zoom (a),
compared to the black background (b), even though an important gaussian filter was
applied. The histogram is deteriorated as well.
We can notice that with a good black background we have no trouble: we cut all the
background noise. We will see later how to resolve this recognition problem.
Now we will compare the two histograms on the figure 3.12.
Figure 3.12: Histograms of oriented gradients of the 1-finger picture with black and
gray backgrounds. On these two graphs, we can notice that for the black background
(b) the histogram is much more accurate than for the gray one (a); it is therefore
easier to treat. We need precise histograms to have a good gesture recognition.

To conclude with these few pictures, we can say that a homogeneous black background
makes the work easier: we get much more precise histograms, and the recognition
results are far better.
3.2.7 Conclusion on the method
After implementing this method, we understand much more about images and what
is scientifically behind an image. We can see that the results obtained are good,
though we might have thought they would be better. By implementing the next method,
we will surely get new ideas to make this method more efficient. We will then test
the method in chapter 4.
3.3 PCA or Eigenfaces method
We will now explain each part and detail the way we made the different algorithms.
3.3.1 Step 1: Realize the database
First of all, we had to choose how to build our database and what kind of database
would be best for recognition. We chose to take the minimum number of pictures that
still gives the best recognition.
It is important to notice that for the eigenface method, we work with the entire
pictures at the beginning; then we reduce the data (our aim is to express the data
set with fewer factors). Therefore, we must take care with the data set, to keep the
calculation time down. There are two parameters to consider when building the
database:

The number of pictures.

The size of each picture, which determines part of the size of the first matrix to
reduce.
Both are really important. Indeed, when we create the first matrix to reduce, its
size is the number of pictures by the number of pixels.
Therefore, if there are too many pictures, or too many pixels in each picture, the
calculation time grows fast!
For example: 10 images with a definition of 640 x 480 give a matrix of size
10 x 307200, and 10 images with a definition of 1280 x 960 give a matrix of size
10 x 1228800.
So, as you can see, this can easily and quickly become a really huge matrix, and the
calculation time strongly depends on it. Moreover, we must not forget that MATLAB
cannot manage such big matrices either.
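The bookkeeping above amounts to flattening each picture into one row of the data matrix; a quick NumPy check of the sizes (illustrative only):

```python
import numpy as np

h, w, n_images = 480, 640, 10
images = np.zeros((n_images, h, w))        # 10 pictures of 640 x 480
data = images.reshape(n_images, h * w)     # one flattened image per row

# data is 10 x 307200, exactly the size quoted in the text.
```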
Therefore, the question of the database is really important, because it determines
the efficiency of the method (and its calculation time).
At the beginning, we could not know how many pictures to take or what size to
choose. So we decided to make a database of 12 pictures (4 of each position) with
a definition of 640 x 480.
To determine what kind of database would give the best recognition, we ran some
efficiency tests (after implementing the method) with different numbers and sizes of
input pictures.
3.3.2 Step 2: Subtract the mean
The next step is to calculate the mean along each dimension. It is a fast step: we
just take the first matrix of all the images, ask MATLAB to calculate the mean, and
subtract it from the first matrix. There is not much to say about this step, as it
is really trivial; but we must not forget that it is really important for centering
the data set pictures in the space.
3.3.3 Step 3: Calculate the covariance matrix
This step was a bit more difficult than the first two (we had to understand the theory well
to carry out the calculation precisely).
But once we understood the subtlety described in chapter 2, the calculation
became fast and easy to implement. Figure 3.14 shows the different eigenpictures
obtained from this covariance matrix.
Figure 3.14: Example of the eigenpictures of the data set used for the PCA recognition method
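The subtlety in question is presumably the small-matrix trick used in eigenface methods (Turk and Pentland): instead of forming the huge pixels-by-pixels covariance, one works with the small N-by-N matrix of image inner products, whose eigenvectors map back to pixel space. A hedged Python/NumPy sketch, not the report's actual MATLAB code:

```python
import numpy as np

def small_covariance(Xc):
    # N x N surrogate covariance for N centered images (rows of Xc).
    # Its eigenvectors, left-multiplied by Xc.T, give the eigenpictures,
    # without ever building a pixels x pixels matrix.
    n = Xc.shape[0]
    return (Xc @ Xc.T) / n

rng = np.random.default_rng(0)
Xc = rng.normal(size=(3, 100))   # 3 tiny "images" of 100 pixels
Xc -= Xc.mean(axis=0)            # center first, as in step 2
C_small = small_covariance(Xc)   # 3 x 3 instead of 100 x 100
```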
3.3.4 Step 4: Eigenvector and eigenvalue calculation, and choice
of the best eigenvectors
In this step, we take a closer look at the calculation of the eigenvectors and eigenvalues
of the covariance matrix, and at how to choose the best ones.
Indeed, it is really important to choose the right eigenvectors to express the data set in
the best basis: the number of eigenvectors chosen directly influences the results we get.
The magnitude of each eigenvalue determines how important the corresponding eigenvector
is in the expression of the data set in the new space.
Therefore, we planned to apply a threshold on the eigenvalues to keep only the most
important eigenvectors; choosing an efficient threshold is essential for good results. At the
beginning, we decided to keep the first 11 eigenvectors (as described in the theory, 11 for
12 database pictures). We then planned tests to find the best selection, i.e. a threshold
value that gives good results while efficiently decreasing the calculation time.
In the end, however, we decided to keep just three images in the data set (after the tests
performed in chapter 4), so we kept all the eigenvectors, since three eigenvectors is a
really small number in any case.
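The selection step can be sketched as follows (a Python/NumPy stand-in for the MATLAB code; function and variable names are ours):

```python
import numpy as np

def top_eigenvectors(C, k):
    # Return the k eigenvalues/eigenvectors of symmetric C with the
    # largest eigenvalues (the most important directions of the data).
    vals, vecs = np.linalg.eigh(C)       # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]       # re-sort descending
    return vals[order][:k], vecs[:, order][:, :k]

C = np.array([[2.0, 0.0],
              [0.0, 5.0]])
vals, vecs = top_eigenvectors(C, 1)      # keeps the eigenvalue-5 direction
```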
3.3.5 Step 5: Realize the new Data set and compare
In this step, we build the new data set by saving the new matrix of eigenpictures and
expressing each image of the database in terms of the principal eigenvectors (we just
compute a scalar product between each kept eigenvector and the image). We then save,
for each database image, the coefficient in front of each eigenvector; there are as many
coefficients as eigenvectors.
In the end, this expresses each image in terms of the calculated eigenvectors. The image
to analyze is then expressed with these eigenvectors too. With these coefficients, we can
compare the images with each other by comparing their coefficients (we compute the
Euclidean distance between the coefficient vectors of each pair of images). The results
returned are quite good, as we can see in chapter 4.
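The projection and comparison described above can be sketched like this (Python/NumPy; the toy basis and names are ours, not the report's code):

```python
import numpy as np

def project(image_vec, eigvecs):
    # One scalar product per kept eigenvector (columns of eigvecs).
    return eigvecs.T @ image_vec

def nearest_index(query_coeffs, db_coeffs):
    # Index of the database image whose coefficients are closest (Euclidean).
    dists = np.linalg.norm(db_coeffs - query_coeffs, axis=1)
    return int(np.argmin(dists))

eigvecs = np.eye(4)[:, :2]               # toy basis: 2 eigenvectors, 4-pixel space
db_images = [np.array([1.0, 0.0, 0.0, 0.0]),
             np.array([0.0, 1.0, 0.0, 0.0])]
db_coeffs = np.array([project(v, eigvecs) for v in db_images])
query = project(np.array([0.9, 0.1, 0.0, 0.0]), eigvecs)
best = nearest_index(query, db_coeffs)   # 0: closest to the first image
```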
3.3.6 Conclusion on this method
After having implemented this method, we understand how strongly a huge set of images
can be compressed with low loss. Moreover, we have seen another view of image analysis,
and it is really interesting to compare the two kinds of methods; that is what we will do in
the next chapter (4): Tests, results and analysis. The results returned are quite good,
but we can easily imagine that this method would work better with centered images,
because the position of the gesture within the picture matters a lot. We can also understand
that the second name of this method, Eigenface, is no coincidence: it should work better
with faces than with hands, because it is easier to center a face in the picture (by centering
the mouth and the eyes).
Chapter 4
Tests, results and analysis
In this chapter, we explain the different tests made and the results returned. It serves as
a kind of tutorial for each method and should help people choose one method or the other
depending on the application they want to create. We also explain the drawbacks of each
method and the technical reasons for these drawbacks.
In a first part, we present the application realized and give its complete script, and we
explain how to use the application. After that, we present the different choices made for
each method and justify them with tests. Then, we make a simple comparison of the
different methods and draw graphs of the results of each method under different conditions.
4.1 The application: Rock Paper Scissors Game!
After having implemented the different methods to realize the gesture recognition, we
decided to implement a small application which would use the different methods.
The first idea that came to our mind was to implement a simple game that everybody knows:
the Rock-Paper-Scissors game! Indeed, it was the most fun way to test our gesture recognition scripts.
Moreover, as everybody knows the game, it is really easy and comfortable for other people
to test the scripts and the recognition level of each method. The application has a GUI
for a better interface with the player. We tried to make an easy-to-use application,
with very few steps needed to build the database or to play the recognition game.
The GUI is shown in figure 4.1.
CHAPTER 4. TESTS, RESULTS AND ANALYSIS 46
Figure 4.1: Photo of the application realized
We can see in this screenshot the GUI of the application and an overview of the
different options offered by the game.
Next, we show a photo of the environment we set up (PC and web camera)
to take good pictures for a better analysis. Moreover, it is important to know which
environment we used to take these pictures, because it directly influences the results
returned. The black background, for example, is really important. The environment
looks quite basic; indeed, it would be pretty easy for someone to build their own working
space and use this script.
We will then explain each button and what the application can do.
But first, the working space is shown in figure 4.2:
Figure 4.2: Photo of the working space realized for the gesture recognition application
We can see in this photo the working space set up for the gesture recognition. We
used a Philips camera on a tripod. We cut a wooden board and painted it black to
get a better background. Paint is surely not the best way to obtain a uniform background,
but it was what we had: even though we chose a matt paint, the different lighting setups
reflect directly on the paint and therefore influence the results. That is why we apply
some processing to the pictures before analysis. The best background would be a textile,
because it is much more matt.
So far, everything is easy to set up. What is a bit trickier is to have a camera
recognized by MATLAB; we were lucky and had one.
Now we can look at the application in detail in figure 4.3. We are going to explain
each function and why we implemented it that way.
Figure 4.3: Explanation of the application realized
So, we will now explain the different buttons and their purpose more precisely:
The Start/Stop button: As indicated, it starts or stops the camera. The aim is to save memory by stopping the camera when the application is not in use.
It also shows the preview of your gestures. The Start button must be
pressed before starting any comparison.
The preview window: This window is used to preview your gesture.
You just have to click on the Start button to see the preview.
The nearest database picture window: This small window indicates
which image of the data set was recognized. It is really useful when you have several
pictures of each gesture: you will know which picture was recognized, which
helps you understand how the recognition works.
The text field: It displays instructions to the user, so it is
really useful during data set creation. It tells you which gesture to make,
how long to wait, or even whether you won or lost.
The confirmation-for-recognition buttons: These buttons are especially made for testing.
After having launched a method and seen the results, you can click on Yes or
No to say whether your gesture was recognized. The application then
automatically counts how many gestures were recognized or not, which is really
convenient for long test series.
The player's and computer's position windows: These two windows show the
pictures analyzed for recognition: the gesture you just made and the computer's
random gesture. They give a quick overview of the results and make the game
more attractive.
The text field for the score: Here, you can see the score between the user and the
computer, and the user can also read the gesture recognized. When "Scissors
against Rock" is written, it means that the application recognized a scissors position
for the user, and that the computer's (random) gesture is a rock. The first
gesture written is the position of the nearest data set picture recognized.
The Reset Score button: It simply resets the score.
The Load Eigenface Matrix button: To avoid making you wait for these huge
matrices to load when the application starts, in case you only want to use the gradient
histogram method, you can load them whenever you want. Just remember
that the Eigenface method will not work before you have loaded these matrices.
This makes the GUI accessible more quickly.
The Game Database Creation button: It creates a new database. It takes
pictures of the user and calculates all the matrices automatically. The aim of
this button is to build a new set of pictures (database) for each user; the results
are better when each user makes their own set.
The Compare with Eigenface button: It simply launches the script that analyzes
the new picture with the Eigenface method. You cannot use it before loading
the Eigenface matrices. Once the matrices are loaded, the method runs really fast.
The Compare with Gradients button: It launches the gradient histogram method.
This method takes a bit longer than the Eigenface one, but you do not have to wait
for any matrix loading: you can use this method directly once the GUI is open.
The Compare with simple sub button: This button launches the first method
implemented: the simple subtraction one. This method is in the game just to show that its recognition is poor. You can test it to get an overview of the results.
So, we have explained the different buttons of the application and what they do.
We have also seen the working space set up for this project. Now, it is important to
see which results we obtained for each method.
If you are interested in the script itself, take a look at Appendix C.
4.2 Test and choices of the parameters
In this section, we take a closer look at the different tests made to explain the choices
in the different methods' scripts.
During this project, we had to make multiple choices that influenced the results. We
ran tests to validate these choices, so that we are sure the decisions we took favor
better results.
For both methods, we produced diagrams to present the results.
4.2.1 Choice of the size of the derivative filter and the number of bins
for the gradients method
It is important to note that all the pictures had the same size before any comparison:
every picture was 640 × 480. We fixed the image size because it influences the test
results (the size of the derivative filter is in pixels, so a circle of 3 pixels on a 50 × 30
image does not have the same effect as the same circle on a 640 × 480 picture).
In this part, we will see how we chose the size of the derivative filter for the gradient
method. We will also see how we chose the number of bins (or boxes) used to count
the different orientations in the histogram. Note that, in all our cases,
we chose a circular derivative filter.
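The gradient orientation histogram whose parameters are discussed here can be sketched as follows (a Python/NumPy stand-in for the MATLAB implementation; we use simple finite differences instead of the circular derivative filter of the report):

```python
import numpy as np

def orientation_histogram(image, n_bins=36):
    # Histogram of gradient orientations, normalized to unit length so
    # that pictures of different contrast can be compared.
    gy, gx = np.gradient(image.astype(float))
    angles = np.arctan2(gy, gx)                      # orientations in [-pi, pi]
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    hist = hist.astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# A horizontal intensity ramp: every gradient points the same way,
# so a single bin dominates the histogram.
ramp = np.tile(np.arange(8.0), (8, 1))
h = orientation_histogram(ramp, n_bins=36)
```

Changing `n_bins` between 18, 36 and 72 reproduces the trade-off discussed below: too few bins merge distinct orientations, too many spread them out.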
Before looking at the different graphs, we should say that we ran these tests with
different positions. Indeed, we made these tests at the beginning of the project, and at
that time, the positions to recognize were the five positions of the hand from 1 to
5. It is a good thing that the tests were made on these positions, because they give
more information: the positions are more precise and more difficult to recognize,
so the number of bins and the size of the derivative filter have more impact.
We can first look at the graph of the Euclidean distance between gestures of the same
class (1 to 5) in figure 4.4, as a function of the number of bins and the derivative filter size.
Figure 4.4: Graph of the Euclidean distance between the "1" gestures themselves. This
graph plots the Euclidean distance on the y-axis and the number of bins combined with
the size of the derivative filter on the x-axis (from 18 bins filter 3, then 18 bins filter 6,
18 bins filter 12, 36 bins filter 3, ... up to 72 bins filter 12).
As we can see in this figure, the lowest Euclidean distance is obtained with 36 bins. If
we then average the different intra-class distances, we can see that the best choice is a
derivative filter of size 6, to obtain a minimum Euclidean distance.
But to confirm that, we have to look at figures 4.5 and 4.6, which show the distances
of the other positions between themselves.
Figure 4.5: Graphs (a) and (b) of the Euclidean distance between the "2" gestures and
between the "3" gestures themselves. These two graphs plot the Euclidean distance on
the y-axis and the number of bins with the size of the derivative filter on the x-axis, as
in the first graph.
Figure 4.6: Graphs (a) and (b) of the Euclidean distance between the "4" and the "5"
gestures themselves. Same layout as the first graphs.
With these graphs, we can definitely say that, to get the lowest Euclidean distance
between positions of the same class, we have to choose 36 bins, and that is what we
chose for the application. Now, we observe the graphs of the Euclidean distance
between one class and another. Figure 4.7 shows the "1" class against the other classes.
Here, what is important to notice is not the highest Euclidean distance, but the level
Figure 4.7: Graph of the Euclidean distance between the "1" class and the other
classes. This graph plots the Euclidean distance on the y-axis and the other classes on
the x-axis. Each curve corresponds to a number of bins and a derivative filter size.
of the Euclidean distance within a class (intra-class) in comparison with the Euclidean
distance between the classes.
We will look at just one graph, because all the graphs look the same; we chose the
first one (class "1" against the others).
We can see that between the "1" position and the "2" position, the mean Euclidean
distance is around 0.4 (all distances are normalized, so the maximum distance between
two classes is 2, reached when the two normalized vectors are opposite).
We can also see that within the "1" class (intra-class), the Euclidean distance is also
around 0.4. Moreover, for the other positions, the distances between the different
classes (inter-class) and the intra-class distances are similar.
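The bound mentioned in parentheses can be checked with a quick numerical sketch (Python/NumPy, independent of the report's data): for unit-normalized vectors, the Euclidean distance reaches 2 exactly when the vectors are opposite.

```python
import numpy as np

def unit(v):
    # Normalize a vector to unit Euclidean length.
    return v / np.linalg.norm(v)

u = unit(np.array([3.0, 4.0]))
d_opposite = np.linalg.norm(u - (-u))   # opposite unit vectors: distance 2
d_same = np.linalg.norm(u - u)          # identical vectors: distance 0
```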
What does this mean?
It means that our set of pictures is too homogeneous: our images are definitely too close,
because the distances between the classes are the same as the distances within the classes.
Therefore, we changed the data set and took other pictures that are further apart between
them when we change the class. We found that the positions Rock, Paper and Scissors
corresponded to what we wanted, and that they would also make a good application.
In any case, we can still say that 36 bins is the best choice, because it gives better
results in every case. We can see that with 18 bins the different histograms are too close,
because too many orientations fall into the same bin, and that with 72 bins the different
orientations are too spread out. So these orientation histograms are not good: in one
case we get big peaks, and in the other a histogram that is too flat.
Now, to confirm that a circular derivative filter of size 6 is the best, we made other tests