
    HAND GESTURE RECOGNITION

    Gradient orientation histograms and

    eigenvectors methods

    Bertrand de BOISSET

    [email protected]

FRAUNHOFER INSTITUT

INSTITUT GRAPHISCHE DATENVERARBEITUNG

Fraunhoferstraße 5

    Supervisor:

    Didier Stricker

    Examiner:

    Didier Stricker


    Declaration

I hereby declare that this dissertation and the work described in it are my own work, except where otherwise stated, done only with the indicated sources. All the parts which were inferred from the sources are marked as such. It has not been submitted before for any degree or examination at any other university.

    DARMSTADT, June 15th 2006

Ehrenwörtliche Erklärung

Hiermit versichere ich, die vorliegende Diplomarbeit ohne Hilfe Dritter und nur mit den angegebenen Quellen und Hilfsmitteln angefertigt zu haben. Alle Stellen, die aus den Quellen entnommen wurden, sind als solche kenntlich gemacht worden. Diese Arbeit hat in gleicher oder ähnlicher Form noch keiner Prüfungsbehörde vorgelegen.

DARMSTADT, 15. Juni 2006

Déclaration

Je déclare que le rapport réalisé ainsi que le travail décrit dans ce document est un travail personnel, sauf contre-indication, réalisé avec l'aide des sources citées dans la bibliographie. Toutes les parties qui sont reprises sont indiquées en tant que telles. Ce projet n'a jamais été présenté auparavant pour aucun autre examen dans aucune autre université.

DARMSTADT, 15 Juin 2006


    Abstract

The aim of this work is to implement different methods for gesture recognition. The main parts of my work were:

First, the analysis of the different ways to realize gesture recognition.

Then, the implementation of the gradient histogram recognition. This method consists in calculating the gradients of a picture and then constructing histograms of the gradient orientations.

We also took a closer look at the algebraic analysis of an image, by searching for the principal components that define a set of pictures (eigenvectors in the space of the data set). This second method is called PCA (Principal Component Analysis).

Then, to finish the project, we had to analyze the different methods implemented by performing different tests. After that, we could define the best and worst points of each method. We also realized a small application to illustrate our work.


    Acknowledgments

I would like to thank my supervisor, Alain Pagani, for his enthusiasm, help and guidance throughout this project. I would also like to thank Didier Stricker, who supervised my work during this period. And I will not forget:

F. Merienne, C. Pere, M. Moll, H. Wuest, F. Vial... They all helped me to finish this project in time and gave me some pieces of advice when I needed them.

All the members of the Department for Virtual and Augmented Reality (A4) of the Fraunhofer IGD for providing an interesting and stimulating working environment.


    Contents

    1 Project Aims 6

    2 Theory and backgrounds 7

    2.1 The database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

    2.2 The simple subtraction method . . . . . . . . . . . . . . . . . . . . . . . 8

    2.3 The Gradient based method . . . . . . . . . . . . . . . . . . . . . . . . . 8

    2.3.1 The goal of this method . . . . . . . . . . . . . . . . . . . . . . 8

    2.3.2 Main steps of the method . . . . . . . . . . . . . . . . . . . . . . 9

    2.4 The Principal Component Analysis -PCA- method . . . . . . . . . . . . . 10

    2.4.1 The goal of this method . . . . . . . . . . . . . . . . . . . . . . 10

    2.4.2 Mathematical Backgrounds . . . . . . . . . . . . . . . . . . . . 10

    2.4.3 Main steps of the method . . . . . . . . . . . . . . . . . . . . . . 18

    3 Implementation and explanation 21

3.1 Simple subtraction method . . . . . . . . . . . . . . . . . . . . . . . 21

    3.1.1 Realization of the method . . . . . . . . . . . . . . . . . . . . . 21

    3.1.2 Conclusion on the method . . . . . . . . . . . . . . . . . . . . . 21

    3.2 Histograms of oriented gradients method . . . . . . . . . . . . . . . . . . 21

    3.2.1 Step 1: Gradient magnitude calculation . . . . . . . . . . . . . . 22

    3.2.2 Step 2: Gradient orientation calculation and magnitude threshold . 24

    3.2.3 Step 3: Gaussian filter operator . . . . . . . . . . . . . . . . . . 26

3.2.4 Step 4: Euclidean distance comparison . . . . . . . . . . . . . 27

3.2.5 Step 5: Establish a comparison matrix . . . . . . . . . . . . . 30
3.2.6 Problems encountered . . . . . . . . . . . . . . . . . . . . . 34

    3.2.7 Conclusion on the method . . . . . . . . . . . . . . . . . . . . . 39

    3.3 PCA or Eigenfaces method . . . . . . . . . . . . . . . . . . . . . . . . . 40

    3.3.1 Step 1: Realize the database . . . . . . . . . . . . . . . . . . . . 40

    3.3.2 Step 2: Subtract the mean . . . . . . . . . . . . . . . . . . . . . 42

    3.3.3 Step 3: Calculate the covariance matrix . . . . . . . . . . . . . . 42

    3.3.4 Step 4: Eigenvectors and eigenvalues calculation, and choose the

    good eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . 43


    3.3.5 Step 5: Realize the new Data set and compare . . . . . . . . . . . 43

    3.3.6 Conclusion on this method . . . . . . . . . . . . . . . . . . . . . 44

4 Tests, results and analysis 45

    4.1 The application: Rock Paper Scissors Game! . . . . . . . . . . . . . . . 45

    4.2 Test and choices of the parameters . . . . . . . . . . . . . . . . . . . . . 50

    4.2.1 Choice of the size of the derivative filter and the number of box

    for the gradients method . . . . . . . . . . . . . . . . . . . . . . 50

    4.2.2 Choice of the number of pictures and the size of images for the

    data set for both methods . . . . . . . . . . . . . . . . . . . . . . 55

    4.3 Last tests to explain the efficiency of each method . . . . . . . . . . . . . 60

4.3.1 First tests: Recognition percentage of each method in general . . 61
4.3.2 Second tests: Recognition percentage of each method in different
conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

    5 Conclusion: advantages and drawbacks 74

    A Tables of general tests 78

    B Tables of specific tests 85

    C Script of the game 90

    List of figures 118

    Bibliography 120


    Chapter 1

    Project Aims

We can define the goal with a simple question: how could we control different applications with a single hand gesture? The aim of my final year project is to answer this question by studying different methods that allow hand gesture recognition. Moreover, the recognition has to be done by one camera and in real time, so that you can operate as fast as you want to.

To begin, we had the idea to realize a simple subtraction between two images, pixel by pixel, to compare them. We will see the results of that in the second chapter.

Then we studied a method that uses gradients. The aim is to build the orientation histograms of the different pictures and to compare them. We will take a closer look at this method in chapter 3.

After that, we implemented a method called PCA (Principal Component Analysis) or Eigenface. The goal is to calculate, find and study the eigenvectors of the different pictures and then to express each image with its principal components (eigenvectors). The difficult part was to find a way to compare the images through their expression with the eigenvectors (as is done in the Eigenface face recognition method).

Last, we created a small application to illustrate the different working methods.


    Chapter 2

    Theory and backgrounds

Before explaining the theory of the different methods, we will first state their main idea.

In fact, we realized a database of different hand gestures and we labeled all the data set pictures so that each picture is classified. The aim is then to compare an unknown image with the images of the database and to identify it by returning the label of the nearest image.

Therefore, we will see in the first part how we chose our database and how we defined it.

    2.1 The database

In this section, we will take a closer look at the database.

At the beginning, we had two main questions about its creation:

Which hand gestures should we choose?

How many pictures of each gesture should we take?

These questions led to two different kinds of database for the same choice of gestures:

Take lots of pictures of the different hand gestures to realize a huge database, so that the recognition will be better (this is a way to reduce the limits of the different methods: we will have many more chances to find an image in the database that looks like the gesture to analyze). The problem is that it will take longer to look for the pictures in the database during the comparison.


Take few pictures of the different gestures to realize the database quickly. Then it will be easier for the user to create the database, and it will be quicker to look for a picture in the database during the comparison. The problem is that the recognition will be harder if the gestures are similar.

Therefore, during the project, we began by taking 5 positions of the hand (1, 2, 3, 4 and 5 fingers) and lots of pictures in the database. We had good results, but the calculation time was huge. Therefore, we decided to change the database creation by taking a minimum of pictures of new hand gestures (3 positions that were really different: scissors, paper and rock).

We will study the returned results later, and we will see in detail why we changed the positions and the database. What is important now is to understand with which kind of database we realized this work.

    2.2 The simple subtraction method

The aim of this method was to try a simple way to compare images and then to explain and justify why we had to implement other methods. We will not take a deeper look at this method here. The theory is really simple: subtract the different images pixel by pixel, then compare the results and show the closest one.

We will see the different tests in the last chapter (4) and a short summary of this method in the next chapter about implementation (3).

    2.3 The Gradient based method

In order to study hand gesture recognition, we will study the theory of the gradient orientation histogram method. In this section, we will take a closer look at the aim of this method and the main steps to implement it.

2.3.1 The goal of this method

First of all, the aim of this method is to recognize different hand gestures (without sensors). These hand gestures must be clearly identified in order to command any kind of application.

The theory of this method is to study the gradients in the image and to analyze them to realize an orientation histogram. Then the goal is to compare the histograms to return the label of the nearest image.


    2.3.2 Main steps of the method

In order to implement this method, we began by reading several articles on the subject, which were really interesting and useful for knowing which directions to take: [25] and [24].

We can then split the method up into its main parts:

First of all, we had to implement the gradient magnitude calculation. The aim is to define where the biggest gradient magnitudes are in the picture. Then, it will be easy to apply a threshold on the gradients in order to keep the really interesting ones and to cut all the background noise. To realize this part, the theory is to calculate the magnitude with the formula:

$\text{magnitude} = \sqrt{dx^2 + dy^2}$

Therefore, we have to calculate the derivatives of the image in x and y to have the magnitude. We will have to choose a size for the derivative filter (in any case, we will choose a circular derivative filter).

Then, we implemented the gradient orientation calculation. The goal is to realize a histogram divided into 36 bins (10 degrees each) or more (we will study this influence later in chapter 4). To realize this histogram, we will have to calculate the gradient orientation defined by the formula:

$\text{orientation} = \arctan(dy/dx)$

Therefore, with this formula, it will be possible to know the orientation of the gradients in the image. We can see that for both magnitude and orientation we will need the derivatives of the image.

With this histogram, we can then have a vector of gradient orientations, which describes the picture quite well. So, this second step is the part that will allow us to compare the images with each other. It is a way to define the form with an appropriate precision.

Also, we had to realize a Gaussian filter to blur the image and have a homogeneous picture. It will permit better results in the gradient magnitude and orientation calculation. The goal of this filter is to erase the background defects. We can say that for this method, it is really important to have a uniform background to avoid noise; to make the background more uniform and to erase white pixels, we realized this filter.

We created a gradient magnitude threshold which had to erase the lower-level gradients in order to keep the really interesting ones. That will cut all the noise


and regularize the background. This part is complementary with the Gaussian filter: the Gaussian filter will blur the big defects (but they will still be there), and the threshold will cut the lowest magnitudes. Then the noise will be quite well cut.

Then, the next step was to calculate the Euclidean distance between the vectors of the different analyzed images. This part is made to compare the different pictures by comparing their histograms. This is the final step: with it, we are able to recognize the different gestures.

To conclude, we can say that this method does not require any special mathematical background. Therefore, once we understood the main way to realize it, we just had to implement it (3).

    2.4 The Principal Component Analysis -PCA- method

In this section, we will still study hand gesture recognition, but we will need some mathematical background to understand what we made. This method is called PCA or Eigenfaces.

So, we will take a deeper look at the mathematical background, the aim of this method, and the principal parts of its realization.

    2.4.1 The goal of this method

The Principal Component Analysis (PCA) will also be used for our gesture recognition. It is a useful statistical technique that has found application in different fields (such as face recognition and image compression), and it is a common technique for finding patterns in data of high dimension. Before describing this method, we will first introduce the mathematical concepts that will be used in PCA: standard deviation, covariance, eigenvectors and eigenvalues. This background knowledge is meant to make the PCA section easier to understand, but can be skipped if the concepts are already known. There are examples all the way through this kind of lesson to illustrate the concepts explained.

    2.4.2 Mathematical Backgrounds

This section will attempt to give the elementary mathematical background required to understand what the Principal Component Analysis is. We will try to give a kind of summary of the principal knowledge used in the PCA method. Each part is independent of the others. The goal is to understand the principal lines of the method and especially to understand why this method is used and what the returned results signify. We will not use all the background knowledge described here, but the different sections will provide the grounding for the main skills required.

Therefore, we will first take a quick look at statistics, and especially at the spread of data and distribution measurements. The other section is on matrix algebra and looks at eigenvectors and eigenvalues (important properties of matrices that are fundamental to PCA).

    Statistics

What we will see about statistics is how to analyze a big set of data and how to find and understand the relationships between the elements of the data set. In this section, we will take a look at the measurements we can perform on a data set and what they tell us about the data.

Standard deviation First of all, we will look closer at what the standard deviation is. In statistics, we generally use samples of a population to realize the measurements. The results returned on this sample give an overview of the possible and most likely results that we would get if we made the same test on the entire population; we just extend the sample results to the entire population. To explain it clearly, we will create a data set and assume that it is just a sample of a larger data set (it is not used in our project, but it will help us to understand the concept easily).

    Here is an example set:

    X = [1 2 4 6 12 15 25 45 68 67 65 98]

For the notation, we will use the symbol X to refer to the entire sample and the symbol $X_n$ to indicate a specific element of the sample. Therefore, $X_3$ refers to the 3rd number in X (note that $X_1$ is the first element, not $X_0$). With this kind of sample, we can realize many calculations that will give us information about the set. For example, we can first calculate the mean of the set. As it is really simple, we will just give its formula without describing it further:

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

It is important to note that we will call $\bar{X}$ the mean of the set X. The mean of the data set will not give us many indications, apart from the middle point. For example, we can have the same mean for two really different data sets. Therefore, we will see below what is important to better define the data sets:

[0 8 12 20] and [8 9 11 12]


Here, what is really different between the two sets is the standard deviation. This is a way to measure how spread out the data in a set are. Here is the definition of the standard deviation: it is the square root of the sum of the squared distances from the mean of the set to each point, divided by n − 1, where n is the number of points in the set. Here is the formula:

$s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}}$

where s is the usual symbol for the standard deviation of a sample.

We can wonder why we are dividing by n − 1 and not by n. We will not give any explanation of that here, because it would be too long, and it is not important for our project. What is important to remember is that when we use a sample of a population and we want approximate results for the entire population, we have to use n − 1; but if we calculate the standard deviation on the entire population directly, we have to use n instead of n − 1. We can find further information on the web site http://mathcentral.uregina.ca/RR/database/RR.09.95/weston2.html

This page explains a bit more about the standard deviation and about the choice of denominator. It also gives interesting experiments which describe well the difference between using samples or the whole population, and therefore the choice of denominator.

We will draw tables of the standard deviation calculation for the 2 sets written above.

Set 1:

X    (X − X̄)    (X − X̄)²
0    −10    100
8    −2    4
12    2    4
20    10    100

Total: 208
Divided by (n − 1): 69.333
Square root: 8.3266

    Set 2:


Xi    (Xi − X̄)    (Xi − X̄)²
8    −2    4
9    −1    1
11    1    1
12    2    4

Total: 10
Divided by (n − 1): 3.333
Square root: 1.8257

As expected, the first set has a much bigger standard deviation than the second one. Indeed, the data of the first set are really spread out, unlike those of the second one.

We can quickly look at another set, which has a standard deviation of zero:

[10 10 10 10]

Here, the standard deviation is equal to zero, although the mean is still 10. This is because all the points are the same, so the data are not spread out: none of them deviate from the mean.
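These measurements are easy to check in MATLAB. Here is a minimal sketch (MATLAB's std and var use the n − 1 denominator by default, matching the sample formulas above):

s1 = std([0 8 12 20])   % returns 8.3266, as in the table for set 1
s2 = std([8 9 11 12])   % returns 1.8257, as in the table for set 2
s3 = std([10 10 10 10]) % returns 0: the data are not spread out at all
m1 = mean([0 8 12 20])  % returns 10, the same mean as mean([8 9 11 12])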

Variance Variance is another measure of the spread of data in a set. In fact, it is almost the same as the standard deviation. We can take a look at the formula:

$s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$

We can notice that this is just the square of the standard deviation (that is why the symbol $s^2$ is used). Usually, we use the symbol $s^2$ for the variance of a sample. The variance is just another way of measuring the spread of data in a sample. We can say that the variance is less used than the standard deviation. In fact, the variance will be useful for the next section, which is about the covariance.

Covariance The covariance differs from the first two measurements explained in the sections above in one principal way: the covariance is a 2-dimensional measurement. The covariance is really important knowledge for the PCA method, because we will


    need this calculation later.

So, the calculation of the standard deviation or of the variance is useful in the case of a one-dimensional data set, like the set of the marks obtained by all the ENSAM students for their FYP (Final Year Project). But for the PCA method, which deals with more dimensions, we will need the covariance and not only the variance.

The covariance allows us to see if there is any relationship between the different dimensions of the data set. For example, we could realize a 2-dimensional set of the marks obtained by the ENSAM students and their age. Then, we could see if the age has an effect on the mark received by the student. It is exactly the kind of test that we can perform with the covariance (we can already imagine where we want to go with that in our project: check whether our different pictures are related or not). The covariance formula is really close to the variance formula. We can write the variance formula like this, to better understand the covariance formula:

$var(X) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})}{n-1}$

Now we can take a look at the covariance formula:

$cov(X, Y) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$

We can notice that if we calculate the covariance between a dimension and itself, we get the variance. In fact, we just replace the second part of the formula with the second dimension to analyze, to obtain the covariance formula!

We can also say that it is possible to calculate the covariance between more than two dimensions, two by two. For three dimensions, for example, we will calculate the 9 covariance terms between pairs of dimensions and then create a matrix (called the covariance matrix), which will be 3 × 3 in the case of three dimensions. In fact, the diagonal will hold the variance of each dimension and the other terms will be the covariances between pairs of dimensions (for example, line 2 column 1 will be the covariance between the y and the x dimensions). By the way, we can notice that the covariance is commutative (we can swap the two dimensions without changing the result); therefore the covariance matrix will be symmetrical.

Then, we can get lots of really important information with the covariance calculation. In any case, it is important to notice that the value returned is not as important as its sign.

Indeed, if the result is positive, it means that the two dimensions increase together


(for our example on the ENSAM students, marks received and age, this would mean that the mark increases when the age increases). And if the result is negative, it means that when one dimension increases, the other decreases.

In the last case, the result returned is null. That just means that our 2 dimensions do not have any kind of relation between them: they are independent.

Therefore, the covariance calculation can bring us really important indications on the set of data we are studying. With it, we can then represent the covariance between 2 dimensions in a graph to get an idea of the relation that exists between them. Of course, it will not be possible to represent the covariance when our data set has more than 3 dimensions.

Although the covariance can only be calculated between two dimensions and it is not possible to represent the relationships in the data when we have more than 3 dimensions, the covariance is often used for big data sets with many dimensions. Indeed, we can calculate the relationship between the dimensions and have some exploitable results. Moreover, it would be pretty hard to visualize the relationships between dimensions in a huge data set with many dimensions without the covariance calculation. Therefore, the calculation of the covariance brings us lots of help in seeing the relationships between dimensions in a data set like the one we have in our project.

The covariance matrix Recall that the covariance is always measured between 2 dimensions. If we have a data set with more than 2 dimensions, there is more than one covariance measurement that can be calculated. For example, from a 3-dimensional data set (dimensions x, y, z) we could calculate cov(x, z), cov(y, z)... In fact, for an n-dimensional data set, we can calculate $\frac{n!}{2(n-2)!}$ different covariance values; the remaining entries are the variances on the diagonal.

A useful way to get all the possible covariance values between all the different dimensions is to calculate them all and put them in a matrix. Let's have a quick overview of the definition of the covariance matrix for a set of data with n dimensions:

$C^{n \times n} = (c_{i,j}), \quad c_{i,j} = cov(Dim_i, Dim_j)$

where $C^{n \times n}$ is an n by n matrix (n rows and n columns), and $Dim_x$ is the xth dimension. We can notice that the covariance matrix will be square in any case, and that each


part of the matrix is the result of a covariance calculation between two dimensions (except for the diagonal, as said before).

For example, we will build the covariance matrix for a 3-dimensional data set, using the usual dimensions x, y and z. As the matrix is square, we will have the values below:

$C = \begin{pmatrix} cov(x,x) & cov(x,y) & cov(x,z) \\ cov(y,x) & cov(y,y) & cov(y,z) \\ cov(z,x) & cov(z,y) & cov(z,z) \end{pmatrix}$

As said earlier, the matrix is symmetrical, and the diagonal holds the variance calculations. Therefore, we can say that the matrix will have this form:

$C = \begin{pmatrix} var(x) & cov(x,y) & cov(x,z) \\ cov(x,y) & var(y) & cov(y,z) \\ cov(x,z) & cov(y,z) & var(z) \end{pmatrix}$

Therefore, we will only have 6 of the 9 terms to calculate.
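As an illustration, MATLAB's built-in cov function builds exactly this matrix. Here is a minimal sketch with made-up 3-dimensional data (cov treats each row as an observation and normalizes by n − 1, like the sample formulas above):

X = [1 10 100; 2 12 95; 3 14 80; 4 16 70];  % 4 observations of dimensions x, y, z
C = cov(X)  % 3 x 3 symmetric matrix: variances on the diagonal;
            % C(2,1) = cov(y,x) is positive here (y grows with x),
            % C(3,1) = cov(z,x) is negative (z decreases when x increases)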

    Matrix Algebra

    This section is made to provide a background for the matrix algebra required in PCA. We

    will especially take a closer look at the eigenvectors and eigenvalues of a given matrix.

Let's see an example with a matrix:

$\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4 \begin{pmatrix} 3 \\ 2 \end{pmatrix}$

For example, 4 is an eigenvalue of the matrix.

Eigenvectors First of all, we will give the Wikipedia definition of an eigenvector: in linear algebra, the eigenvectors (from the German eigen meaning inherent, characteristic) of a linear operator are non-zero vectors which, when operated on by the operator, result in a scalar multiple of themselves. The scalar is then called the eigenvalue associated with the eigenvector.

As we can see in the example above, the multiplication between the matrix and the vector returns exactly 4 times the original vector: we have here an example of an eigenvector. We will try to explain this example to better understand eigenvectors. The vector is a 2-dimensional one: the vector $\begin{pmatrix} 3 \\ 2 \end{pmatrix}$ represents an arrow going from the origin (0, 0) to the point (3, 2). The matrix $\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}$ can be imagined as a transformation matrix. Therefore, if we multiply this matrix with a vector, the result returned will be another, transformed vector. If this transformed vector is just a multiplication of the original by a scalar, then it is an eigenvector, and the scalar is the eigenvalue associated with the eigenvector.

Now we will look at the different properties of these eigenvectors:

First of all, eigenvectors can only be found for square matrices, and not every square matrix has eigenvectors. When a matrix does have them, it cannot have more eigenvectors than its dimension (for a 3 × 3 matrix, the maximum number of eigenvectors is 3).

You can multiply an eigenvector by a scalar and it will still be an eigenvector (because we just change its length and not its direction).

For a symmetric matrix, such as a covariance matrix, the eigenvectors are orthogonal to each other, no matter the number of dimensions.

Most of the time, the returned eigenvectors are normalized (norm = 1). They are then easier to exploit.

We can find further information on eigenvectors on the web site: http://www.mathphysics.com/calc/eigen.html

Eigenvalues Each eigenvector is associated with an eigenvalue. The eigenvalue gives us some information about the importance of the eigenvector. The eigenvalues are really important in the PCA method, because they permit applying a threshold to filter out the non-significant eigenvectors, so that we keep just the principal ones. MATLAB returns the eigenvalues and the eigenvectors of the covariance matrix without any problem.
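For instance, the example matrix from the beginning of this section can be checked in MATLAB with a minimal sketch:

A = [2 3; 2 1];
v = [3; 2];
A*v             % returns [12; 8], i.e. 4*v: v is an eigenvector with eigenvalue 4
[V, D] = eig(A) % the columns of V are normalized eigenvectors of A,
                % and diag(D) holds the eigenvalues (4 and -1 for this matrix)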


    2.4.3 Main steps of the method

Finally we arrive at the Principal Component Analysis (PCA), the interesting part of our project. We can first answer a question: what is it exactly? We can answer that it is an algebraic way to compare images by compressing the data set and highlighting its principal components.

The main advantage of PCA is that once we have found the principal components of the set, which express the data pretty well, we can recover the original data (images in our case) with a low loss, even if the compression is really high!

In this section, we will try to explain how we went through the problems of realizing this method for gesture recognition. Therefore, we will describe the work made step by step, to understand each part of it.

We can then split up the method into its main parts:

First of all, we had to create the data set. Indeed, we had to take some pictures of the hand to build the database for the PCA recognition. The aim is to choose a good number of pictures and a good resolution for them in order to have the best recognition with the smallest database. To create the database, the theory is to transform each picture into a simple vector, whose dimension is the number of pixels. Then, we create a matrix where each line is an image-vector... The result for 12 pictures at a 640 × 480 definition will be a 12 × 307200 matrix.

Then, the next step is to subtract the mean from each of the data dimensions. The mean subtracted is simply the average across each dimension. For example, for three dimensions x, y and z, we will have to subtract $\bar{x}$ from x, $\bar{y}$ from y and $\bar{z}$ from z. The aim is to center our set in the space of all the dimensions (we will see further explanations of the different spaces used later; what is important to remember here is that we have to subtract the mean to center our set of data).

Step three is to calculate the covariance matrix of the database. It is quite difficult in our case, because the data set is really huge! So we found a method to simplify this calculation, which we will explain here. Indeed, we cannot calculate the covariance matrix directly, because it would be too huge; we had to find a way to find the principal eigenvectors without calculating the big covariance matrix. I found the solution in a paper written by M. Turk and A. Pentland [23].


The method consists in choosing a new covariance matrix. Indeed, we will call A our matrix of all the images with the mean subtracted, where each image-vector is stored as a column, so that A has 307200 rows and 12 columns. Our training set of images will be $B_1, B_2, B_3 \ldots B_{12}$ with dimensions l × c, and M is the average of the whole set of pictures. As seen earlier, we transform each image into a vector of l × c dimensions. So, we can say that each picture is a point in an l × c dimensional space; therefore, our 12 images represent 12 points in this space. But, as we centered the set (by subtracting the mean), each picture is not so far from the others in this space (because they are quite similar in the end). Therefore, it is possible to express our data set with fewer dimensions.

The covariance matrix of A will be called C and is defined by:

$C = A \cdot A^T$

Then, the eigenvectors and the eigenvalues of C will be the principal components of our data set. But as explained before, we cannot calculate C (it would be 307200 × 307200). The idea is to say that when we have 12 points in a huge space, the meaningful eigenvectors will be fewer than the dimension: the number of meaningful ones will be the number of points minus 1. So in our case, we can say that we will have 11 meaningful eigenvectors. The remaining eigenvectors will have an eigenvalue around zero.

Fortunately, it is much easier to calculate the eigenvectors of a 12 × 12 matrix than of a 307200 × 307200 matrix! We will name the eigenvectors of the matrix $A^T \cdot A$ (which is 12 × 12) $v_i$, and its eigenvalues $k_i$. We can then write:

$A^T A \, v_i = k_i \, v_i$

Then we multiply both sides by A:

$A A^T (A \, v_i) = k_i \, (A \, v_i)$

We can see that the $A \cdot v_i$ are eigenvectors of $C = A \cdot A^T$. So we construct the new matrix $L = A^T \cdot A$, and we find the eigenvectors $v_i$ of L. These vectors determine linear combinations of the 12 training set images that form the eigenpictures of our set.

So, with this subtlety, we have a small covariance matrix to calculate, 12 × 12 instead of 307200 × 307200! The calculation will also be much faster, and the eigenvectors returned are the principal ones.

Then, we calculate the eigenvectors and the eigenvalues of this covariance matrix. This will give us the principal orientations of the data. MATLAB does it easily.

After that, we have to choose the good components and form the feature vector. This is the principal step. We have to choose the principal (most important) eigenvectors, with which we can express our data with the lowest information loss. We also have to choose a precise number of eigenvectors to get the lowest calculation time but the best recognition. Here, the theory says that we will normally have 11 meaningful eigenvectors.

Last, the final step is to make a new data set (that we will call the eigenset). Then, it is possible to realize the last script, which compares the different pictures and classes them by order of resemblance. To compare the different pictures, we have to express each image of the data set with these principal eigenvectors. The last thing to do is to compare (by calculating the Euclidean distance between the coefficients that stand before each eigenvector).
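To make the trick concrete, here is a minimal MATLAB sketch of these last steps (it assumes the 12 centered image-vectors are stored as the columns of a variable A, as described above; all variable names are hypothetical):

L = A' * A;                              % small 12 x 12 matrix instead of 307200 x 307200
[V, D] = eig(L);                         % the v_i and k_i of A'*A
[k, order] = sort(diag(D), 'descend');   % biggest eigenvalues first
U = A * V(:, order(1:11));               % the A*v_i: our 11 meaningful eigenpictures
U = U ./ repmat(sqrt(sum(U.^2, 1)), size(U, 1), 1);  % normalize each eigenpicture
w = U' * A;                              % coefficients of each training image; these are
                                         % the vectors compared by Euclidean distance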

To conclude, we can say that we need more mathematical background for this method. Once the theory is well understood, we can implement this method in MATLAB too. In the next chapter, we will take a closer look at the implementation of the different methods.


    Chapter 3

    Implementation and explanation

3.1 Simple subtraction method

    3.1.1 Realization of the method

We will not take too much time to explain how we realized this comparison, because it is really simple to create and the results are really bad. What is important to notice here is:

Before doing the subtraction, we first applied some adjustments to the contrast, and then we applied a blurring filter to erase the background imperfections.

We performed some tests on this method, which can be seen in the last chapter (4). Figure 4.14 shows the efficiency of this method.

This method confirms the idea that we should implement other methods, because the results returned are really bad and we can say that it does not work properly.

    3.1.2 Conclusion on the method

In any case, it is a good thing to know the results of this method. It was our first idea, and it confirmed that we had to look further into the image analysis by implementing other methods.

    3.2 Histograms of oriented gradients method

In this section, we will explain how we implemented this method and the problems encountered. We will try to understand each part of the method and why it works or does not work in the different cases.

    3.2.1 Step 1: Gradient magnitude calculation

After a first approach of the MATLAB software, we realized the first script, which had to calculate the magnitude of each gradient in the image.

Gradient magnitude definition:

If dx and dy are the outputs of the x and y derivative operators, then the gradient magnitude is calculated by:

$mg = \sqrt{dx^2 + dy^2}$ (3.1)

Therefore, in order to calculate the gradient magnitude, we first had to calculate the derivatives dx and dy of the image.

The script of the derivative operator was found on the internet, but we can see what it looks like to understand the following steps:

X-derivative operator script:

Explanation of how to use the script:

function d = xDeriv(im, xRadius, yRadius, shape)

XDERIV Returns the X-derivative of image im.

D = XDERIV(IM, XRADIUS, YRADIUS, SHAPE)

IM - Input image.
XRADIUS - half the width of the vicinity in which the derivative is calculated.
YRADIUS - half the height of the vicinity in which the derivative is calculated (default: equal to XRADIUS).
SHAPE - Either of:
'full' - (default) returns the full 2-D convolution,
'same' - returns the central part of the convolution that is the same size as A,
'valid' - returns only those parts of the convolution that are computed without the zero-padded edges, size(C) = [ma-mb+1, na-nb+1] when size(A) > size(B).

Written by Ariel Tankus, 19.9.96.

Therefore, this script can calculate the derivative matrix given the image reference, the xRadius, the yRadius and the returned shape you want. So, with the same kind of script for the y derivative, we could have the gradient magnitude really fast. We just had to care about the speed of the running process.

We also implemented the gradient magnitude script as described below:

Gradient magnitude operator script:

function [mag, dx, dy] = grad(im, xRadius, yRadius, shape)

GRAD Return the gradient magnitude of the given image.

[MAG, DX, DY] = GRAD(IM, XRADIUS, YRADIUS, SHAPE)

IM - image
XRADIUS - half width of derivation vicinity.
YRADIUS - half height of derivation vicinity.
SHAPE - either of 'same', 'valid', 'full'. See xderiv.
MAG - Gradient magnitude.
DX - X-derivative (optional).
DY - Y-derivative (optional).

The output for MAG is:

$MAG = \left[\min(\min(\sqrt{dx^2 + dy^2})),\ \max(\max(\sqrt{dx^2 + dy^2}))\right]$ (3.2)


The MAG output returns a two-dimensional vector which holds the minimal and the maximal terms of the gradient magnitude matrix. The aim is to lighten the calculation with a smaller output. The DX and DY outputs return two matrices, which both have the size of the image matrix.

What we needed as outputs is just the minimum and the maximum of the gradient magnitude, in order to realize an efficient threshold (to cut the lowest gradient magnitudes). But we will come back to this part later (3.4).
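As an illustration, a minimal MATLAB sketch of this first step could look like the following (it uses a simple central-difference kernel rather than the original xDeriv operator, and the image file name is an assumption):

im = double(imread('hand.png'));            % load a test image (hypothetical file)
if size(im, 3) == 3, im = mean(im, 3); end  % convert to grayscale
kx = [-1 0 1];                              % x-derivative kernel
ky = kx';                                   % y-derivative kernel
dx = conv2(im, kx, 'same');                 % X-derivative
dy = conv2(im, ky, 'same');                 % Y-derivative
mag = sqrt(dx.^2 + dy.^2);                  % gradient magnitude, as in (3.1)
mg = [min(mag(:)), max(mag(:))];            % the [min, max] output of (3.2)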

Then, having the minimum and the maximum gradient magnitude of the picture, we could go through the second part: the gradient orientation calculation.

3.2.2 Step 2: Gradient orientation calculation and magnitude threshold

Once we had made the gradient magnitude calculation, we implemented the second script, which had to calculate the orientation of each (sufficiently strong) gradient in the image.

Gradient orientation definition:

If dx and dy are the outputs of the x and y derivative operators, then the gradient direction is calculated by:

$dir = \arctan(dy/dx)$ (3.3)

Therefore, in order to calculate the gradient direction, we just had to use the derivatives dx and dy of the image, which we had already calculated for the magnitude. Then, after having applied a threshold on the gradient magnitude, we had to sort all the measurements into a 36-dimensional vector: we made 36 bins (10 degrees each). After filling the vector, we plot it in polar and cartesian coordinates, just to have an overview of the orientation.

We will describe the script written to realize this implementation:


Gradient orientation operator script:

function [Z] = grador2(im, xRadius, yRadius, shape)

This function returns the 36-dimensional gradient orientation vector of the given image.

[ORI, DX, DY] = GRADORIENTATION(IM, XRADIUS, YRADIUS, SHAPE)

IM - image
XRADIUS - half width of derivation vicinity.
YRADIUS - half height of derivation vicinity.
SHAPE - either of 'same', 'valid', 'full'. See xderiv.
ORI - Gradient orientation matrix (contains all the gradient directions).

[gm, dx, dy] = grad(im, xRadius, yRadius, shape);

We call the gradient magnitude operator. It returns the X-derivative, the Y-derivative and the 2-dimensional magnitude vector (to have the maximum and the minimum of all the gradients).

$gm = ((gm(2) - gm(1)) \times 0.1) + gm(1)$ (3.4)

This is the threshold value. It is relative to the image and fixed at 10 percent of the scale (between the minimum and the maximum gradient magnitude).

Then, we defined that when fewer than 3 inputs are given, we consider the shape as 'same' and yRadius equal to xRadius.

The next step was to create a 36-dimensional vector full of zeros. Then, we just had to increment each bin when an orientation is found. We had to take care that the arctan function only covers half of the orientation circle, so the signs of dx and dy must be taken into account.

After that, we had to create the gradient direction matrix, with all the pixels' gradient directions. We had to take the threshold into consideration to keep the main gradient magnitude orientations.

We will see in section 3.2.6 that we had some problems with the borders of the pictures. Therefore, we have to take this problem into consideration and we will cut the border of the new image. The cut is relative to the input and avoids the high-level gradient magnitudes which are calculated on the different borders of the pictures.


So, after having realized the gradient orientation vector of each image, it is possible to realize comparisons between the different images.

Then, to see the histograms, we display both the cartesian and the polar representation of the orientation vector. It is good to have the two histograms to see accurately where the orientation peaks are.

So, we have seen how to calculate the gradient orientation vector of a picture. With the different gradient orientation vectors of the data set, it will be possible to compare the images with each other and so to sort them out.
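A minimal MATLAB sketch of this binning step could look like the following (it reuses dx, dy, mag and mg from the sketch of step 1 and is not the original grador2 script):

thr = mg(1) + 0.1*(mg(2) - mg(1));   % the 10 percent threshold of (3.4)
ang = atan2(dy, dx);                 % full-circle orientation in (-pi, pi]
deg = mod(ang*180/pi, 360);          % map to degrees in [0, 360)
keep = mag > thr;                    % ignore the weak gradients
bins = floor(deg(keep)/10) + 1;      % 36 bins of 10 degrees each
Z = accumarray(bins(:), 1, [36 1]);  % the 36-dimensional orientation vector
bar(0:10:350, Z)                     % cartesian view of the histogram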

We can take a look at the shape of the histograms in figure 3.1.

Figure 3.1: Representation of the orientation histograms for each new position. In this graph, each histogram is drawn under its picture.

3.2.3 Step 3: Gaussian filter operator

After having had lots of problems with noise in our orientation vectors, described in paragraph 3.2.6, we decided to realize and apply a Gaussian filter to the picture. We will now detail the way we realized a Gaussian filter in MATLAB, in order to blur the image and to erase unwanted high-level contrasts. We can notice that a Gaussian filter function also exists in the Image Processing Toolbox: our own script was useful at the beginning of the project (we did not have this toolbox yet), but then we used the direct MATLAB function.

Gaussian filter operator script:

This script just returns the filtered version of the given image. The aim is to balance each pixel as a function of the others: we just put a weight on each pixel around the one to be blurred. After that, a white pixel on a black background will become dark gray. We decided to make a circular filter with a three-pixel radius. We chose the numbers below for the different weights, as seen in figure 3.2.

Figure 3.2: The weights chosen for the Gaussian filter.

In order to tune the filter, we can change the values of the weights. But with these values, the picture is well blurred and a large part of the background noise is erased without the filter being too strong.
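A minimal base-MATLAB sketch of such a filter could look like the following (a standard Gaussian kernel stands in for the exact weights of figure 3.2, and sigma is an assumed value):

sigma = 1.5;                            % assumed width of the blur
r = 3;                                  % three-pixel radius, as in the report
[x, y] = meshgrid(-r:r, -r:r);
g = exp(-(x.^2 + y.^2) / (2*sigma^2));  % circular Gaussian weights
g = g / sum(g(:));                      % normalize so the overall brightness is kept
blurred = conv2(im, g, 'same');         % im: the grayscale image from step 1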

3.2.4 Step 4: Euclidean distance comparison

In this section, we will see how we realized the comparison between images, in order to have a realistic and efficient gesture recognition. We used the Euclidean distance on the gradient orientation vectors (36-dimensional vectors) to calculate the difference between two pictures.

The Euclidean distance comparison:


In order to obtain an efficient comparison between images, we decided to calculate the Euclidean norm between the orientation vector of the picture to be recognized and those of the data set pictures (we just calculated the Euclidean distance of the gradient orientation vectors between the analyzed picture and each database picture; we then sorted the distances and selected the four smallest ones). Therefore, we implemented a script that performs this calculation between all the gradient orientation vectors.

Here is the script we made:

The vector-vector comparison script:

function [disteuclid] = fini2(b, Im2, xRadius, yRadius, shape)

Returns the 4 nearest database pictures for the analyzed image.

First we created a 25-dimensional Euclidean distance vector, where each dimension is the result of the comparison with one database picture. Then, we used the gradient orientation script to calculate the orientation vector of the picture to analyze. After that, we returned the indexes of the 4 smallest terms of the 25-dimensional Euclidean distance vector. With these terms, it is easy to find the images corresponding to these indexes, to take back the labels of the images and to display them.

The script did work really well, but the execution time was really too high: it needed 180 seconds to give the 4 nearest pictures. So, the new goal was to considerably reduce this time in order to have a quick answer for a given picture. We must not forget that the final aim is to have a real-time application!

The idea was to realize a script that could create a matrix (MATLAB works faster with matrices) of all the gradient orientation database vectors and save it in a file. Then, it would be easier to compare the gradient orientation vector of the analyzed picture with each line of the database matrix, which we will call MATDIST (see C). That would cut all the calculation time for the database pictures: indeed, we did not need to recalculate all the database pictures each time.

We can now see how we made the code to create the MATDIST:


The database matrix operator:

function [matdist] = matdist(Im2, xRadius, yRadius, shape)

This function returns the matrix of all the gradient orientation vectors of the pictures stored in the database.

We decided to use the tic MATLAB function to launch a chronometer. The goal is to know the calculation time for the matrix creation (we took 26 images for the data set). It is a good function for knowing the efficiency of the algorithm we made. Moreover, we can then easily measure the time we gained with the different changes we made.

We just had to add a matrix creation in the loop (the matrix will have 26 lines for the 26 images and 36 columns for the 36 orientation bins). This new matrix is just the combination of all the orientation vectors: each line is the gradient orientation vector of a database picture. Once the matrix is created, we just had to save it in a specified folder so that we can load it whenever we want.

The time to execute this script and create the database matrix is around 70 seconds, and you only need to run it once. With this script, we gained the time we wanted. Now, to operate and recognize a picture, we run another script that compares the analyzed image with this matrix.

We can now explain and comment the new script encoded:

The vector-matrix comparison operator:

function [disteuclid] = fini3(b, xRadius, yRadius, shape)

This script returns a window displaying the 4 nearest images, with the associated Euclidean distance between each image and the compared one.

First of all, we begin by loading the MATDIST. Then we just have to calculate the orientation vector of the image to analyze and to compare it with each line of the database matrix. After that, we sort the distances and take back the database pictures with their label: the class is recognized.
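A minimal sketch of the core of this comparison could look like the following (the file name MATDIST.mat and the variable names are hypothetical; grador2 is the orientation script from step 2):

load('MATDIST.mat', 'matdist');  % the 26 x 36 database matrix saved beforehand
Z = grador2(b, 2, 2, 'same');    % 36-bin orientation vector of the image b to analyze
d = sqrt(sum((matdist - repmat(Z(:)', size(matdist, 1), 1)).^2, 2));
[dsorted, idx] = sort(d);        % ascending Euclidean distances
nearest4 = idx(1:4);             % indexes of the 4 nearest database pictures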


With this script, the time needed to compare the images is about 3 seconds. We gained around 177 seconds of execution time (we needed 180 seconds to make a comparison before)! To immediately test our results, we just had to take a database picture and to compare it with the database. If everything works, it should return the same image first, with a Euclidean distance equal to zero. This is the result we get in every case when we use a database picture. It verifies that in the case of two identical pictures, the script returns a logical result. We can see it in picture 3.3.

In picture 3.3, we can see that the image returned as the nearest is the image entered as input. This is a way to check the algorithm: here, the algorithm is working well and the Euclidean distance gives correct results. Now, we have to check that the method is good at recognizing a picture from similar pictures in the database.

Now we can have a look at the 4 nearest pictures returned for the '1' position in figure 3.4.

In figure 3.4, we can see that the first two pictures returned show the same gesture, but the third one does not. Therefore, we had to try with other hand gestures and see the results. We will test the algorithm further in chapter 4.

3.2.5 Step 5: Establish a comparison matrix

In this part, we will go further than just realizing the script. We will try to understand why it sometimes does not work as expected and what kind of solutions we could bring to get better results.

As we have seen above in figure 3.4, the results are not always as good as expected. For example, if we ask for the result of another, more complicated hand gesture recognition, we can see the returned answer in figure 3.5.


Figure 3.3: Results of the 1st answer of the vector-matrix comparison script for a database picture. We can check that the first image has a Euclidean distance equal to zero.

We can see that the expected results are clearly not the results given. This problem comes from the database quality and size, or from the different positions we took. We have 26 pictures in the database and we made all of them different to see the problems that we could have... We also changed the orientation of the hand and the finger spacing while shooting the hands, in order to recognize more positions.

What we can notice is that we have much better results for position 1. This is just because the spacing does not influence the results there: only the orientation of the finger really acts upon the results. That is why we have better results with the position 1 picture.


Figure 3.4: Results of the vector-matrix comparison script for a database picture (position: 1).

Therefore, a solution would be to widen the database: we could take lots of pictures for each hand position and then apply the script again. The expected new problem is that the running process time would become excessive. The second solution is to change our gesture positions and to choose new ones that are really different. We will try this possibility later.

To decide which way to go, we tried to identify the problem clearly. We therefore decided to build a matrix showing, on a gray scale, whether the different database pictures are close or not (all the pictures were taken under the same lighting) in terms of the Euclidean distance between their gradient orientation vectors. Black means that the pictures are really close and white that they are really different.

To build this matrix, we just had to take the database matrix already computed and compare each line with the others. MATLAB then displays the new matrix on a gray scale to show the results.

We can see the image of this matrix on figure 3.6.
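A minimal sketch of such a function follows (the name graymat matches the figure captions; the body is our assumption of the implementation):

    % D: database matrix, one normalized gradient orientation vector per row
    function G = graymat(D)
        n = size(D, 1);
        G = zeros(n);
        for i = 1:n
            for j = 1:n
                % Euclidean distance between orientation vectors i and j
                G(i, j) = norm(D(i, :) - D(j, :));
            end
        end
        imagesc(G), colormap(gray), axis square   % black = close, white = far
    end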


Figure 3.5: Results of the vector-matrix comparison script for a database picture (position 5).

We can see the database used for this gray matrix on picture 3.7.

The best case is to have white everywhere except black on the diagonal. Indeed, white means that our gestures are really far from each other, and black means it is the same picture. The diagonal will be black in any case, because it is the comparison of a picture with itself.

To verify that our gestures are well separated between classes and consistent within a class, we can plot this gray matrix: black inside a class and white between classes means that our gestures are perfectly chosen.

Here, we can see that our gestures are too close: there is clearly too much dark gray in the matrix. This shows that the real problem is our choice of gesture positions. The positions are too similar, so the recognition is too hard to achieve.

Therefore, we decided to reduce the number of positions and to take just three really different ones: Rock, Paper and Scissors. This also allowed us to build an application (the well-known little game) as a concrete demonstration.


Figure 3.6: Return of the graymat function for a 5-picture database (1 image per class). We can see that the diagonal is black, which shows that the matrix is well calculated: on the diagonal is the Euclidean distance between a vector and itself.

We can see the gray matrix of our new gestures on figure 3.8.

Observing this new gray matrix, we can confirm that the new gestures chosen are much better than the previous ones. We have white between the different gestures (the positions are far from each other) and black on the diagonal, as expected. As the results in chapter 4 show, the recognition also works far better.

3.2.6 Problems encountered

During the realization of the different steps, we came across several problems, which we explain in this part. We will not list all the problems we had, only those where we lost some time.


Figure 3.7: The database pictures used for the gray matrix of figure 3.6.

First problem:

The first problem we had was about the different classes in MATLAB. Indeed, MATLAB automatically defines the class of its variables, and all the functions used depend on the class of each element involved in the process.

When we loaded an image (MATLAB imread function), the class of the resulting matrix was uint8 color data (three matrixes), whereas to calculate the gradient we needed a double-class gray-scale equivalent. Therefore, we had to write a small script converting the uint8 MATLAB class into a double one (a direct function exists in the Image Processing Toolbox, but as said before we did not have it at the beginning of the period, which is why we implemented this small script).
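The original script is not reproduced in this transcript; a minimal sketch of the conversion, with a hypothetical file name, would be:

    A = imread('hand.jpg');   % hypothetical file; imread returns a uint8 H-by-W-by-3 array
    A = double(A);            % convert the uint8 values to the double class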

Then we had to transform the image into gray scale. We used the following coefficients to get a well-balanced gray image, where A is the color image and A1 the gray result:

    A1 = A(:,:,1)*0.3 + A(:,:,2)*0.59 + A(:,:,3)*0.11;

So we turned the three RGB matrixes into one gray-equivalent matrix. The coefficients are chosen to respect the contrast of the different channels.


Figure 3.8: Return of the graymat function for our new gesture database (1 image per class). We can see that the diagonal is black and that the other cells are much whiter.

After that, we could easily calculate the gradient of the image, though with some loss (we transformed a color picture into a gray-equivalent picture: three matrixes into one).

Later we bought the Image Processing Toolbox, so we could simply use the corresponding MATLAB function.

Second problem:

The second problem was the image border. When we calculated the gradient magnitude of the picture, the borders were included in the calculation with a very high level, due to the XRadius and YRadius of the derivative filter. We can see this clearly on picture 3.9, where we calculated the gradient magnitude of a simple form (a white triangle on a black background).

On this figure (3.9), we can really see the noise of the borders in the picture. The gradient magnitude operation treats the borders as part of the image. That will bring


    Figure 3.9: Gradient magnitude of a triangle with noise on borders

lots of problems later, in the second step: the gradient orientation calculation (3.2.2). Therefore we decided to cut the borders off in the gradient magnitude calculation; otherwise all of our histograms would look similar.
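A minimal sketch of this cropping, assuming a filter radius r in pixels (the value is illustrative):

    r = 6;                         % assumed derivative filter radius
    G = G(1+r:end-r, 1+r:end-r);   % drop the frame of the gradient magnitude image G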

Third problem:

When the background is not completely black and dark, we get problems with reflections and contrast in the gradient magnitude calculation. Indeed, a lot of noise becomes part of the gradient orientation vector. In order to reduce this noise, we can apply a Gaussian filter to the image: it blurs and softens the contrast, so we get less high gradient magnitude noise. A complementary way to avoid this kind of noise is to take the picture on a really black background. We can see on pictures 3.10 and 3.11 below the differences between the gradient magnitudes of each image and the different histograms returned.

We can also look at the gradient magnitude images of these two pictures on figure 3.11.
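With the Image Processing Toolbox (which we eventually had), the smoothing can be sketched as follows; the kernel size and sigma are assumed values, not those of the original script:

    h = fspecial('gaussian', [9 9], 2);    % 9-by-9 Gaussian kernel, sigma = 2
    A1 = imfilter(A1, h, 'replicate');     % blur the gray image before the gradient step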


(a) (b)

Figure 3.10: '1 finger' picture with black and gray backgrounds. We can see the noise of the gray background (a), which will cause many problems in the gradient calculation. With a good black background (b), the problem is greatly simplified.

(a) (b)

Figure 3.11: Gradients of the '1 finger' picture with black and gray backgrounds. After zooming, we can clearly see the white reflections of the gray background (a) compared to the black background (b), even though we applied a strong Gaussian filter. The histogram will be deteriorated accordingly.

We can notice that with a good black background we have no trouble: we cut out all the background noise. We will see later how to resolve this recognition problem.

Now we compare the two histograms on figure 3.12.

To conclude with these few pictures, we can say that a homogeneous black


(a) (b)

Figure 3.12: Histograms of oriented gradients of the '1 finger' picture with black and gray backgrounds. On these two graphs, we can notice that the histogram for the black background (b) is much more accurate than the one for the gray background (a), and therefore easier to process. We need precise histograms for a good gesture recognition.

background makes the work easier. We get much more precise histograms, and the recognition result is far better.

3.2.7 Conclusion on the method

Implementing this method taught us a lot about images and what is scientifically behind them. The results obtained are good, but one might have expected them to be better. By implementing the next method, we will surely get new ideas to make this one more efficient. We will then test the method in chapter 4.


    3.3 PCA or Eigenfaces method

    We will now explain each part and detail the way we made the different algorithms.

    3.3.1 Step 1: Realize the database

First of all, we had to choose how to build our database and what kind of database would be best for the recognition. We chose to take the minimum number of pictures giving the best recognition.

It is important to notice that for the Eigenface method we work with the entire pictures at the beginning, and then reduce the data (our aim is to express the data set with fewer factors). Therefore, we must take care with the data set in order to keep the calculation time down. There are two parameters to consider when building the database:

The number of pictures.

The size of each picture, which determines one dimension of the first matrix to reduce.

Both are really important. Indeed, when we create the first matrix to reduce, its size will be the number of pictures by the number of pixels.

Therefore, if there are too many pictures or too many pixels per picture, the calculation time grows fast!

For example, 10 images with a definition of 640 × 480 give a matrix of size 10 × 307200, and 10 images with a definition of 1280 × 960 give a matrix of size 10 × 1228800. As you can see, it can quickly become a really huge matrix, and the calculation time heavily depends on that. Moreover, we must not forget that MATLAB cannot manage arbitrarily big matrixes either.

Therefore, the question of the database is really important, because it determines the efficiency of the method (and its calculation time too).

At the beginning, we could not know how many pictures we had to take and what size to choose. So we decided to make a database of 12 pictures (4 of each position) with a definition of 640 × 480.

To find out what kind of database would give the best recognition, we ran some efficiency tests (after having implemented the method) with different numbers and sizes of input pictures.
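A minimal sketch of building that first matrix, with hypothetical file names (one flattened gray image per row):

    n = 12;  h = 480;  w = 640;
    X = zeros(n, h*w);
    for k = 1:n
        A  = double(imread(sprintf('db_%02d.jpg', k)));       % hypothetical file names
        A1 = A(:,:,1)*0.3 + A(:,:,2)*0.59 + A(:,:,3)*0.11;    % gray conversion as above
        X(k, :) = A1(:)';                                     % flatten to one row
    end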


3.3.2 Step 2: Subtract the mean

The next step is to calculate the mean in each direction. It is a fast step: we just take the first matrix of all the images, ask MATLAB to calculate its mean, and subtract it from the matrix. There is not much to say about this step, as it is really trivial, but we must not forget that it is essential for centering the data set pictures in the space.
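As a sketch, continuing with the data matrix X from step 1 (one image per row):

    mu = mean(X, 1);                       % mean image over the data set
    Xc = X - repmat(mu, size(X,1), 1);     % centered data matrix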

3.3.3 Step 3: Calculate the covariance matrix

This step was a bit more difficult than the first two (we had to understand the theory well to carry out the calculation precisely). But once we understood the subtlety described in chapter 2, the calculation became fast and easy to implement. We can see on picture 3.14 the different eigenpictures returned from this covariance matrix.
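We assume the subtlety referred to is the usual Eigenface shortcut: for n images of p pixels each (with n much smaller than p), one works with the small n-by-n matrix instead of the huge p-by-p covariance, since both share the same nonzero eigenvalues. As a sketch:

    L = Xc * Xc';    % n-by-n surrogate of the p-by-p covariance Xc' * Xc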

Figure 3.14: Example of the eigenpictures of the data set used for the PCA recognition method.


3.3.4 Step 4: Eigenvectors and eigenvalues calculation, and choice of the good eigenvectors

In this step, we take a closer look at the calculation of the eigenvectors and eigenvalues of the covariance matrix, and at how to choose the good ones. Indeed, it is really important to choose the right eigenvectors to express the data set in the best basis: the number of eigenvectors chosen directly influences the results we get.

The value of each eigenvalue (between 0 and +∞) determines whether the corresponding eigenvector is important or not in the expression of the data set in the new space.

Therefore, we thought we would have to apply a threshold on the eigenvalues to keep the most important eigenvectors; an efficient threshold is certainly important for the best results. At the beginning, we decided to keep the first 11 eigenvectors (as described in the theory: 11 for 12 database pictures). We then planned some tests to find out which distribution is best (a threshold value giving good results while decreasing the calculation time efficiently).

But in the end, we decided to keep just three images in the data set (after having performed other tests in chapter 4). We therefore kept all the eigenvectors, because three eigenvectors is in any case really few.
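A sketch of this step, continuing from the matrix L above (variable names are ours):

    [V, D] = eig(L);                        % eigenvectors/eigenvalues of the small matrix
    [vals, idx] = sort(diag(D), 'descend'); % most significant eigenvalues first
    k = size(Xc, 1) - 1;                    % e.g. 11 kept eigenvectors for 12 pictures
    U = Xc' * V(:, idx(1:k));               % back to pixel space: columns are eigenpictures
    U = U ./ repmat(sqrt(sum(U.^2, 1)), size(U,1), 1);   % normalize each eigenpicture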

3.3.5 Step 5: Realize the new data set and compare

In this step, we build the new data set by saving the matrix of eigenpictures and expressing each image of the database with the principal eigenvectors (we just take the scalar product between each kept eigenvector and the image). We then save, for each database image, the coefficient in front of each eigenvector; so we have as many coefficients as eigenvectors.

In the end, this expresses each image in terms of the calculated eigenvectors. We express the image to analyze with these eigenvectors too. With these coefficients, we can compare the images with each other, by taking the Euclidean distance between the coefficient vectors of each image. The results returned are quite good, as we can see in chapter 4.
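A sketch of the projection and comparison, continuing with the names above (q is the flattened query image, a hypothetical variable):

    W = Xc * U;                              % n-by-k coefficients, one row per database image
    wq = (q(:)' - mu) * U;                   % coefficients of the centered query image
    d = sqrt(sum((W - repmat(wq, size(W,1), 1)).^2, 2));   % Euclidean distance per image
    [dmin, best] = min(d);                   % index of the nearest database picture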


3.3.6 Conclusion on this method

Implementing this method showed us how strongly a huge set of images can be compressed with low loss. Moreover, we have seen another view of image analysis, and it is really interesting to compare the two kinds of methods: that is what we do in the next chapter (4), "Tests, results and analysis". The results returned are quite good, but we can easily imagine that this method would work better with centered images, because the position of the gesture within the picture matters a lot. We can understand that the alternative name of this method, "Eigenface", is no accident: it should work better with faces than with hands, because it is easier to center a face in a picture (by centering the mouth and the eyes).


Chapter 4

Tests, results and analysis

In this chapter, we explain the different tests made and the results returned. It serves as a kind of tutorial for each method, helping the reader choose one method or the other depending on the application to create. We will also explain the drawbacks of each method and the technical reasons behind them.

In a first part, we present the application realized, give its complete script, and explain how to use it. After that, we present the different choices made for each method and justify them with tests. Then, we make a simple comparison of the different methods and draw graphs of the results of each method under different conditions.

4.1 The application: Rock Paper Scissors Game!

After having implemented the different methods for gesture recognition, we decided to build a small application using them.

The first idea that came to our mind was a simple game that everybody knows: the Rock-Paper-Scissors game! Indeed, it was the best way to test our gesture recognition scripts with fun. Moreover, as everybody knows the game, it is really easy and comfortable for other people to test the scripts and the recognition level of each method. The application has a GUI form for a better interface with the player. We tried to make an easy-to-use application, with very few steps needed to build the database or to play the recognition game.

We can take a look at the GUI form shown on figure 4.1.


Figure 4.1: Photo of the application realized

We can see on this screenshot the GUI form of the application and an overview of the different options proposed by the game.

Next, we show a photo of the environment constructed (PC and web camera) to take good pictures for a better analysis. It is important to know which


environment we used to take these pictures, because it directly affects the results returned. The black background, for example, is really important. The environment looks quite basic; indeed, it would be pretty easy for anyone to build their own working space and use this script.

We will then explain each button and what the application can do. But first, we can look at the working space on figure 4.2:

Figure 4.2: Photo of the working space realized for the gesture recognition application.

We can see on this photo the working space built for the gesture recognition. We used a Philips camera with a tripod. We cut a wooden board and painted it black to get a better background. Paint is surely not the best way to obtain a uniform background, but it was what we had: even though we chose a matte paint, the various lighting setups reflect directly on it and thus influence the results. That is why we do some preprocessing on the pictures before analysis. The best background would be a textile, because it is much more matte.

So far, everything is easy to set up. What is a bit more uncertain is having a camera recognized by MATLAB; we were lucky and had one.

Now we can see the application in detail on figure 4.3. We are going to explain


each function and why we realized it this way.

    Figure 4.3: Explanation of the application realized

    So, we will explain more precisely the different buttons and their utility:

The Start/Stop button: As indicated, it starts or stops the camera. The aim is to save memory by stopping the camera when the application is not in use. It also gives the preview of your gestures. The Start button must have been pressed before starting any comparison.

The preview window: This window is used for the preview of your gesture. You just have to click on the Start button to get the preview.

The nearest database picture window: This small window indicates which image of the data set is recognized. It is really useful when you have several


pictures of each gesture: you will know which picture is recognized, and this helps you understand how the recognition works.

The text field: It gives the different indications to the user, which is really useful during the data set creation. It tells you which gesture to make, how long to wait, or even whether you won or lost.

The confirmation-for-recognition buttons: These buttons are especially made for testing. After having launched a method and seen the results, you click on Yes or No to say whether your gesture was recognized or not. The application then automatically counts how many gestures were found, which is really convenient for long test series.

The player's and computer's position windows: These two windows show the pictures analyzed for recognition: the picture of the gesture you just made and the random computer's gesture. They give a quick overview of the results and make the game more attractive.

The text field for the score: Here you can see the score between the user and the computer, as well as the gesture recognized. When "Scissors against Rock" is written, it means that the application recognized a scissors position for the user and that the (random) computer gesture is a rock. The first gesture written is the position of the nearest data set picture recognized.

The Reset Score button: It simply resets the score.

The Load Eigenface Matrix button: To spare you the loading of the huge Eigenface matrixes when you launch the application (in case you only want to use the gradient histogram method), you can load these matrixes whenever you want. Just be aware that the Eigenface method will not work before you have loaded them. This makes the GUI quicker to access.

The Game Database Creation button: It creates a new database. It takes pictures of the user and calculates all the matrixes automatically. The aim of this button is to build a new set of pictures (database) for each user: the results are better when users make their own set.

The Compare with Eigenface button: It simply launches the script that analyzes the new picture with the Eigenface method. You cannot use it before loading the Eigenface matrixes; once they are loaded, the method runs really fast.

The Compare with Gradients button: It launches the gradient histogram method. This method is a bit slower than the Eigenface one, but you do not have to wait for any matrix loading: you can use it directly once the GUI is opened.


The Compare with simple sub button: This button launches the first method implemented: the simple subtraction. It is in the game only to show that it does not recognize well; you can test it to get an overview of its results.

So, we have explained the different buttons of the application and what they do, and we have seen the working space built for this project. Now, it is important to see which results we obtained for each method.

If you are interested in the script itself, take a look at Appendix C.

4.2 Tests and choices of the parameters

In this section, we take a closer look at the different tests made to explain the choices in the scripts of the different methods.

During this project, we had to make multiple choices that influenced the results. We ran tests to validate these choices, so that we are sure the directions we took really led to better results.

For both methods, we drew some diagrams to explain the results.

4.2.1 Choice of the size of the derivative filter and the number of bins for the gradients method

It is important to notice that all the pictures had the same size before any comparison: all our pictures were 640 × 480. We fixed the image size because it influences the test results: the size of the derivative filter is in pixels, so a circle of 3 pixels on a 50 × 30 image does not have the same effect as the same circle on a 640 × 480 picture. That is why we fixed the image size.

In this part, we show how we chose the size of the derivative filter for the gradient method, as well as the number of bins (or boxes) used to count the different orientations for the histogram. Note that in every case we chose a circular derivative filter.

Before looking at the graphs, we must say that we ran these tests with different positions. Indeed, we made these tests at the beginning of the project, and at


that time the positions to recognize were the 5 hand positions from '1' to '5'. It is a good thing that the tests were made on these positions, because they give more information: the positions are more precise and more difficult to recognize, so the number of bins and the derivative filter size have more impact.

We can first look at the graphs of the Euclidean distance between gestures of the same class ('1' to '5') on figure 4.4, as a function of the number of bins and the derivative filter size.

Figure 4.4: Graph of the Euclidean distance between the '1' gestures themselves. The y-axis shows the Euclidean distance; the x-axis enumerates the combinations of number of bins and derivative filter size, from (18 bins, filter 3), (18 bins, filter 6), (18 bins, filter 12), (36 bins, filter 3), ... up to (72 bins, filter 12).

As we can see on this graph, the lowest Euclidean distance is obtained for 36 bins. If we then average the intra-class distances, the best choice to minimize the Euclidean distance is a derivative filter of size 6.

To confirm this, we have to look at figures 4.5 and 4.6, which show the intra-class distances of the other positions.


(a) (b)

Figure 4.5: Graphs of the Euclidean distance between the '2' gestures and between the '3' gestures. As in the first graph, the y-axis shows the Euclidean distance and the x-axis the combinations of number of bins and derivative filter size.

(a) (b)

Figure 4.6: Graphs of the Euclidean distance between the '4' gestures and between the '5' gestures. Same layout as the first graph.

With these graphs, we can definitely say that to get the lowest Euclidean distance between positions of the same class, we have to choose 36 bins; that is what we choose for the application. Now, we observe the graphs of the Euclidean distance between one class and another. We can look at the '1' class against the other classes on figure 4.7.

Here, what is important to notice is not the highest Euclidean distance, but the level


Figure 4.7: Graph of the Euclidean distance between the '1' class and the other classes. The y-axis shows the Euclidean distance and the x-axis the other classes; each curve corresponds to one combination of number of bins and derivative filter size.

of the intra-class Euclidean distance compared with the Euclidean distance between the classes.

We look at just one graph, because they all look the same; we chose the first one (class '1' against the others).

We can see that between the '1' position and the '2' position, the mean Euclidean distance is around 0.4 (all the distances are normalized, so the maximum distance we can have between two classes is 2, reached when the two normalized vectors are opposite). We can also see that within the '1' class (intra-class), the Euclidean distance is likewise around 0.4.
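This bound of 2 can be checked by expanding the squared distance between two unit-length vectors $u$ and $v$:

\[ \|u - v\|^2 = \|u\|^2 + \|v\|^2 - 2\,u \cdot v = 2 - 2\,u \cdot v \le 4, \]

so $\|u - v\| \le 2$, with equality exactly when $v = -u$. (Since orientation histograms have nonnegative entries, $u \cdot v \ge 0$ here, so in practice the distance even stays below $\sqrt{2}$.)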

Moreover, for the other positions, the distance between the different classes (inter-class) and the intra-class distance are similar.

What does this mean? It means that our set of pictures is too close: our images are surely too similar, because the distances between the classes are the same as the distances within the classes. Therefore, we changed the data set and took other pictures that are farther from each other


when the class changes. We found that the positions Rock, Paper and Scissors corresponded to what we wanted, while also making a good application.

In any case, we can still say that 36 bins is the best choice, because it gives better results in every situation. With 18 bins, the different histograms are too close, because too many orientations fall into the same bin; with 72 bins, the orientations are too spread out. Neither histogram is good: in one case we get big peaks, and in the other a histogram that is too flat.

Now, to confirm that a circular derivative filter of size 6 is the best, we made other tests