© 2018 JETIR December 2018, Volume 5, Issue 12 www.jetir.org (ISSN-2349-5162)


MACHINE LEARNING APPROACH FOR HEAD GESTURE CLASSIFICATION

Indhumathi R #1, Geetha A *2
#1 Research Scholar, Department of Computer and Information Science, Annamalai University, Annamalai Nagar, Tamilnadu, India
*2 Associate Professor, Department of Computer Science and Engineering, Annamalai University, Annamalai Nagar, Tamilnadu, India

Abstract:

This work aims at developing a real-time approach for interacting with computers. Human-computer interaction is currently a fast-growing research area. The purpose of hands-free computer interfaces is to replace traditional input devices such as the keyboard and mouse with vision-based interfaces, which also makes them especially useful for people with physical disabilities. The proposed work recognizes head gestures in real time. Head movements are detected using a rotation-invariant approach with Haar cascade classifiers, and the detected head gestures are classified as head-left, head-right, head-down and head-origin using SVM and KNN classifiers under a machine learning approach. HOG features are extracted as input for both models. The proposed system has been tested on a real-time dataset and performs better than other existing approaches.

Keywords:

Hands-free Interfaces, Vision-based Interfaces, SVM, KNN, Haar Cascade Classifier, HOG

I. INTRODUCTION

1.1 HUMAN COMPUTER INTERACTION (HCI):

Human-computer interaction (HCI) refers to the interaction between humans and computers. HCI surfaced in the 1980s with the advent of personal computing, just as machines such as the Apple Macintosh, IBM PC 5150 and Commodore 64 started turning up in homes and offices in society-changing numbers [1]. Traditionally, humans interact with computers using a keyboard, a mouse and other peripherals. Fast-growing technologies are now replacing these traditional input devices with interfaces driven by the hand, eye, head or fingers. Machine vision provides an excellent way of realizing such an intelligent human-computer interface, enabling a user to control a computer without any physical contact with devices such as keyboards, mice and displays. In this proposed work, head movements are used to interact with the computer.

1.2 HEAD MOVEMENTS:

The capacity to estimate the head pose of another person is a common human ability that presents a unique challenge for computer vision systems. A number of head pose estimation techniques are used in existing work. The Haar cascade classifier detects only frontal views of the face; in this proposed work, a rotation-invariant algorithm is used to detect the head-left, head-right, head-origin and head-down poses.

1.3 COMPUTER VISION:

Computer vision aims to impart human-like visual intelligence and instincts to computers. It is closely related to artificial intelligence, recognizing input images and enabling the computer to act in a human-like way. In this work, head gestures are captured as image frames and used to interface with the computer.

II. RELATED WORKS

Anwar Saeed, Ayoub Al-Hamadi and Ahmed Ghoneim [2] proposed a frame-based approach to estimate head pose. The Viola-Jones Haar-like face detector is used for detection, and an SVM classifier gives an accuracy of 75%.


Rushikesh T. Bankar and Suresh S. Salankar [3] developed a system to recognize head gestures in real time from video sequences. Alvise Memo and Pietro Zanuttigh [4] presented the combined use of a head-mounted display and a multi-modal sensor setup; reliable gesture recognition is obtained through a real-time algorithm exploiting novel feature descriptors arranged in a multi-dimensional structure and fed to an SVM classifier. Shubhada P. Deshmukh, Manasi S. Patwardhan and Anjali R. Mahajan [5] proposed a head gesture recognition system based on an SVM classifier; it provides the required time efficiency and achieves 97% accuracy for facial expressions and 98% accuracy for head gestures. Rushikesh T. Bankar and Suresh S. Salankar [6] proposed a system built from a sequence of stages: data acquisition (image capturing), preprocessing (filtering), feature extraction (rectangular features) and classification with a parallel stage of cascaded classifiers. Zhen-Peng Bian, Junhui Hou et al. [7] used non-linear function mapping in the FM (Facial position and expression Mouse) system.

III. METHODOLOGY

3.1 SYSTEM DESIGN:

This proposed work is designed in the stages shown in Fig 1. First, head gestures are detected from a web camera. Second, HOG features are extracted from the detected gestures. Third, SVM and KNN models are trained on the resulting feature vectors. Finally, the trained models are tested on the given samples and their performance is analyzed.

i. Head gestures detection

ii. Feature Extraction

iii. Training

iv. Testing

v. Performance Analysis

Fig 1: Flow diagram of the proposed work


3.1.1 DETECTION OF HEAD GESTURES:

Head gestures are the positions produced by head movement [7]. The rotation-invariant and Haar cascade approaches are used to detect head gestures such as head-left, head-right, head-origin and head-down, as shown in Fig 2.

Fig 2a: Head Origin   Fig 2b: Head Down   Fig 2c: Head Right   Fig 2d: Head Left

Haar Classifier:

Haar-like features [8], [9] are features of a digital image that are used in object detection; they are so named on account of their similarity to Haar wavelets and are suitable for real-time object detection. The algorithm was first developed by Viola et al. [8], [9] and was later extended by Lienhart et al. [10].
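As an illustration, a frontal face can be located with OpenCV's bundled Haar cascade roughly as follows. This is a minimal sketch: the cascade file is the stock one shipped with opencv-python, and the detection parameters are common defaults, not values reported in the paper.

import cv2

# Load the stock frontal-face Haar cascade shipped with recent opencv-python wheels;
# older installations may need an explicit path to the XML file instead of cv2.data.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("sample_frame.jpg")            # hypothetical input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)    # the cascade works on grayscale images
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:                        # draw a box around each detected face
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)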

Rotation Invariant Approach:

1) The Haar classifier checks for the presence of a frontal face in the input image.
2) If a frontal face is found, the face is detected directly.
3) If no frontal face is found, a rotation matrix R with θ = ±30° is formed to transform the image. The angle θ = ±30° is chosen because the Haar classifier used here has been developed with inherent rotational tolerance of up to ±30°.
4) The Haar classifier searches for the face in the transformed image.
5) If the Haar classifier does not detect the face for any of the set values of θ, the algorithm concludes that no face is found. A sketch of this search is given below.
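A minimal sketch of this rotation-invariant search, reusing the cascade loaded above; the helper name and the fixed angle set are illustrative assumptions, not code from the paper.

import cv2

def detect_face_rotation_invariant(gray, face_cascade, angles=(0, 30, -30)):
    # Try the original frame first, then the frame rotated by +/-30 degrees.
    h, w = gray.shape[:2]
    center = (w // 2, h // 2)
    for theta in angles:
        if theta == 0:
            candidate = gray
        else:
            R = cv2.getRotationMatrix2D(center, theta, 1.0)    # 2x3 rotation matrix
            candidate = cv2.warpAffine(gray, R, (w, h))         # transformed image
        faces = face_cascade.detectMultiScale(candidate, scaleFactor=1.1, minNeighbors=5)
        if len(faces) > 0:
            return theta, faces        # angle at which the face became detectable
    return None, []                    # no face found at any of the set angles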

3.1.2 FEATURE EXTRACTION:

Histogram of Oriented Gradients (HOG)

A histogram of oriented gradients (HOG) is, by definition, a feature descriptor; it is used in image processing applications for detecting objects in a video or image [11]. The HOG features are extracted as shown in Fig 3. The original and HOG images are shown in Fig 4 and Fig 5, respectively.


Fig 3: Flow diagram of HOG feature Extraction

Fig 4: Original Image Fig 5: HOG Image
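A possible HOG extraction step is sketched below using scikit-image; the library choice and the orientation/cell/block parameters are assumptions, since the paper does not state them.

import cv2
from skimage.feature import hog

gray = cv2.imread("head_left_001.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical sample frame
gray = cv2.resize(gray, (150, 150))                            # frame size used in Section 3.1.3

# 1-D HOG feature vector: gradient orientations histogrammed per cell, block-normalized.
features = hog(gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")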

3.1.3 TRAINING:

The head gestures are captured from a web camera and labeled as one of four classes: head-left, head-right, head-origin and head-down. About 1200 training samples are used (300 each of head-left, head-right, head-origin and head-down). Every frame is resized to 150 x 150 pixels and converted to grayscale for better performance.
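A rough sketch of how such a training set could be assembled; the directory layout, folder names and helper function are assumptions introduced for illustration.

import os
import cv2
import numpy as np
from skimage.feature import hog

LABELS = ["head-left", "head-right", "head-origin", "head-down"]

def load_dataset(root_dir):
    # Expects one sub-folder per gesture class, each holding the captured frames.
    X, y = [], []
    for label_id, label in enumerate(LABELS):
        folder = os.path.join(root_dir, label)
        for name in os.listdir(folder):
            img = cv2.imread(os.path.join(folder, name), cv2.IMREAD_GRAYSCALE)
            img = cv2.resize(img, (150, 150))                  # 150 x 150 grayscale frames
            X.append(hog(img, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2), block_norm="L2-Hys"))
            y.append(label_id)
    return np.array(X), np.array(y)

X_train, y_train = load_dataset("dataset/train")   # ~1200 samples, 300 per class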

KNN:

KNN considers the similarity of two points to be the distance between them in feature space under some appropriate metric. To decide which points from the training set are similar enough to be considered when choosing the class to predict for a new observation, the algorithm picks the k data points closest to the new observation and takes the most common class among them. This is why it is called the k-Nearest Neighbors algorithm.

The algorithm (as described in [12] and [13]) can be summarized as follows:
1. A positive integer k is specified, along with a new sample.
2. We select the k entries in our database which are closest to the new sample.
3. We find the most common classification of these entries.


4. This is the classification we give to the new sample. A minimal version of these steps is sketched below.
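With scikit-learn these four steps reduce to a few lines. The value of k is a free choice, not taken from the paper; X_test and y_test are assumed to be built from the 200 test samples of Section 3.1.4 in the same way as X_train and y_train.

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)   # step 1: choose k (Euclidean distance by default)
knn.fit(X_train, y_train)                   # store the training samples
y_pred_knn = knn.predict(X_test)            # steps 2-4: nearest neighbours + majority vote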

SVM:

Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges; however, it is mostly used for classification problems. In Python, scikit-learn is a widely used library for implementing machine learning algorithms. SVM is also available in the scikit-learn library and follows the same structure as other estimators (import the library, create the object, fit the model, and predict) [14].
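The scikit-learn workflow just mentioned (import, object creation, fitting, prediction), sketched for the HOG feature vectors; the linear kernel is an assumption, since the paper does not state which kernel was used.

from sklearn import svm

clf = svm.SVC(kernel="linear")     # object creation
clf.fit(X_train, y_train)          # fitting the model on HOG features and labels
y_pred_svm = clf.predict(X_test)   # prediction on the held-out samples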

(i) SVM Principle

A support vector machine (SVM), shown in Fig 6, can be used to classify the obtained data (Burges, 1998). SVMs are a set of related supervised learning methods used for classification and regression, and they belong to a family of generalized linear classifiers. Let us denote a feature vector (termed a pattern) by x = (x1, x2, ..., xn) and its class label by y, such that y ∈ {+1, −1}. Consider the problem of separating a set of n training patterns belonging to two classes: the goal is to find a decision function g(x) that can correctly classify an input pattern x that is not necessarily from the training set.

Fig 6: Architecture of SVM
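For the linearly separable two-class case, this amounts to the standard maximum-margin formulation below (a textbook statement, not reproduced from the paper):

\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1,\ \ i = 1,\dots,n,
\qquad
g(\mathbf{x}) = \operatorname{sign}\left(\mathbf{w}^{\top}\mathbf{x} + b\right).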

3.1.4 TESTING:

About 200 samples are tested (50 each of head-left, head-right, head-origin and head-down). The video frames are resized to 150 x 150 pixels and converted to grayscale. The SVM and KNN models classify each frame as head-left, head-right, head-origin or head-down.
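A rough sketch of such a real-time testing loop, assuming a trained model (clf, which could equally be the KNN model) and the same preprocessing and HOG parameters as in training; the window name and exit key are illustrative.

import cv2
from skimage.feature import hog

cap = cv2.VideoCapture(0)                        # default web camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.resize(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (150, 150))
    feat = hog(gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")
    label = LABELS[clf.predict([feat])[0]]       # head-left / head-right / head-origin / head-down
    cv2.putText(frame, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("Head gesture", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):        # press q to quit
        break
cap.release()
cv2.destroyAllWindows()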

IV. PERFORMANCE ANALYSIS

The proposed method is implemented on Windows 10 using Python 3.4 and OpenCV 3.3.0. The performance of the proposed system with the SVM and KNN models is measured using the confusion matrix and the performance metrics described below; overall, the SVM model reaches about 91% accuracy and the KNN model about 90% (see Section V). The results of the performance analysis of SVM and KNN are produced below.


4.1. CONFUSION MATRIX:

A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known. The proposed system is tested with 50 samples for each gesture. The resulting confusion matrices for KNN and SVM are shown in Table 1 and Table 2.

Table 1: Confusion Matrix for KNN

Table 2: Confusion Matrix for SVM
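The confusion matrices in Tables 1 and 2 can be computed with scikit-learn as sketched below, assuming the trained models and the test split introduced earlier.

from sklearn.metrics import confusion_matrix

print(confusion_matrix(y_test, knn.predict(X_test)))   # Table 1: KNN
print(confusion_matrix(y_test, clf.predict(X_test)))   # Table 2: SVM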

4.2 CLASSIFICATION REPORT:

The classification report is used to find out how effective the model is on the test dataset according to a set of metrics. Different performance metrics are used to evaluate different machine learning algorithms; here, the metrics used are precision, recall, F-score and accuracy. The classification reports for KNN and SVM are given in Table 3 and Table 4.
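These per-class metrics can be obtained directly from scikit-learn, for example as sketched below (again reusing the models and test split assumed above).

from sklearn.metrics import accuracy_score, classification_report

for name, model in [("KNN", knn), ("SVM", clf)]:
    y_pred = model.predict(X_test)
    print(name, "accuracy (%):", round(100 * accuracy_score(y_test, y_pred), 1))
    print(classification_report(y_test, y_pred, target_names=LABELS))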

Table 3: Classification Report for KNN

              Precision (%)   Recall (%)   F1-Score (%)   Accuracy (%)
Head-Left           76            76             76             88
Head-Right          88            91             89             95
Head-Origin         54            84             66             86
Head-Down           98            70             82             89
Avg/Total           79            80             78             90


Table 4: Classification Report for SVM

              Precision (%)   Recall (%)   F1-Score (%)   Accuracy (%)
Head-Left           80            88             88             93
Head-Right          82            82             82             91
Head-Origin         70            83             76             89
Head-Down           92            73             81             90
Avg/Total           81            82             82             91

V. RESULT AND DISCUSSION

The preprocessed training samples are given to the classifiers. HOG features are used for feature extraction, and the extracted features are fitted to the SVM and KNN models, which then predict the labels. The SVM classifier produces 91% accuracy, whereas KNN produces 90% accuracy. The proposed method recognizes head-left and head-right gestures better than head-origin and head-down gestures. The performance of the proposed system is analyzed using the metrics above and the results are shown below.

OUTPUT:

Fig 7a: Testing Result Head-Left   Fig 7b: Testing Result Head-Right


Fig 7c: Testing Result Head-Origin   Fig 7d: Testing Result Head-Down

VI. FUTURE WORK AND CONCLUSION

In this work, head gesture recognition is performed using HOG features with two models, KNN and SVM. Both KNN and SVM produce good accuracy. In future work, other models and features can be considered for head gesture classification.

REFERENCES:

[1] https://www.interaction-design.org/literature/topics/human-computer-interaction
[2] Anwar Saeed, Ayoub Al-Hamadi and Ahmed Ghoneim, "Head Pose Estimation on Top of Haar-Like Face Detection: A Study Using the Kinect Sensor", Sensors (ISSN 1424-8220), vol. 15, pp. 20945-20966, 2015.
[3] Rushikesh T. Bankar and Suresh S. Salankar, "Head Gesture Recognition System Using Adaboost Algorithm with Obstacle Detection", 7th International Conference on Emerging Trends in Engineering & Technology, IEEE, 2015.
[4] Alvise Memo and Pietro Zanuttigh, "Head-Mounted Gesture Controlled Interface for Human-Computer Interaction", Multimedia Tools and Applications, Springer Science+Business Media New York, 2018.
[5] Shubhada P. Deshmukh, Manasi S. Patwardhan and Anjali R. Mahajan, "Feedback Based Real Time Facial and Head Gestures Recognition for E-Learning System", ACM CoDS-COMAD '18, January 11-13, Goa, India, 2018.
[6] Rushikesh T. Bankar and Suresh S. Salankar, "Head Gesture Recognition System Using Adaboost Algorithm with Obstacle Detection", 7th International Conference on Emerging Trends in Engineering & Technology, IEEE, 2015.
[7] Zhen-Peng Bian, Junhui Hou, Lap-Pui Chau and Nadia Magnenat-Thalmann, "Facial Position and Expression Based Human-Computer Interface for Persons with Tetraplegia", IEEE, 2015.
[8] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features", in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), vol. 1, IEEE, pp. I-511, 2001.
[9] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features", University of Rochester, Citeseer, 2001.
[10] R. Lienhart and J. Maydt, "An extended set of Haar-like features for rapid object detection", in Proceedings of the International Conference on Image Processing, vol. 1, IEEE, pp. I-900, 2002.
[11] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection", in Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), New York: IEEE, pp. 63-69, 2005.


[12] http://www.math.le.ac.uk/people/ag153/homepage/KNN/KNN3.html
[13] http://www.lkozma.net/knn2.pdf
[14] https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/
