Modelling and Implementation of Face Detection and Recognition for Mobility Assistant for Visually Impaired System (MAVI)
A thesis submitted in partial fulfillment of the requirements for the degree of
MASTER OF TECHNOLOGY
in
VLSI Design Tools & Technology
by
MUNIB FAZAL
Entry No. 2014JVL2694
Under the guidance of
PROF. M. BALAKRISHNAN
Dr. CHETAN ARORA (IIIT Delhi)
Department of VLSI Design Tools & Technology, Indian Institute of Technology Delhi.
June 2016.
Certificate
This is to certify that the thesis titled Modelling and Implementation of
Face Detection and Recognition for Mobility Assistant for Visually
Impaired System (MAVI) being submitted by MUNIB FAZAL for the
award of Master of Technology in VLSI Design Tools & Technology
is a record of bona fide work carried out by him under my guidance and
supervision at the Department of Computer Science & Engineering.
The work presented in this thesis has not been submitted elsewhere either in
part or full, for the award of any other degree or diploma.
PROF. M. BALAKRISHNAN
Department of Computer Science and Engineering
Indian Institute of Technology, Delhi
Dr. CHETAN ARORA
Department of Computer Science and Engineering
Indraprastha Institute of Information Technology, Delhi
“Dedicated to the memory of my father.”
Abstract
This thesis presents the design space exploration and implementation of OpenCV face detection, together with face recognition, for MAVI (Mobility Assistant for the Visually Impaired), an outdoor navigation system that helps a visually impaired (VI) person socialize by being aware of people around them and recognizing familiar faces.

The work involves measuring accuracy, performance and power/energy on the ZedBoard using its ARM cores; configuring the various parameters of the OpenCV face detection functions; cross-compiling and building the environment for ZedBoard ARM development; profiling the algorithm; and exploring tools and techniques for its hardware acceleration. The exploration was done mindful of the other processes and applications that will run alongside this application, and it provides useful parameters to a future controller for trading off performance against energy consumption according to the circumstances and needs of a VI person. Upper-body detection is also used to reduce computation time, and the complete flow through to face recognition is implemented and presented.

Embedded applications of this kind are well suited to the Xilinx Zynq All Programmable System-on-Chip, which combines ARM processor cores with programmable FPGA fabric on the same chip. This approach facilitates acceleration-based implementations that can be further optimized for performance. Future work involves hardware acceleration of the proposed functions using Xilinx SDSoC and integration with the other modules being built for MAVI.
Acknowledgments
I take this opportunity to thank my supervisor Prof. M. Balakrishnan for his constant supervision and valuable guidance during the course of this thesis. I am highly indebted to him for believing in me and for being a constant source of motivation.

I would also like to thank our co-supervisor Dr. Chetan Arora for his valuable insights and assistance on the subject of computer vision; Prof. Anshul Kumar for all his help and support as the program coordinator of VDTT, IIT Delhi; Mr. Rajesh Kedia and Mrs. Radhika for useful discussions and ideas; Mr. Sourajit Jash for initial help and orientation on the thesis work; and Mr. Sharma and Mr. Rakesh for providing me with all the lab equipment and support.

My heartfelt thanks to Yoosuf KK and Hassen Basha for making my thesis journey fun-filled and memorable, Siva Krishna Aleti and Akshay Jain for their kind gestures and for making the lab environment joyful, and Saurabh Agrawal for assisting me with documentation and result evaluations.

Last but not the least, I would like to thank God for being merciful and my parents for their constant prayers and well wishes for me.
MUNIB FAZAL
Contents
1 Introduction 1
1.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Mobility Assistant for Visually Impaired (MAVI) . . . . . . . 1
1.3 Face Detection and Recognition . . . . . . . . . . . . . . . . . 3
1.3.1 Face Detection . . . . . . . . . . . . . . . . . . . . . . 3
1.3.2 Features . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.3 Integral Image . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.4 Cascade of Classifiers . . . . . . . . . . . . . . . . . . . 6
1.3.5 Face Recognition . . . . . . . . . . . . . . . . . . . . . 7
1.4 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2 Modelling and Implementation of Face Detection 9
2.1 OpenCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.2 APIs used in Face Detection and Recognition . . . . . 9
2.2 Parameters tuning . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Scale Factor & MinNeighbour . . . . . . . . . . . . . . . . . . 13
2.4 Minimum Face Size . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5 FR Database Size . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.6 Upper Body and Face Detection . . . . . . . . . . . . . . . . . 15
2.7 Creation of FR Database and Training . . . . . . . . . . . . . 16
3 Embedded Face Detection and Recognition 18
3.1 ZedBoard-Choice for prototyping . . . . . . . . . . . . . . . . 18
3.2 Linaro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3 Energy and power measurement on ZedBoard . . . . . . . . . 20
3.4 Cross Compilation for ZedBoard . . . . . . . . . . . . . . . . . 21
3.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4 Hardware Software Codesign 31
4.1 Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.1 SDSOC . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.2 VIVADO HLS . . . . . . . . . . . . . . . . . . . . . . . 34
4.2 Profiling & Code Modification . . . . . . . . . . . . . . . . . . 34
4.3 Hardware Acceleration . . . . . . . . . . . . . . . . . . . . . . 36
5 Conclusion & Future Work 37
5.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Bibliography 38
List of Figures
1.1 MAVI System . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 MAVI System and its modules . . . . . . . . . . . . . . . . . . 2
1.3 Face Detection and Recognition Module . . . . . . . . . . . . 3
1.4 Haar Features . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 Original image . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.6 Integral Image or Sum Area Table . . . . . . . . . . . . . . . . 6
2.1 Face Detection + Upper Body . . . . . . . . . . . . . . . . . . 16
2.2 Face Recognition Database . . . . . . . . . . . . . . . . . . . . 17
3.1 Power measurement setup . . . . . . . . . . . . . . . . . . . . 20
3.2 Face Detection Accuracy . . . . . . . . . . . . . . . . . . . . . 22
3.3 Face Detection-False Positives . . . . . . . . . . . . . . . . . . 23
3.4 Face Detection-Average time taken . . . . . . . . . . . . . . . 23
3.5 Face Detection-Computation Time(SF-1.05) . . . . . . . . . . 24
3.6 Face Detection-Computation Time(SF-1.1) . . . . . . . . . . . 24
3.7 Face Detection-Computation Time(SF-1.15) . . . . . . . . . . 25
3.8 Face Detection-Computation Time(SF-1.2) . . . . . . . . . . . 25
3.9 Face Detection-Computation Time(SF-1.25) . . . . . . . . . . 26
3.10 Face Detection-Computation Time(SF-1.3) . . . . . . . . . . . 26
3.11 Face Detection-Computation Time(SF-1.35) . . . . . . . . . . 27
3.12 Face Detection-Computation Time(SF-1.4) . . . . . . . . . . . 27
3.13 Face Detection-Board Energy . . . . . . . . . . . . . . . . . . 28
3.14 Face Detection-Component Energy . . . . . . . . . . . . . . . 28
3.15 Face Detection-distance . . . . . . . . . . . . . . . . . . . . . 29
3.16 Face Detection-upscaling . . . . . . . . . . . . . . . . . . . . . 29
3.17 Face size variation with distance . . . . . . . . . . . . . . . . . 30
4.1 Face Detection Profiling (application level) . . . . . . . . . . 35
4.2 Face Detection Profiling (function level) . . . . . . . . . . . . 36
List of Tables
2.1 Accuracy vs Database Size . . . . . . . . . . . . . . . . . . . . 14
3.1 Power Measurement . . . . . . . . . . . . . . . . . . . . . . . 21
Chapter 1
Introduction
1.1 Problem Definition
Real-time video processing is increasingly becoming an important applica-
tion. An interesting task that is critical in not only security and surveillance
but also in many other applications is the problem of face detection and
recognition. We are particularly interested in developing an assisting device
for visually impaired that can help him/her in detection and recognizing
familiar faces while walking.The objective is to implement a real-time, on-
line, energy-optimal face detection and recognition algorithm by hardware-
software co-design.
1.2 Mobility Assistant for Visually Impaired
(MAVI)
The objective of the MAVI system is to conceptualize and design a smart-camera based vision system capable of extracting useful information, e.g. faces, texture and signboard information, from captured images. It consists of different modules integrated onto one platform, communicating with the user through a mobile interface and using the cloud to translate the coordinates sent by the Localization module into navigational information that can be annotated onto a signboard detected in a frame captured by the camera. The modules in MAVI comprise Face Detection and Recognition, Texture Recognition, Signboard Detection, and Localization.
Figure 1.1: MAVI System
Figure 1.2: MAVI System and its modules
1.3 Face Detection and Recognition
This module consists of two sub-modules: Face Detection and Face Recognition. The frame captured from the camera is first sent to the Face Detection system, which detects the faces in the frame, crops them, and resizes them according to the requirements of the Face Recognition system. The FR system then recognizes the face by matching it against its database of faces.
Figure 1.3: Face Detection and Recognition Module
1.3.1 Face Detection
Object Detection using Haar feature-based cascade classifiers is an effective
object detection method proposed by Paul Viola and Michael Jones in their
paper, ”Rapid Object Detection using a Boosted Cascade of Simple Fea-
tures” in 2001. It is a machine learning based approach where a cascade
function is trained from a lot of positive and negative images. It is then
used to detect objects in other images.Initially, the algorithm needs a lot of
positive images (images of faces) and negative images (images without faces)
to train the classifier. Then we need to extract features from it. For this,
haar features shown in below image are used. They are just like our convo-
lution kernel. [1].For this project we had used already trained OpenCV Haar
Cascade Classifier consisting of 20 stages,1047 features.
1.3.2 Features
The Viola-Jones algorithm uses Haar-like features, that is, scalar products between the image and Haar-like templates [2]. Each feature is a single value obtained as the difference between the sum of pixels under the white region of the template and the sum of pixels under the black region.
Let I and P denote an image and a pattern, both of size N×N. The feature associated with pattern P of image I is defined by

\[
\text{feature}(I, P) = \sum_{1 \le i \le N} \sum_{1 \le j \le N} I(i,j)\,\mathbf{1}_{P(i,j)\text{ is white}} \;-\; \sum_{1 \le i \le N} \sum_{1 \le j \le N} I(i,j)\,\mathbf{1}_{P(i,j)\text{ is black}}
\]
The derived features are assumed to hold all the information needed to char-
acterize a face. Since faces are by and large regular by nature, the use of
Haar-like patterns seems justified.
Figure 1.4: Haar Features
1.3.3 Integral Image
There is another crucial element that makes this set of features practical: the integral image, which allows them to be computed at very low computational cost. Instead of summing all the pixels inside a rectangular window each time, we can use the integral image, obtained from the original image by assigning to each pixel the sum of all pixels above and to the left of it. It can be computed in one pass over the image and then used to compute the sum of pixels in any window in constant time. If i(x, y) denotes a pixel of the original image and s(x, y) the corresponding integral image value, then

s(x, y) = i(x, y) + s(x − 1, y) + s(x, y − 1) − s(x − 1, y − 1)
Figure 1.5: Original image
Figure 1.6: Integral Image or Sum Area Table
The sum of pixels covered by a window can then be computed as follows: if A, B, C, D denote the four corners of the rectangular window, the sum of pixels in the window is
sum = s(A) + s(D) − s(C) − s(B)
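As an illustration, here is a minimal self-contained C++ sketch of the two formulas above, using plain arrays and hypothetical helper names rather than the OpenCV implementation:

#include <vector>

// One-pass construction of the integral image (summed-area table) for a
// grayscale image stored row-major as w*h values, using the recurrence
// s(x,y) = i(x,y) + s(x-1,y) + s(x,y-1) - s(x-1,y-1).
std::vector<long> integralImage(const std::vector<int>& img, int w, int h) {
    std::vector<long> s(static_cast<size_t>(w) * h, 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            long left = (x > 0) ? s[y * w + (x - 1)] : 0;
            long up   = (y > 0) ? s[(y - 1) * w + x] : 0;
            long diag = (x > 0 && y > 0) ? s[(y - 1) * w + (x - 1)] : 0;
            s[y * w + x] = img[y * w + x] + left + up - diag;
        }
    return s;
}

// Constant-time sum of the window with top-left (x0,y0) and bottom-right
// (x1,y1), inclusive; the four table lookups mirror sum = s(A)+s(D)-s(C)-s(B).
long windowSum(const std::vector<long>& s, int w,
               int x0, int y0, int x1, int y1) {
    long d = s[y1 * w + x1];                                      // s(D)
    long a = (x0 > 0 && y0 > 0) ? s[(y0 - 1) * w + (x0 - 1)] : 0; // s(A)
    long b = (y0 > 0) ? s[(y0 - 1) * w + x1] : 0;                 // s(B)
    long c = (x0 > 0) ? s[y1 * w + (x0 - 1)] : 0;                 // s(C)
    return d + a - b - c;
}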
1.3.4 Cascade of Classifiers
In an image, most of the region is non-face. It is therefore better to have a simple method to check whether a window is not a face region; if it is not, discard it in a single shot and do not process it again, focusing instead on regions where a face may be present. This way, more time can be spent checking probable face regions.

For this, the concept of a cascade of classifiers was introduced. Instead of applying all the features to a window, the features are grouped into different stages of classifiers and applied one by one (normally the first few stages contain very few features). If a window fails the first stage, it is discarded and the remaining features are not considered. If it passes, the second stage of features is applied, and the process continues. A window that passes all stages is a face region.
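As a conceptual illustration, the following hedged C++ sketch shows this early-exit evaluation (hypothetical types and a placeholder scoring function; not the OpenCV implementation):

#include <vector>

struct Stage { int numFeatures; double threshold; };

// Placeholder: in a real detector this sums the stage's weighted feature
// responses evaluated on the integral image at the given window.
double stageScore(const Stage& s, int windowX, int windowY) { return 0.0; }

bool isFaceWindow(const std::vector<Stage>& cascade, int x, int y) {
    for (const Stage& s : cascade) {
        if (stageScore(s, x, y) < s.threshold)
            return false;   // window fails this stage: discard immediately
    }
    return true;            // window passed all stages: a face region
}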
1.3.5 Face Recognition
In order to understand the methods for recognizing faces, more advanced
mathematical knowledge is required; namely linear algebra and statistics.
OpenCV provides three methods of face recognition: Eigenfaces, Fisherfaces
and Local Binary Patterns Histograms (LBPH).
All three methods perform the recognition by comparing the face to be rec-
ognized with some training set of known faces. In the training set, we supply
the algorithm faces and tell it to which person they belong. When the al-
gorithm is asked to recognize some unknown face, it uses the training set to
make the recognition. Each of the three aforementioned methods uses the
training set a bit differently.
Eigenfaces and Fisherfaces find a mathematical description of the most dominant features of the training set as a whole, while LBPH analyzes each face in the training set separately and independently. The Fisherfaces method learns a class-specific transformation matrix, so it does not capture illumination as obviously as the Eigenfaces method; the discriminant analysis instead finds the facial features that discriminate between persons. It is important to note that the performance of Fisherfaces also depends heavily on the input data. Practically speaking, if the Fisherfaces are learned from well-illuminated pictures only and recognition is attempted in badly illuminated scenes, the method is likely to find the wrong components, simply because those features may not be predominant in badly illuminated images. This is logical, since the method had no chance to learn the illumination.
1.4 Thesis Outline
This thesis is divided into five chapters. Chapter 1 is the introduction; it explains the algorithms used for face detection and recognition and how the module fits into the MAVI system. Chapter 2 gives details of the design space exploration and the parameters involved in the experimentation. Chapter 3 covers the embedded implementation of the modelled algorithms on the ZedBoard, while Chapter 4 covers the software-hardware partitioning of the hotspot detected by profiling the algorithm. Finally, Chapter 5 concludes the thesis and discusses future work on the module.
Chapter 2
Modelling and Implementation
of Face Detection
2.1 OpenCV
2.1.1 Introduction
OpenCV (Open Source Computer Vision Library) is an open source computer
vision and machine learning software library. OpenCV was built to provide
a common infrastructure for computer vision applications and to accelerate
the use of machine perception in commercial products. The library has more than 2500 optimized algorithms; it has C++, C, Python, Java and MATLAB interfaces and supports Windows, Linux, Android and Mac OS. OpenCV leans mostly towards real-time vision applications; it is written natively in C++ and has a templated interface that works seamlessly with STL containers. We used the then-current version, OpenCV 3.1.0, for our application development and implementation.
2.1.2 APIs used in Face Detection and Recognition
The objdetect module was used as a library by the programs implementing face detection. The class used is CascadeClassifier, for loading and using the Haar cascade classifier. The following APIs were used in the program.
• CascadeClassifier::load - This API loads the saved XML containing the Haar features of the already-trained face classifier. The features are loaded into vectors that are later used in detection. The XML also provides information on the stage thresholds and the number of trees in each stage. Each stage of the cascade classifier consists of a small number of features, each stored as a tree with two internal nodes and three leaf nodes. Each tree has its own feature threshold computed from the sum of its leaves, and all features combine to give the stage threshold.
• CascadeClassifier::detectMultiScale - This API detects faces in a frame, taking the scale factor, minNeighbours and the minimum face size as parameters. These parameters strongly affect performance. The function involves scaling the image for different window sizes, computing the integral image, preparing the image for feature evaluation, and removing redundant detections. The feature window is passed over the whole image, evaluating about 1047 features in a tree-based cascade of 20 stages. A minimal usage sketch is given after the parameter descriptions below.
The parameters involved in the API are:
– Minimum Window Size: the parameter Size(W,H) defines the size of the smallest face to be searched for within the input image; in effect, it is the size of the initial sliding window. The default size in OpenCV is w=24 and h=24. Depending on the input image, a tiny 24x24 sub-window may not be meaningful as a face, so the initial search window size may be increased, e.g. to 100x100.
– Scale Factor: this parameter specifies how quickly OpenCV should increase the scale between face detection iterations. Higher values make the detector run faster (by evaluating fewer sliding-window scales per image), but if the value is too high the detector may jump too quickly between scales and miss some faces. The default value in OpenCV is 1.1, meaning the scale increases by 10% on each pass.
– MinNeighbour: when the face detector is called, each positive face region may actually generate many hits from the Haar detector, producing a large cluster of heavily overlapping rectangles around a true face. In addition, scattered detections may appear around the face region; such isolated detections are usually false positives, so it makes sense to discard them, and likewise to merge the multiple detections for each face region into a single detection. This parameter controls both actions before the final detected face is returned. The merge step first groups rectangles with a large amount of overlap, then finds the average rectangle for the group and replaces all rectangles in the group with it. For example, a minimum-neighbours threshold of 3 means groups of three or more are merged and groups with fewer rectangles are discarded. If the detector misses many faces, try lowering this threshold; if it produces multiple detections per face, increase it.
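As a concrete illustration, the following is a minimal C++ usage sketch with the parameter values discussed above (scale factor 1.2, minNeighbours 3, minimum face size 20x20); the cascade and image file names are illustrative, not the exact files used in this work:

#include <opencv2/objdetect.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <vector>

int main() {
    cv::CascadeClassifier cascade;
    if (!cascade.load("haarcascade_frontalface_alt2.xml"))  // illustrative path
        return 1;

    cv::Mat frame = cv::imread("frame.jpg");
    cv::Mat gray;
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    // Histogram equalization is deliberately skipped (see Section 2.2).

    std::vector<cv::Rect> faces;
    cascade.detectMultiScale(gray, faces,
                             1.2,                // scale factor
                             3,                  // minNeighbours
                             0,                  // flags (unused here)
                             cv::Size(20, 20));  // minimum face size
    return 0;
}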
For face recognition, the ‘face’ library module provided in the Extra Modules of OpenCV 3.1.0 is used. This module is not shipped with the main OpenCV package but is downloaded separately as a community-maintained extra module. All face recognition models in OpenCV are derived from the abstract base class ‘FaceRecognizer’, which provides unified access to all face recognition algorithms in OpenCV. Every FaceRecognizer supports the following (an end-to-end usage sketch is given after the API descriptions):
• Training of a FaceRecognizer with FaceRecognizer::train() on a given
set of images (your face database!).
• Prediction of a given sample image, i.e. a face, supplied as a Mat.
• Loading/saving the model state from/to a given XML or YML file.
• Setting/getting labels info, stored as a string; string labels are useful for keeping the names of the recognized people.
• FaceRecognizer::train - Trains a FaceRecognizer with the given data and associated labels. Parameters:
– src The training images, i.e. the faces to be learned. The data has to be given as a vector<Mat>.
– labels The labels corresponding to the images, given either as a vector<int> or a Mat of type CV_32SC1.
• FaceRecognizer::predict - Predicts a label and the associated confidence (e.g. distance) for a given input image; given an image of a face, it determines which person from the training set it is. Parameters:
– src Sample image to get a prediction from.
– label The predicted label for the given image.
– confidence Associated confidence (e.g. distance) for the predicted label.
The predicted label and its confidence are returned through the output parameters; a label of -1 indicates the face was not recognized (see the threshold parameter below).
• FaceRecognizer::save - Saves a FaceRecognizer and its model state to a given filename, either as XML or YML. Parameters:
– filename The filename to store this FaceRecognizer to (either XML/YAML).
• FaceRecognizer::load - Loads a persisted model and state from a given XML or YAML file. Every FaceRecognizer has to override FaceRecognizer::load(FileStorage& fs) to enable loading the model state; FaceRecognizer::load(FileStorage& fs) is in turn called by FaceRecognizer::load(const string& filename), to ease loading a model.
• createFisherFaceRecognizer - This function creates an object of the FaceRecognizer class. Parameters:
– num_components The number of components (Fisherfaces) kept for the Linear Discriminant Analysis with the Fisherfaces criterion. It is useful to keep all components, i.e. the number of classes c (subjects, persons to be recognized). If this is left at the default (0), set to a value less than or equal to 0, or set greater than (c−1), it will be set to the correct number (c−1) automatically.
– threshold The threshold applied in the prediction. If the distance to the nearest neighbour is larger than the threshold, the predict method returns -1.
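The training/prediction flow described above can be summarized in the following hedged C++ sketch (file names, subject count and loop bounds are illustrative; the actual database is described in Section 2.7):

#include <opencv2/face.hpp>
#include <opencv2/imgcodecs.hpp>
#include <string>
#include <vector>

int main() {
    std::vector<cv::Mat> images;
    std::vector<int> labels;
    // In the real application these come from the cropped 92x112 FR database;
    // here two hypothetical subjects with 8 images each (see Section 2.5).
    for (int subject = 0; subject < 2; ++subject)
        for (int i = 0; i < 8; ++i) {
            std::string file = "s" + std::to_string(subject) + "_" +
                               std::to_string(i) + ".png";
            images.push_back(cv::imread(file, cv::IMREAD_GRAYSCALE));
            labels.push_back(subject);
        }

    cv::Ptr<cv::face::FaceRecognizer> model =
        cv::face::createFisherFaceRecognizer(0 /*num_components*/);
    model->train(images, labels);
    model->save("fisherfaces.yml");   // persist model state (XML/YML)

    cv::Mat test = cv::imread("unknown.png", cv::IMREAD_GRAYSCALE);
    int label = -1;
    double confidence = 0.0;
    model->predict(test, label, confidence);  // label == -1 if above threshold
    return 0;
}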
2.2 Parameters tuning
The face detection API provided by OpenCV accepts various parameters such as the scale factor and minNeighbour; for face recognition, the training database size is an additional knob. These parameters were tuned and optimized for real-time face detection, and some of the results are shown in the next chapter. Computation time for different scale factors was measured and reported for execution on the ZedBoard. Histogram equalization is avoided, as it was observed to reduce the accuracy of the face detection and recognition system in outdoor scenarios.
2.3 Scale Factor & MinNeighbour
To achieve the required accuracy and frames processed per second, we varied the scale factor; as expected, larger factors reduce both computation time and accuracy. The scale factor was varied from 1.05 to 1.4 over a test set of 65 images containing 0, 1 or 2 faces.

MinNeighbour allows false positives to be removed from the detection results, but some true detections may also be removed when their detection windows lack the required number of neighbouring detections.
2.4 Minimum Face Size
Experimentation with the minimum face window was also done to extract the relationship between face size and distance. As expected, the more distant the face is from the camera, the smaller it appears in the frame. As the trained cascade classifier provided by OpenCV has a minimum face window of 20x20, faces below that dimension cannot be detected; our experiments showed that this corresponds to a distance of 20-22 feet, or a little more than 6 meters.
For detecting smaller or very low-resolution faces, we searched for classifiers trained on smaller faces but were unable to find any; since we did not plan to train our own classifier, we opted to upscale the image before detection. This affects two fronts: computation time and false positives. We were nevertheless able to detect faces as small as 7x7 pixels by scaling the image up to four times, corresponding to a face distance of about 40 feet or 12 meters. Final results are provided in the next chapter. As mentioned, both the computation time and the number of reported false positives rose considerably; these false positives could be suppressed by using a higher minNeighbour value, though this was not experimented with. Note that for smaller faces, the detected face must be scaled up to the trained database size of 92x112 for recognition, which leads to very poor recognition results. It was experimentally concluded that faces at 10 feet (3 meters) or nearer gave a recognition rate of 65 percent, while more distant faces gave less than 50% accuracy.
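A hedged C++ sketch of this upscale-then-detect approach (function name and parameter values are illustrative):

#include <opencv2/objdetect.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Scaling the frame 2x or 4x lets the 20x20 cascade window reach faces
// that occupy as few as ~7x7 pixels in the original frame.
std::vector<cv::Rect> detectSmallFaces(cv::CascadeClassifier& cascade,
                                       const cv::Mat& gray, int upscale) {
    cv::Mat big;
    cv::resize(gray, big, cv::Size(), upscale, upscale, cv::INTER_LINEAR);

    std::vector<cv::Rect> faces;
    cascade.detectMultiScale(big, faces, 1.2, 3, 0, cv::Size(20, 20));

    // Map detections back to original-image coordinates.
    for (cv::Rect& r : faces) {
        r.x /= upscale; r.y /= upscale;
        r.width /= upscale; r.height /= upscale;
    }
    return faces;
}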
2.5 FR Database Size
The database size for face recognition training was also experimented with, using 5, 8 and 10 images per face, with a total of 10 subjects in the whole database. The test faces were taken from detection results, giving 25 test faces covering 2 known subjects and 1 unknown subject with respect to the trained face recognition system. The highest accuracy of 65% was achieved with 8 images/face. A slight increase in recognition time was also observed as the number of images per face increased.
Database Size     Accuracy
5 images/face     55%
8 images/face     65%
10 images/face    63%
Table 2.1: Accuracy vs Database Size
2.6 Upper Body and Face Detection
To reduce the computation time taken by face detection, it was clubbed with upper-body detection: face detection is run only inside the areas detected as upper body. The benefit comes from the fact that the minimum upper-body window corresponding to the minimum 20x20 face is 55x55 (determined experimentally), so detection starts at a much larger window and requires far fewer iterations over the image. Each detected upper-body ROI of minimum size 55x55 then undergoes face detection on its upper two-thirds, since the face cannot lie below that, and the maximum face window is bounded by the size of the detected upper body. The results are shown in the next chapter: a reduction of about 3x is observed in the computation time for the final face detection. A code sketch of this cascading follows.
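A hedged C++ sketch of the cascading (function name is illustrative; the cascades are assumed loaded with the OpenCV upper-body and frontal-face classifiers):

#include <opencv2/core.hpp>
#include <opencv2/objdetect.hpp>
#include <vector>

std::vector<cv::Rect> facesViaUpperBody(cv::CascadeClassifier& bodyCascade,
                                        cv::CascadeClassifier& faceCascade,
                                        const cv::Mat& gray) {
    std::vector<cv::Rect> bodies, allFaces;
    bodyCascade.detectMultiScale(gray, bodies, 1.2, 3, 0, cv::Size(55, 55));

    for (const cv::Rect& body : bodies) {
        // Restrict the face search to the upper two-thirds of the body ROI.
        cv::Rect upper(body.x, body.y, body.width, (2 * body.height) / 3);
        cv::Mat roi = gray(upper);

        std::vector<cv::Rect> faces;
        faceCascade.detectMultiScale(roi, faces, 1.2, 3, 0, cv::Size(20, 20));
        for (cv::Rect f : faces) {
            f.x += upper.x;  // map back to full-frame coordinates
            f.y += upper.y;
            allFaces.push_back(f);
        }
    }
    return allFaces;
}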
There are also some drawbacks associated with this technique:
• The number of false positives for upper-body detection is high, though this may be of little consequence, since a face will often not be detected inside a false upper-body region.
• Faces near the frame edges, especially in the bottom part of the frame where the upper body falls outside the frame, will not be detected.
• If the upper body is distorted in the image, especially when a person's body is oblique to the camera, the upper body will not be detected.
• The face may not be properly visible for profile faces or a person facing away, even though the upper body is detected.
• Images with a group of people standing close together can give poor upper-body results, as the distinction between two upper bodies may cease to exist.
Figure 2.1: Face Detection + Upper Body
2.7 Creation of FR Database and Training
Creating the training database for face recognition was the most time-consuming activity in the whole project. The training database includes 10 subjects, with their faces cropped manually from well-illuminated photographs as well as outdoor photographs. While cropping, the nose is taken as the centre point for all faces, and the areas under the hair, the ears and the tip of the chin are removed. Two images per subject with a slight rotation of less than 5 degrees are included, to make the training invariant to slight deviations from the ideal frontal pose. A few images with some shadow from sunlight are also included, to make the training more robust to sunlight and outdoor conditions; shadowed images cover both the left and the right side of the face. Facial expressions are also varied across the images, including smiling, sad, laughing, neutral and serious.
Figure 2.2: Face Recognition Database
Chapter 3
Embedded Face Detection and
Recognition
3.1 ZedBoard-Choice for prototyping
The ZedBoard is based on the Zynq All Programmable SoC (AP SoC). This product integrates a feature-rich dual-core ARM Cortex-A9 MPCore based processing system (PS) and Xilinx programmable logic (PL) in a single device, built on a state-of-the-art, high-performance, low-power process technology. It is a joint venture by Xilinx (Zynq AP SoC), Digilent (board manufacturer) and Avnet (distributor).
• Xilinx Zynq XC7Z020-1CSG484 EPP (extensible processor platform)
containing Dual-core ARM Cortex-A9 (PS) running at 666 MHz and
Artix-7 FPGA (PL)
• Memory: 512 MB DDR3 and 256 Mb QSPI Flash
• On-board oscillators: 33.333 MHz (PS), 100 MHz (PL)
• Interfaces: USB-JTAG programming, Ethernet, USB OTG, USB-UART, SD card, Pmod, LEDs, DIP switches, push buttons, etc.
• Display: HDMI output, VGA (12-bit colour)
• Power: 12 V @ 5A AC-DC regulator
3.2 Linaro
For the software-only implementation, the Linaro Ubuntu distribution was used. Linaro is an open-source, complete Linux distribution based on Ubuntu. It supports a graphical desktop through the on-board HDMI port. For booting from an SD card, the Linaro file system has to be placed in a partition separate from the one holding the kernel image and device tree. It is a persistent OS, i.e. all changes are written to storage and files survive reboot or shutdown. OpenCV (version 3.1) was built on top of it. For booting Linux, an 8 GB class-10 SD card was used. As the Linaro file system was used, the SD card must be partitioned into two sections:
• The first partition should have a FAT file system and contains the boot image, kernel image and device tree file.
• The second partition should have an ext4 file system and contains the Linaro distribution.
The ZedBoard boots over a number of stages:
• Stage-0: After a power-on reset (POR), the hardware samples the boot-strap pins to determine the boot mode (JTAG mode, NAND/NOR flash mode, SD card mode, etc.) and optionally enables the PS PLLs. The hard-coded BootROM is then executed on the primary CPU-0. The BootROM configures the PS to access the boot device, validates and reads the boot header to determine the boot flow, and loads the FSBL into OCM.
• Stage-1: The FSBL is loaded and executed from OCM. It is responsible for several initialization functions, including CPU initialization with the PS7 Init configuration data, programming the PL with the bitstream (if available), loading the SSBL into DDR memory, and handing off control to the SSBL.
• Stage-2: For Linux booting, the SSBL is typically U-Boot, the open-source universal bootloader for Zynq. It is responsible for loading the Linux kernel image, device tree file and Linux file system. It also initializes hardware not handled by the kernel, such as the serial port and DDR memory.
3.3 Energy and power measurement on ZedBoard
A 10 mΩ, 1 W current-sense resistor is in series with the 12 V input power supply. To measure the voltage across the resistor, a jumper (J21) is connected across it. An Agilent 34410A high-performance sampling multimeter was used to measure the voltage, interfaced directly to a PC via USB. From the PC it is possible to start/stop the voltage measurement, view the waveform, and export the results as a CSV file or spreadsheet for further calculation. The ZedBoard was powered by the 12 V adapter. The current drawn from the power source is calculated by dividing the sense voltage by the resistance, using Ohm's law.
Figure 3.1: Power measurement setup
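In terms of the measured sense voltage V_s across J21, the current and board power follow directly from Ohm's law:

I = V_s / R = V_s / 0.01,  P = 12 × I

As a worked example (derived from the tabulated base power, not separately measured): a base power of 3.528 W corresponds to I = 3.528 / 12 ≈ 0.294 A, i.e. a sense voltage of V_s ≈ 0.294 × 0.01 = 2.94 mV.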
To measure the software power taken by the application code, a dummy loop was run and pinned to core 0 of the dual-core ARM Cortex-A9, keeping that core busy at all times, while our application was pinned to the other core. The incremental jump in the voltage across J21 was then measured, giving the peak and average values. With Linux running the dummy loop on one core, the base power is 3.65 W, while additionally running our face detection application gives 3.822 W; the incremental power attributed to face detection is then computed as 156 mW.

Cases                  Power (W)
Base                   3.528
Dummy on 1 core        0.096
Dummy on both cores    0.192
Only Face Detection    0.132
FD + dummy             0.156
FD + TD                0.372

Table 3.1: Power Measurement
3.4 Cross Compilation for ZedBoard
The ARM Linux toolchain arm-linux-gnueabihf was installed, and OpenCV 3.1 was compiled from source with this toolchain for Linaro-based systems. OpenCV was compiled both as shared and as static libraries, and a stripped version was also built. The same toolchain is used to compile the application for the ZedBoard. The cross-compilation procedure is documented in the Appendix.
3.5 Results
From Fig. 3.2 we can conclude that accuracy decreases as the scale factor increases. The results were collected from 65 outdoor images containing no faces, one face, or more than one face. The detection rate is normalized to the count at scale factor 1.05, which gives the maximum detections.

Fig. 3.3 plots the number of false positives against the scale factor; as the scale factor increases, false positives tend to decrease because fewer windows are processed, although there is an unexplained anomaly at scale factor 1.25. Fig. 3.4 shows that computation time decreases as the scale factor increases. To achieve under 1 s of computation time with few false positives, we selected a scale factor of 1.2. The computation time for frames in which no faces are detected forms the lower bound, and that for frames with detected faces forms the upper bound. The CDFs in Figs. 3.5 to 3.12 provide useful information for resource allocation and scheduling on embedded platforms, indicating when the computation of most frames can be expected to complete. Fig. 3.13 shows the energy consumption of the whole board with face detection running.
Figure 3.2: Face Detection Accuracy (true faces detected: normalized accuracy w.r.t. SF 1.05, plotted against scale factor)
Figure 3.3: Face Detection-False Positives (number of false positives vs. scale factor)
Figure 3.4: Face Detection-Average time taken (average time in ms vs. scale factor, for frames with faces, without faces, and both)
Figure 3.5: Face Detection-Computation Time (SF 1.05): CDF of per-frame time in ms, with and without faces, with 90%/95%/99% marks
Figure 3.6: Face Detection-Computation Time (SF 1.1): CDF of per-frame time in ms, with and without faces, with 90%/95%/99% marks
Figure 3.7: Face Detection-Computation Time (SF 1.15): CDF of per-frame time in ms, with and without faces, with 90%/95%/99% marks
Figure 3.8: Face Detection-Computation Time (SF 1.2): CDF of per-frame time in ms, with and without faces (including the mean for frames with faces), with 90%/95%/99% marks
Figure 3.9: Face Detection-Computation Time (SF 1.25): CDF of per-frame time in ms, with and without faces, with 90%/95%/99% marks
Figure 3.10: Face Detection-Computation Time (SF 1.3): CDF of per-frame time in ms, with and without faces, with 90%/95%/99% marks
Figure 3.11: Face Detection-Computation Time (SF 1.35): CDF of per-frame time in ms, with and without faces, with 90%/95%/99% marks
Figure 3.12: Face Detection-Computation Time (SF 1.4): CDF of per-frame time in ms, with and without faces, with 90%/95%/99% marks
Figure 3.13: Face Detection-Board Energy (energy consumed in J vs. scale factor)
Figure 3.14: Face Detection-Component Energy
Figure 3.15: Face Detection-distance (size of detected faces in pixels vs. distance in ft, for 1x, 2x and 4x upscaling)
Figure 3.16: Face Detection-upscaling (number of false positives vs. upscaling factor of 1x, 2x, 4x)
Figure 3.17: Face size variation with distance
Chapter 4
Hardware Software Codesign
4.1 Tools
4.1.1 SDSOC
The SDSoC (Software Defined System-on-Chip) development environment provides an embedded C/C++ application development experience for the Zynq All Programmable SoC. It includes the industry's first full-system optimizing C/C++ compiler, a system-level profiler, automated software acceleration in programmable logic, and automated system connectivity generation. It supports bare metal, Linux and FreeRTOS as target operating systems.
The SDSoC system compilers analyze a program to determine the data flow between software and hardware functions and generate an application-specific system-on-chip. High performance can be achieved by configuring each hardware function to run as an independent thread; the system compilers ensure synchronization between hardware and software threads and enable pipelined computation. The SDSoC system compilers invoke the Vivado HLS tool to compile the synthesizable C/C++ functions into programmable logic, generate a complete hardware system including DMAs, interconnects, hardware buffers and other IPs, and configure the FPGA by invoking the Vivado tools. SDSoC thus provides a C/C++ environment for complete embedded system design on the heterogeneous Zynq platform using hardware/software partitioning: it performs program analysis, task scheduling, and binding onto the processor and the configurable logic for the accelerators specified by the user. The SDSoC compiler and linker generate code for hardware and software that automatically orchestrates communication and cooperation among hardware and software components. The user can experiment with different accelerators by simply toggling the target (hardware or software) of each function. The tool invokes Vivado HLS for accelerator and interface synthesis, makes the connections in Vivado Design Suite, and generates the bitstream. Thereafter it generates object code for the processor using the GNU toolchain. It also generates the boot files (kernel image, device tree, boot image, etc.) to run the application from a ramdisk. The user can optimize the hardware using pragmas and can also choose the data-mover type and the PS port to be interfaced to the accelerator. Some important points are listed below, followed by a minimal allocation sketch:
• The user profiles the code and marks the hotspot functions as hardware targets; more than one accelerator is possible.
• The hardware is synthesized using Vivado HLS, so HLS guidelines should be followed. HLS pragmas or compiler directives can be used to optimize the hardware.
• The code is compiled by sdscc for C code and sds++ for C++ code.
• The SDSoC linker creates an SD card image to run the application in a Linux environment.
• SDSoC also provides pragmas to select specific data movers (AXI DMA in simple mode, AXI DMA in scatter-gather mode, etc.) and PS-PL interface ports (ACP, HP, GP, etc.).
• In Linux, memory allocation is always done in virtual address space and may be distributed across multiple pages in physical memory, whereas DMA and other hardware operate on physical addresses only. So for each memory allocation the elements must be mapped to physical space; scatter-gather DMA can handle such a list of pages, whereas simple DMA can handle only a single page.
• SDSoC provides a mechanism to allocate physically contiguous memory using sds_alloc and sds_free; the Linux kernel similarly has support for the CMA (contiguous memory allocator).
• Hence, in Linux, memory allocated by malloc must be handled by scatter-gather DMA, whereas simple DMA can handle memory allocated by sds_alloc.
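A hedged sketch of this allocation pattern (hypothetical function and buffer names; the pragma values are examples, not the configuration used in this work):

#include <stdint.h>
#include "sds_lib.h"   // sds_alloc / sds_free

// The data_mover pragma asks SDSoC to use the simple AXI DMA for these
// arrays, which is valid because sds_alloc returns physically contiguous
// memory; malloc'd buffers would instead require the scatter-gather DMA.
#pragma SDS data data_mover(in:AXIDMA_SIMPLE, out:AXIDMA_SIMPLE)
void accel(uint32_t in[1024], uint32_t out[1024]);

void accel(uint32_t in[1024], uint32_t out[1024]) {
    for (int i = 0; i < 1024; ++i)
        out[i] = in[i] + 1;   // placeholder hardware kernel
}

void run() {
    uint32_t* in  = (uint32_t*)sds_alloc(1024 * sizeof(uint32_t));
    uint32_t* out = (uint32_t*)sds_alloc(1024 * sizeof(uint32_t));
    accel(in, out);           // SDSoC inserts the DMA transfers around this call
    sds_free(in);
    sds_free(out);
}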
The advantage of the SDSoC tool is its automatic generation of the interconnect system according to the user-specified pragmas, along with the device drivers for a Linux-based system, sparing the developer from writing them by hand as required with plain Linaro-based systems. SDSoC also provides support for OpenCV 3.0, and many of its libraries are provided for the software-only version of an OpenCV application. Support for the ml and face modules of OpenCV is incomplete, so the face recognition program cannot be built within it. However, these libraries can be built separately using the SDSoC toolchain arm-xilinx-linux-gnueabi-g++. This can be done in two ways: using the SDSoC SDK itself, as explained in the user guide provided with the tool, or using the toolchain directly and cross-compiling manually, similar to cross-compiling for Linaro-based solutions. We tried the second method and were able to create the required static libraries, but executing programs linked against them produced neither output nor an error.

SDSoC comes with two files for each platform, one describing the software configuration and the other the hardware. Whenever a new platform is generated, these two files are also generated; the software file contains the paths to the required libraries. Whenever new libraries are added to the platform, this file must be modified to reflect the library paths. These platform files have the extension .pfm.
SDSoC also contains the Vivado HLS libraries for its hardware synthesis flow and thus has access to the HLS OpenCV libraries and functions provided by Xilinx. These functions can be used inside functions marked for hardware acceleration and are completely synthesizable. However, converting data structures between software OpenCV and the synthesizable OpenCV functions (and back) requires the interface functions provided by Xilinx, and the software OpenCV in SDSoC is version 3.0, which deprecates many of the data structures used by the 2.3-era HLS OpenCV. We therefore developed our own interface APIs for converting from the software representation to the hardware one, but these APIs were rudimentary and consumed so much time in conversion that they defeated the purpose. It was consequently decided to flatten the OpenCV face detection code and remove all dependencies on the OpenCV libraries: all templates were removed, and complex classes were replaced by arrays and corresponding structures. The flattened code was then converted into a more synthesizable form that could be partitioned into hardware for acceleration.
While building projects with the SDSoC SDK, all build steps are logged to a file, and the logged commands can also be run manually from the terminal. This can be used to measure the time taken in data transfer (sending and receiving) over the DMA generated by the tool, and to choose the most efficient configuration, by inserting our own timing statements into the intermediate programs around the DMA invocations such as cf_send and cf_receive.
4.1.2 VIVADO HLS
Vivado HLS comes with the Vivado IDE as a component tool. It takes a behavioural description written in C/C++/SystemC along with some constraints, synthesizes it, and produces an RTL description of the same design; essentially, it converts a behavioural description into a timed, cycle-accurate RTL description. With Vivado HLS, we can specify the data types (integer, fixed-point or floating-point), the abstraction of the algorithmic description, and the interfaces (FIFO, AXI4, AXI4-Lite, AXI4-Stream). It performs directive-driven, architecture-aware synthesis that yields a good quality of results. The accelerated designs can be verified using C/C++ test bench simulation and automatic Verilog or VHDL simulation with test bench generation. Vivado HLS can also export a synthesized RTL design as an IP core with a desired bus interface, which can then be added to a system using the IP integrator.
4.2 Profiling & Code Modification
Profiling is a key technique for ensuring an optimal match between target hardware and software by revealing where the software spends its time. As the main focus here is on accelerator synthesis, a coarse profiling granularity is used; with fine granularity, the bottleneck would become the communication between the processor and the hardware. Profiling yields two critical pieces of information: how often each function is invoked and how much time is spent in it. From the profile we obtain the candidate functions for hardware acceleration. The profilers used here are gprof and the perf tool. Both versions of the code, one using the OpenCV libraries and the other the flattened code, were profiled.
Figure 4.1: Face Detection Profiling (application level): face detection 96.41%, XML load 1.81%, image load 1.1%, others 0.68% of execution time
Figure 4.2: Face Detection Profiling (function level): predictOrdered 92.96%, setWindow 1.93%, integral 0.69%, detectSingleScale 0.55%, setImage 0.28% of execution time
Analysis of the code shows that predictOrdered, the function in which the most time is spent, is nested inside three loops: one loop increases the window size and the other two slide the window over the whole image. Inside this kernel, threshold values are computed for 1047 features divided into 20 stages. The function can exit from any of the 20 stages, and each feature consists of a tree with three leaves and two internal nodes.
4.3 Hardware Acceleration
We will be using SDSoC for hardware acceleration, with partitioning done according to the profiling results. The kernel consists of four nested loops. The outer two loops have fixed bounds, and the innermost loop always runs three times and can therefore be unrolled. The third loop is not fixed and can exit at any stage; it can be implemented as an IP that is invoked once per stage. The outermost two loops can be parallelized to work on different parts of the image concurrently. These acceleration steps need to be carried out in future work. A sketch of the loop structure is given below.
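A hedged C++ sketch of that loop structure (hypothetical names and stub bodies standing in for lookups into the loaded classifier data; not the OpenCV source; the per-stage feature loop is shown explicitly here):

// Cascade geometry: 20 stages, ~1047 features, each a tree with 3 leaves.
const int NUM_STAGES = 20;

inline int    numFeatures(int /*stage*/)        { return 52; }   // stub
inline double stageThreshold(int /*stage*/)     { return 0.0; }  // stub
inline double evalLeaf(int, int, int, int, int) { return 0.0; }  // stub

// Stage loop: may exit early, a candidate for a per-stage IP invocation.
// Innermost leaf loop: fixed trip count of 3, a candidate for unrolling.
bool windowPasses(int x, int y) {
    for (int stage = 0; stage < NUM_STAGES; ++stage) {
        double sum = 0.0;
        for (int f = 0; f < numFeatures(stage); ++f) {
            double node = 0.0;
            for (int leaf = 0; leaf < 3; ++leaf) {
#pragma HLS UNROLL
                node += evalLeaf(stage, f, leaf, x, y);
            }
            sum += node;
        }
        if (sum < stageThreshold(stage))
            return false;   // early exit: window rejected at this stage
    }
    return true;
}

// Outer two loops: fixed sliding-window traversal, candidates for
// parallelization over different parts of the image.
void slideWindows(int width, int height) {
    for (int y = 0; y + 20 <= height; ++y)
        for (int x = 0; x + 20 <= width; ++x)
            (void)windowPasses(x, y);
}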
Chapter 5
Conclusion & Future Work
5.1 Conclusion
A design space exploration was performed for the embedded implementation of face detection and recognition on the ZedBoard. The relationships between the scale factor and computation time, accuracy and energy consumption were explored: as the scale factor increases, accuracy, computation time and energy consumption all decrease. CDFs were extracted for different scale factors to assist a future controller in making scheduling and time-allocation decisions for this application. Face detection can be clubbed with upper-body detection to reduce computation time. A scale factor of 1.2 was chosen for face detection and a database size of 8 images/face for face recognition. Finally, predictOrdered was identified as the function to accelerate in hardware.
5.2 Future Work
The effect of varying the scale factor of both face detection and upper-body detection needs to be studied. Hardware acceleration using SDSoC will be a major piece of work to be carried out, as will the creation of a face recognition database for different image dimensions.
Bibliography
[1] OpenCV 3.1.0 documentation. Face Detection using Haar Cascades, 2015.
[2] Yi-Qing Wang. An Analysis of the Viola-Jones Face Detection Algorithm. Image Processing On Line (IPOL), 2014.