Modelling and Implementation of Face Detection and Recognition for Mobility Assistant for Visually Impaired System (MAVI)
A thesis submitted in partial fulfillment of the requirements for the degree of
MASTER OF TECHNOLOGY
in
VLSI Design Tools & Technology
by
MUNIB FAZAL
Entry No. 2014JVL2694
Under the guidance of
PROF. M. BALAKRISHNAN
Dr. CHETAN ARORA (IIIT Delhi)
Department of VLSI Design Tools & Technology, Indian Institute of Technology Delhi.
June 2016.
Certificate
This is to certify that the thesis titled Modelling and Implementation of
Face Detection and Recognition for Mobility Assistant for Visually
Impaired System (MAVI) being submitted by MUNIB FAZAL for the
award of Master of Technology in VLSI Design Tools & Technology
is a record of bona fide work carried out by him under my guidance and
supervision at the Department of Computer Science & Engineering.
The work presented in this thesis has not been submitted elsewhere either in
part or full, for the award of any other degree or diploma.
PROF. M. BALAKRISHNAN
Department of Computer Science and Engineering
Indian Institute of Technology, Delhi
Dr. CHETAN ARORA
Department of Computer Science and Engineering
Indraprastha Institute of Information Technology, Delhi
“Dedicated to the memory of my father.”
Abstract
This thesis presents the design space exploration and implementation of OpenCV face detection, together with face recognition, for MAVI (Mobility Assistant for the Visually Impaired), an outdoor navigation system that helps a visually impaired (VI) person socialize by being aware of people around them and recognizing familiar faces.

The work involves measuring accuracy, performance and power/energy on the ZedBoard using its ARM cores; configuring the various parameters of the OpenCV face detection functions; cross-compiling and building the environment for ZedBoard ARM development; profiling the algorithm; and exploring tools and techniques for its hardware acceleration. The exploration was done mindful of the other processes and applications that will run alongside this application, and it provides useful parameters to a future controller for trading off performance against energy consumption according to the circumstances and needs of a VI person. Upper-body detection is also used to reduce computation time, and the complete flow through to face recognition is implemented and presented.

Embedded applications of this kind are well suited to the Xilinx Zynq All Programmable System-on-Chip, which combines ARM processor cores with programmable FPGA fabric on the same chip. This approach facilitates acceleration-based implementations that can be further optimized for performance. Future work involves hardware acceleration of the proposed functions using Xilinx SDSoC and integration with the other modules being built for MAVI.
Acknowledgments
I take this opportunity to thank my supervisor Prof. M. Balakrishnan for his constant supervision and valuable guidance during the course of this thesis. I am highly indebted to him for believing in me and for being a constant source of motivation.

I would also like to thank our co-supervisor Dr. Chetan Arora for his valuable insights and assistance on the subject of computer vision; Prof. Anshul Kumar for all his help and support as the program coordinator of VDTT, IIT Delhi; Mr. Rajesh Kedia and Mrs. Radhika for useful discussions and ideas; Mr. Sourajit Jash for initial help and orientation on the thesis work; and Mr. Sharma and Mr. Rakesh for providing me with all the lab equipment and support.

My heartfelt thanks to Yoosuf KK and Hassen Basha for making my thesis journey fun-filled and memorable, Siva Krishna Aleti and Akshay Jain for their kind gestures and for making the lab environment joyful, and Saurabh Agrawal for assisting me with documentation and result evaluations.

Last but not the least, I would like to thank God for being merciful and my parents for their constant prayers and well wishes for me.
MUNIB FAZAL
Contents
1 Introduction 1
1.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Mobility Assistant for Visually Impaired (MAVI) . . . . . . . 1
1.3 Face Detection and Recognition . . . . . . . . . . . . . . . . . 3
1.3.1 Face Detection . . . . . . . . . . . . . . . . . . . . . . 3
1.3.2 Features . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.3 Integral Image . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.4 Cascade of Classifiers . . . . . . . . . . . . . . . . . . . 6
1.3.5 Face Recognition . . . . . . . . . . . . . . . . . . . . . 7
1.4 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2 Modelling and Implementation of Face Detection 9
2.1 OpenCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.2 APIs used in Face Detection and Recognition . . . . . 9
2.2 Parameters tuning . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Scale Factor & MinNeighbour . . . . . . . . . . . . . . . . . . 13
2.4 Minimum Face Size . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5 FR Database Size . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.6 Upper Body and Face Detection . . . . . . . . . . . . . . . . . 15
2.7 Creation of FR Database and Training . . . . . . . . . . . . . 16
3 Embedded Face Detection and Recognition 18
3.1 ZedBoard-Choice for prototyping . . . . . . . . . . . . . . . . 18
3.2 Linaro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3 Energy and power measurement on ZedBoard . . . . . . . . . 20
3.4 Cross Compilation for ZedBoard . . . . . . . . . . . . . . . . . 21
3.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4 Hardware Software Codesign 31
4.1 Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.1 SDSOC . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.2 VIVADO HLS . . . . . . . . . . . . . . . . . . . . . . . 34
4.2 Profiling & Code Modification . . . . . . . . . . . . . . . . . . 34
4.3 Hardware Acceleration . . . . . . . . . . . . . . . . . . . . . . 36
5 Conclusion & Future Work 37
5.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Bibliography 38
List of Figures
1.1 MAVI System . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 MAVI System and its modules . . . . . . . . . . . . . . . . . . 2
1.3 Face Detection and Recognition Module . . . . . . . . . . . . 3
1.4 Haar Features . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 Original image . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.6 Integral Image or Sum Area Table . . . . . . . . . . . . . . . . 6
2.1 Face Detection + Upper Body . . . . . . . . . . . . . . . . . . 16
2.2 Face Recognition Database . . . . . . . . . . . . . . . . . . . . 17
3.1 Power measurement setup . . . . . . . . . . . . . . . . . . . . 20
3.2 Face Detection Accuracy . . . . . . . . . . . . . . . . . . . . . 22
3.3 Face Detection-False Positives . . . . . . . . . . . . . . . . . . 23
3.4 Face Detection-Average time taken . . . . . . . . . . . . . . . 23
3.5 Face Detection-Computation Time(SF-1.05) . . . . . . . . . . 24
3.6 Face Detection-Computation Time(SF-1.1) . . . . . . . . . . . 24
3.7 Face Detection-Computation Time(SF-1.15) . . . . . . . . . . 25
3.8 Face Detection-Computation Time(SF-1.2) . . . . . . . . . . . 25
3.9 Face Detection-Computation Time(SF-1.25) . . . . . . . . . . 26
3.10 Face Detection-Computation Time(SF-1.3) . . . . . . . . . . . 26
3.11 Face Detection-Computation Time(SF-1.35) . . . . . . . . . . 27
3.12 Face Detection-Computation Time(SF-1.4) . . . . . . . . . . . 27
3.13 Face Detection-Board Energy . . . . . . . . . . . . . . . . . . 28
3.14 Face Detection-Component Energy . . . . . . . . . . . . . . . 28
3.15 Face Detection-distance . . . . . . . . . . . . . . . . . . . . . 29
3.16 Face Detection-upscaling . . . . . . . . . . . . . . . . . . . . . 29
3.17 Face size variation with distance . . . . . . . . . . . . . . . . . 30
4.1 Face Detection Profiling (application level) . . . . . . . . . . 35
4.2 Face Detection Profiling (function level) . . . . . . . . . . . . 36
List of Tables
2.1 Accuracy vs Database Size . . . . . . . . . . . . . . . . . . . . 14
3.1 Power Measurement . . . . . . . . . . . . . . . . . . . . . . . 21
Chapter 1
Introduction
1.1 Problem Definition
Real-time video processing is increasingly becoming an important applica-
tion. An interesting task that is critical in not only security and surveillance
but also in many other applications is the problem of face detection and
recognition. We are particularly interested in developing an assisting device
for visually impaired that can help him/her in detection and recognizing
familiar faces while walking.The objective is to implement a real-time, on-
line, energy-optimal face detection and recognition algorithm by hardware-
software co-design.
1.2 Mobility Assistant for Visually Impaired
(MAVI)
The objective of the MAVI system is to conceptualize and design a smart-camera based vision system capable of extracting useful information, e.g. faces, texture and signboard information, from captured images. It consists of different modules integrated onto one platform, communicating with the user through a mobile interface and using the cloud to translate the coordinates sent by the Localization module into navigational information that can be annotated onto a signboard detected in a frame captured by the camera. The modules in MAVI comprise Face Detection and Recognition, Texture Recognition, Signboard Detection, and Localization.
Figure 1.1: MAVI System
Figure 1.2: MAVI System and its modules
1.3 Face Detection and Recognition
This module consists of two sub-modules: Face Detection and Face Recognition. The frame captured from the camera is first sent to the Face Detection system, which detects the faces in the frame, crops them, and resizes them according to the requirements of the Face Recognition system. The FR system then recognizes the face by matching it against its database of faces.
Figure 1.3: Face Detection and Recognition Module
1.3.1 Face Detection
Object Detection using Haar feature-based cascade classifiers is an effective
object detection method proposed by Paul Viola and Michael Jones in their
paper, ”Rapid Object Detection using a Boosted Cascade of Simple Fea-
tures” in 2001. It is a machine learning based approach where a cascade
function is trained from a lot of positive and negative images. It is then
used to detect objects in other images.Initially, the algorithm needs a lot of
positive images (images of faces) and negative images (images without faces)
to train the classifier. Then we need to extract features from it. For this,
haar features shown in below image are used. They are just like our convo-
lution kernel. [1].For this project we had used already trained OpenCV Haar
Cascade Classifier consisting of 20 stages,1047 features.
1.3.2 Features
The Viola-Jones algorithm uses Haar-like features, that is, scalar products between the image and Haar-like templates [2]. Each feature is a single value obtained as the difference between the sum of pixels under the white region of the template and the sum of pixels under the black region.
Let I and P denote an image and a pattern, both of size N×N. The feature associated with pattern P of image I is defined by

\[
\text{feature}(I, P) = \sum_{1 \le i \le N} \sum_{1 \le j \le N} I(i,j)\,\mathbf{1}_{P(i,j)\text{ is white}} \;-\; \sum_{1 \le i \le N} \sum_{1 \le j \le N} I(i,j)\,\mathbf{1}_{P(i,j)\text{ is black}}
\]
The derived features are assumed to hold all the information needed to char-
acterize a face. Since faces are by and large regular by nature, the use of
Haar-like patterns seems justified.
Figure 1.4: Haar Features
1.3.3 Integral Image
There is another crucial element that makes this set of features practical: the integral image, which allows them to be computed at very low computational cost. Instead of summing all the pixels inside a rectangular window each time, we can use the integral image, obtained from the original image by assigning to each pixel the sum of all pixels above and to the left of it. It can be computed in one pass over the image and then used to compute the sum of pixels in any window in constant time. If i(x, y) denotes a pixel of the original image and s(x, y) the corresponding integral image value, then

s(x, y) = i(x, y) + s(x − 1, y) + s(x, y − 1) − s(x − 1, y − 1)
Figure 1.5: Original image
Figure 1.6: Integral Image or Sum Area Table
The sum of pixels covered by a window can then be computed as follows: if A, B, C, D denote the four corners of the rectangular window, the sum of pixels in the window is
sum = s(A) + s(D) − s(C) − s(B)
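As an illustration, here is a minimal self-contained C++ sketch of the two formulas above, using plain arrays and hypothetical helper names rather than the OpenCV implementation:

#include <vector>

// One-pass construction of the integral image (summed-area table) for a
// grayscale image stored row-major as w*h values, using the recurrence
// s(x,y) = i(x,y) + s(x-1,y) + s(x,y-1) - s(x-1,y-1).
std::vector<long> integralImage(const std::vector<int>& img, int w, int h) {
    std::vector<long> s(static_cast<size_t>(w) * h, 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            long left = (x > 0) ? s[y * w + (x - 1)] : 0;
            long up   = (y > 0) ? s[(y - 1) * w + x] : 0;
            long diag = (x > 0 && y > 0) ? s[(y - 1) * w + (x - 1)] : 0;
            s[y * w + x] = img[y * w + x] + left + up - diag;
        }
    return s;
}

// Constant-time sum of the window with top-left (x0,y0) and bottom-right
// (x1,y1), inclusive; the four table lookups mirror sum = s(A)+s(D)-s(C)-s(B).
long windowSum(const std::vector<long>& s, int w,
               int x0, int y0, int x1, int y1) {
    long d = s[y1 * w + x1];                                      // s(D)
    long a = (x0 > 0 && y0 > 0) ? s[(y0 - 1) * w + (x0 - 1)] : 0; // s(A)
    long b = (y0 > 0) ? s[(y0 - 1) * w + x1] : 0;                 // s(B)
    long c = (x0 > 0) ? s[y1 * w + (x0 - 1)] : 0;                 // s(C)
    return d + a - b - c;
}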
1.3.4 Cascade of Classifiers
In an image, most of the region is non-face. It is therefore better to have a simple method to check whether a window is not a face region; if it is not, discard it in a single shot and do not process it again, focusing instead on regions where a face may be present. This way, more time can be spent checking probable face regions.

For this, the concept of a cascade of classifiers was introduced. Instead of applying all the features to a window, the features are grouped into different stages of classifiers and applied one by one (normally the first few stages contain very few features). If a window fails the first stage, it is discarded and the remaining features are not considered. If it passes, the second stage of features is applied, and the process continues. A window that passes all stages is a face region.
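As a conceptual illustration, the following hedged C++ sketch shows this early-exit evaluation (hypothetical types and a placeholder scoring function; not the OpenCV implementation):

#include <vector>

struct Stage { int numFeatures; double threshold; };

// Placeholder: in a real detector this sums the stage's weighted feature
// responses evaluated on the integral image at the given window.
double stageScore(const Stage& s, int windowX, int windowY) { return 0.0; }

bool isFaceWindow(const std::vector<Stage>& cascade, int x, int y) {
    for (const Stage& s : cascade) {
        if (stageScore(s, x, y) < s.threshold)
            return false;   // window fails this stage: discard immediately
    }
    return true;            // window passed all stages: a face region
}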
1.3.5 Face Recognition
In order to understand the methods for recognizing faces, more advanced
mathematical knowledge is required; namely linear algebra and statistics.
OpenCV provides three methods of face recognition: Eigenfaces, Fisherfaces
and Local Binary Patterns Histograms (LBPH).
All three methods perform the recognition by comparing the face to be rec-
ognized with some training set of known faces. In the training set, we supply
the algorithm faces and tell it to which person they belong. When the al-
gorithm is asked to recognize some unknown face, it uses the training set to
make the recognition. Each of the three aforementioned methods uses the
training set a bit differently.
Eigenfaces and Fisherfaces find a mathematical description of the most dominant features of the training set as a whole, while LBPH analyzes each face in the training set separately and independently. The Fisherfaces method learns a class-specific transformation matrix, so it does not capture illumination as obviously as the Eigenfaces method; the discriminant analysis instead finds the facial features that discriminate between persons. It is important to note that the performance of Fisherfaces also depends heavily on the input data. Practically speaking, if the Fisherfaces are learned from well-illuminated pictures only and recognition is attempted in badly illuminated scenes, the method is likely to find the wrong components, simply because those features may not be predominant in badly illuminated images. This is logical, since the method had no chance to learn the illumination.
1.4 Thesis Outline
This thesis is divided into five chapters. Chapter 1 is the introduction; it explains the algorithms used for face detection and recognition and how the module fits into the MAVI system. Chapter 2 gives details of the design space exploration and the parameters involved in the experimentation. Chapter 3 covers the embedded implementation of the modelled algorithms on the ZedBoard, while Chapter 4 covers the software-hardware partitioning of the hotspot detected by profiling the algorithm. Finally, Chapter 5 concludes the thesis and discusses future work on the module.
Chapter 2
Modelling and Implementation
of Face Detection
2.1 OpenCV
2.1.1 Introduction
OpenCV (Open Source Computer Vision Library) is an open source computer
vision and machine learning software library. OpenCV was built to provide
a common infrastructure for computer vision applications and to accelerate
the use of machine perception in commercial products. The library has more than 2500 optimized algorithms; it has C++, C, Python, Java and MATLAB interfaces and supports Windows, Linux, Android and Mac OS. OpenCV leans mostly towards real-time vision applications; it is written natively in C++ and has a templated interface that works seamlessly with STL containers. We used the then-current version, OpenCV 3.1.0, for our application development and implementation.
2.1.2 APIs used in Face Detection and Recognition
The objdetect module was used as a library by the programs implementing face detection. The class used is CascadeClassifier, for loading and using the Haar cascade classifier. The following APIs were used in the program.
• CascadeClassifier::load - This API loads the saved XML containing the Haar features of the already-trained face classifier. The features are loaded into vectors that are later used in detection. The XML also provides information on the stage thresholds and the number of trees in each stage. Each stage of the cascade classifier consists of a small number of features, each stored as a tree with two internal nodes and three leaf nodes. Each tree has its own feature threshold computed from the sum of its leaves, and all features combine to give the stage threshold.
• CascadeClassifier::detectMultiScale - This API detects faces in a frame, taking the scale factor, minNeighbours and the minimum face size as parameters. These parameters strongly affect performance. The function involves scaling the image for different window sizes, computing the integral image, preparing the image for feature evaluation, and removing redundant detections. The feature window is passed over the whole image, evaluating about 1047 features in a tree-based cascade of 20 stages. A minimal usage sketch is given after the parameter descriptions below.
The parameters involved in the API are:
– Minimum Window Size: the parameter Size(W,H) defines the size of the smallest face to be searched for within the input image; in effect, it is the size of the initial sliding window. The default size in OpenCV is w=24 and h=24. Depending on the input image, a tiny 24x24 sub-window may not be meaningful as a face, so the initial search window size may be increased, e.g. to 100x100.
– Scale Factor: this parameter specifies how quickly OpenCV should increase the scale between face detection iterations. Higher values make the detector run faster (by evaluating fewer sliding-window scales per image), but if the value is too high the detector may jump too quickly between scales and miss some faces. The default value in OpenCV is 1.1, meaning the scale increases by 10% on each pass.
– MinNeighbour: when the face detector is called, each positive face region may actually generate many hits from the Haar detector, producing a large cluster of heavily overlapping rectangles around a true face. In addition, scattered detections may appear around the face region; such isolated detections are usually false positives, so it makes sense to discard them, and likewise to merge the multiple detections for each face region into a single detection. This parameter controls both actions before the final detected face is returned. The merge step first groups rectangles with a large amount of overlap, then finds the average rectangle for the group and replaces all rectangles in the group with it. For example, a minimum-neighbours threshold of 3 means groups of three or more are merged and groups with fewer rectangles are discarded. If the detector misses many faces, try lowering this threshold; if it produces multiple detections per face, increase it.
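As a concrete illustration, the following is a minimal C++ usage sketch with the parameter values discussed above (scale factor 1.2, minNeighbours 3, minimum face size 20x20); the cascade and image file names are illustrative, not the exact files used in this work:

#include <opencv2/objdetect.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <vector>

int main() {
    cv::CascadeClassifier cascade;
    if (!cascade.load("haarcascade_frontalface_alt2.xml"))  // illustrative path
        return 1;

    cv::Mat frame = cv::imread("frame.jpg");
    cv::Mat gray;
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    // Histogram equalization is deliberately skipped (see Section 2.2).

    std::vector<cv::Rect> faces;
    cascade.detectMultiScale(gray, faces,
                             1.2,                // scale factor
                             3,                  // minNeighbours
                             0,                  // flags (unused here)
                             cv::Size(20, 20));  // minimum face size
    return 0;
}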
For face recognition, the ‘face’ library module provided in the Extra Modules of OpenCV 3.1.0 is used. This module is not shipped with the main OpenCV package but is downloaded separately as a community-maintained extra module. All face recognition models in OpenCV are derived from the abstract base class ‘FaceRecognizer’, which provides unified access to all face recognition algorithms in OpenCV. Every FaceRecognizer supports the following (an end-to-end usage sketch is given after the API descriptions):
• Training of a FaceRecognizer with FaceRecognizer::train() on a given
set of images (your face database!).
• Prediction of a given sample image, i.e. a face, supplied as a Mat.
• Loading/saving the model state from/to a given XML or YML file.
• Setting/getting labels info, stored as a string; string labels are useful for keeping the names of the recognized people.
• FaceRecognizer::train - Trains a FaceRecognizer with the given data and associated labels. Parameters:
– src The training images, i.e. the faces to be learned. The data has to be given as a vector<Mat>.
– labels The labels corresponding to the images, given either as a vector<int> or a Mat of type CV_32SC1.
• FaceRecognizer::predict - Predicts a label and the associated confidence (e.g. distance) for a given input image; given an image of a face, it determines which person from the training set it is. Parameters:
– src Sample image to get a prediction from.
– label The predicted label for the given image.
– confidence Associated confidence (e.g. distance) for the predicted label.
The predicted label and its confidence are returned through the output parameters; a label of -1 indicates the face was not recognized (see the threshold parameter below).
• FaceRecognizer::save - Saves a FaceRecognizer and its model state to a given filename, either as XML or YML. Parameters:
– filename The filename to store this FaceRecognizer to (either XML/YAML).
• FaceRecognizer::load - Loads a persisted model and state from a given XML or YAML file. Every FaceRecognizer has to override FaceRecognizer::load(FileStorage& fs) to enable loading the model state; FaceRecognizer::load(FileStorage& fs) is in turn called by FaceRecognizer::load(const string& filename), to ease loading a model.
• createFisherFaceRecognizer - This function creates an object of the FaceRecognizer class. Parameters:
– num_components The number of components (Fisherfaces) kept for the Linear Discriminant Analysis with the Fisherfaces criterion. It is useful to keep all components, i.e. the number of classes c (subjects, persons to be recognized). If this is left at the default (0), set to a value less than or equal to 0, or set greater than (c−1), it will be set to the correct number (c−1) automatically.
– threshold The threshold applied in the prediction. If the distance to the nearest neighbour is larger than the threshold, the predict method returns -1.
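The training/prediction flow described above can be summarized in the following hedged C++ sketch (file names, subject count and loop bounds are illustrative; the actual database is described in Section 2.7):

#include <opencv2/face.hpp>
#include <opencv2/imgcodecs.hpp>
#include <string>
#include <vector>

int main() {
    std::vector<cv::Mat> images;
    std::vector<int> labels;
    // In the real application these come from the cropped 92x112 FR database;
    // here two hypothetical subjects with 8 images each (see Section 2.5).
    for (int subject = 0; subject < 2; ++subject)
        for (int i = 0; i < 8; ++i) {
            std::string file = "s" + std::to_string(subject) + "_" +
                               std::to_string(i) + ".png";
            images.push_back(cv::imread(file, cv::IMREAD_GRAYSCALE));
            labels.push_back(subject);
        }

    cv::Ptr<cv::face::FaceRecognizer> model =
        cv::face::createFisherFaceRecognizer(0 /*num_components*/);
    model->train(images, labels);
    model->save("fisherfaces.yml");   // persist model state (XML/YML)

    cv::Mat test = cv::imread("unknown.png", cv::IMREAD_GRAYSCALE);
    int label = -1;
    double confidence = 0.0;
    model->predict(test, label, confidence);  // label == -1 if above threshold
    return 0;
}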
2.2 Parameters tuning
The face detection API provided by OpenCV accepts various parameters such as the scale factor and minNeighbour; for face recognition, the training database size is an additional knob. These parameters were tuned and optimized for real-time face detection, and some of the results are shown in the next chapter. Computation time for different scale factors was measured and reported for execution on the ZedBoard. Histogram equalization is avoided, as it was observed to reduce the accuracy of the face detection and recognition system in outdoor scenarios.
2.3 Scale Factor & MinNeighbour
To achieve the required accuracy and frames processed per second, we varied the scale factor; as expected, larger factors reduce both computation time and accuracy. The scale factor was varied from 1.05 to 1.4 over a test set of 65 images containing 0, 1 or 2 faces.

MinNeighbour allows false positives to be removed from the detection results, but some true detections may also be removed when their detection windows lack the required number of neighbouring detections.
2.4 Minimum Face Size
Experimentation with the minimum face window was also done to extract the relationship between face size and distance. As expected, the more distant the face is from the camera, the smaller it appears in the frame. As the trained cascade classifier provided by OpenCV has a minimum face window of 20x20, faces below that dimension cannot be detected; our experiments showed that this corresponds to a distance of 20-22 feet, or a little more than 6 meters.
For detecting smaller or very low-resolution faces, we searched for classifiers trained on smaller faces but were unable to find any; since we did not plan to train our own classifier, we opted to upscale the image before detection. This affects two fronts: computation time and false positives. We were nevertheless able to detect faces as small as 7x7 pixels by scaling the image up to four times, corresponding to a face distance of about 40 feet or 12 meters. Final results are provided in the next chapter. As mentioned, both the computation time and the number of reported false positives rose considerably; these false positives could be suppressed by using a higher minNeighbour value, though this was not experimented with. Note that for smaller faces, the detected face must be scaled up to the trained database size of 92x112 for recognition, which leads to very poor recognition results. It was experimentally concluded that faces at 10 feet (3 meters) or nearer gave a recognition rate of 65 percent, while more distant faces gave less than 50% accuracy.
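A hedged C++ sketch of this upscale-then-detect approach (function name and parameter values are illustrative):

#include <opencv2/objdetect.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Scaling the frame 2x or 4x lets the 20x20 cascade window reach faces
// that occupy as few as ~7x7 pixels in the original frame.
std::vector<cv::Rect> detectSmallFaces(cv::CascadeClassifier& cascade,
                                       const cv::Mat& gray, int upscale) {
    cv::Mat big;
    cv::resize(gray, big, cv::Size(), upscale, upscale, cv::INTER_LINEAR);

    std::vector<cv::Rect> faces;
    cascade.detectMultiScale(big, faces, 1.2, 3, 0, cv::Size(20, 20));

    // Map detections back to original-image coordinates.
    for (cv::Rect& r : faces) {
        r.x /= upscale; r.y /= upscale;
        r.width /= upscale; r.height /= upscale;
    }
    return faces;
}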
2.5 FR Database Size
The database size for face recognition training was also experimented with, using 5, 8 and 10 images per face, with a total of 10 subjects in the whole database. The test faces were taken from detection results, giving 25 test faces covering 2 known subjects and 1 unknown subject with respect to the trained face recognition system. The highest accuracy of 65% was achieved with 8 images/face. A slight increase in recognition time was also observed as the number of images per face increased.
Database Size     Accuracy
5 images/face     55%
8 images/face     65%
10 images/face    63%
Table 2.1: Accuracy vs Database Size
2.6 Upper Body and Face Detection
To reduce the computation time taken by face detection, it was clubbed with upper-body detection: face detection is run only inside the areas detected as upper body. The benefit comes from the fact that the minimum upper-body window corresponding to the minimum 20x20 face is 55x55 (determined experimentally), so detection starts at a much larger window and requires far fewer iterations over the image. Each detected upper-body ROI of minimum size 55x55 then undergoes face detection on its upper two-thirds, since the face cannot lie below that, and the maximum face window is bounded by the size of the detected upper body. The results are shown in the next chapter: a reduction of about 3x is observed in the computation time for the final face detection. A code sketch of this cascading follows.
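A hedged C++ sketch of the cascading (function name is illustrative; the cascades are assumed loaded with the OpenCV upper-body and frontal-face classifiers):

#include <opencv2/core.hpp>
#include <opencv2/objdetect.hpp>
#include <vector>

std::vector<cv::Rect> facesViaUpperBody(cv::CascadeClassifier& bodyCascade,
                                        cv::CascadeClassifier& faceCascade,
                                        const cv::Mat& gray) {
    std::vector<cv::Rect> bodies, allFaces;
    bodyCascade.detectMultiScale(gray, bodies, 1.2, 3, 0, cv::Size(55, 55));

    for (const cv::Rect& body : bodies) {
        // Restrict the face search to the upper two-thirds of the body ROI.
        cv::Rect upper(body.x, body.y, body.width, (2 * body.height) / 3);
        cv::Mat roi = gray(upper);

        std::vector<cv::Rect> faces;
        faceCascade.detectMultiScale(roi, faces, 1.2, 3, 0, cv::Size(20, 20));
        for (cv::Rect f : faces) {
            f.x += upper.x;  // map back to full-frame coordinates
            f.y += upper.y;
            allFaces.push_back(f);
        }
    }
    return allFaces;
}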
There are also some drawbacks associated with this technique:
• The number of false positives for upper-body detection is high, though this may be of little consequence, since a face will often not be detected inside a false upper-body region.
• Faces near the frame edges, especially in the bottom part of the frame where the upper body falls outside the frame, will not be detected.
• If the upper body is distorted in the image, especially when a person's body is oblique to the camera, the upper body will not be detected.
• The face may not be properly visible for profile faces or a person facing away, even though the upper body is detected.
• Images with a group of people standing close together can give poor upper-body results, as the distinction between two upper bodies may cease to exist.
Figure 2.1: Face Detection + Upper Body
2.7 Creation of FR Database and Training
Creating the training database for face recognition was the most time-consuming activity in the whole project. The training database includes 10 subjects, with their faces cropped manually from well-illuminated photographs as well as outdoor photographs. While cropping, the nose is taken as the centre point for all faces, and the areas under the hair, the ears and the tip of the chin are removed. Two images per subject with a slight rotation of less than 5 degrees are included, to make the training invariant to slight deviations from the ideal frontal pose. A few images with some shadow from sunlight are also included, to make the training more robust to sunlight and outdoor conditions; shadowed images cover both the left and the right side of the face. Facial expressions are also varied across the images, including smiling, sad, laughing, neutral and serious.
Figure 2.2: Face Recognition Database
Chapter 3
Embedded Face Detection and
Recognition
3.1 ZedBoard-Choice for prototyping
The ZedBoard is based on the Zynq All Programmable SoC (AP SoC). This product integrates a feature-rich dual-core ARM Cortex-A9 MPCore based processing system (PS) and Xilinx programmable logic (PL) in a single device, built on a state-of-the-art, high-performance, low-power process technology. It is a joint venture by Xilinx (Zynq AP SoC), Digilent (board manufacturer) and Avnet (distributor).
• Xilinx Zynq XC7Z020-1CSG484 EPP (extensible processor platform)
containing Dual-core ARM Cortex-A9 (PS) running at 666 MHz and
Artix-7 FPGA (PL)
• Memory: 512 MB DDR3 and 256 Mb QSPI Flash
• On-board oscillators: 33.333 MHz (PS), 100 MHz (PL)
• Interfaces: USB-JTAG programming, Ethernet, USB OTG, USB-UART, SD card, Pmod, LEDs, DIP switches, push buttons, etc.
• Display: HDMI output, VGA (12-bit colour)
• Power: 12 V @ 5A AC-DC regulator
3.2 Linaro
For the software-only implementation, the Linaro Ubuntu distribution was used. Linaro is an open-source, complete Linux distribution based on Ubuntu. It supports a graphical desktop through the on-board HDMI port. For booting from an SD card, the Linaro file system has to be placed in a partition separate from the one holding the kernel image and device tree. It is a persistent OS, i.e. all changes are written to storage and files survive reboot or shutdown. OpenCV (version 3.1) was built on top of it. For booting Linux, an 8 GB class-10 SD card was used. As the Linaro file system was used, the SD card must be partitioned into two sections:
• The first partition should have a FAT file system and contains the boot image, kernel image and device tree file.
• The second partition should have an ext4 file system and contains the Linaro distribution.
The ZedBoard boots over a number of stages:
• Stage-0: After a power-on reset (POR), the hardware samples the boot-strap pins to determine the boot mode (JTAG mode, NAND/NOR flash mode, SD card mode, etc.) and optionally enables the PS PLLs. The hard-coded BootROM is then executed on the primary CPU-0. The BootROM configures the PS to access the boot device, validates and reads the boot header to determine the boot flow, and loads the FSBL into OCM.
• Stage-1: The FSBL is loaded and executed from OCM. It is responsible for several initialization functions, including CPU initialization with the PS7 Init configuration data, programming the PL with the bitstream (if available), loading the SSBL into DDR memory, and handing off control to the SSBL.
• Stage-2: For Linux booting, the SSBL is typically U-Boot, the open-source universal bootloader for Zynq. It is responsible for loading the Linux kernel image, device tree file and Linux file system. It also initializes hardware not handled by the kernel, such as the serial port and DDR memory.
3.3 Energy and power measurement on ZedBoard
A 10 mΩ, 1 W current-sense resistor is in series with the 12 V input power supply. To measure the voltage across the resistor, a jumper (J21) is connected across it. An Agilent 34410A high-performance sampling multimeter was used to measure the voltage, interfaced directly to a PC via USB. From the PC it is possible to start/stop the voltage measurement, view the waveform, and export the results as a CSV file or spreadsheet for further calculation. The ZedBoard was powered by the 12 V adapter. The current drawn from the power source is calculated by dividing the sense voltage by the resistance, using Ohm's law.
Figure 3.1: Power measurement setup
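In terms of the measured sense voltage V_s across J21, the current and board power follow directly from Ohm's law:

I = V_s / R = V_s / 0.01,  P = 12 × I

As a worked example (derived from the tabulated base power, not separately measured): a base power of 3.528 W corresponds to I = 3.528 / 12 ≈ 0.294 A, i.e. a sense voltage of V_s ≈ 0.294 × 0.01 = 2.94 mV.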
To measure the software power taken by the application code, a dummy loop was run and pinned to core 0 of the dual-core ARM Cortex-A9, keeping that core busy at all times, while our application was pinned to the other core. The incremental jump in the voltage across J21 was then measured, giving the peak and average values. With Linux running the dummy loop on one core, the base power is 3.65 W, while additionally running our face detection application gives 3.822 W; the incremental power attributed to face detection is then computed as 156 mW.

Cases                  Power (W)
Base                   3.528
Dummy on 1 core        0.096
Dummy on both cores    0.192
Only Face Detection    0.132
FD + dummy             0.156
FD + TD                0.372

Table 3.1: Power Measurement
3.4 Cross Compilation for ZedBoard
The ARM Linux toolchain arm-linux-gnueabihf was installed, and OpenCV 3.1 was compiled from source with this toolchain for Linaro-based systems. OpenCV was compiled both as shared and as static libraries, and a stripped version was also built. The same toolchain is used to compile the application for the ZedBoard. The cross-compilation procedure is documented in the Appendix.
3.5 Results
From Fig. 3.2 we can conclude that accuracy decreases as the scale factor increases. The results were collected from 65 outdoor images containing no faces, one face, or more than one face. The detection rate is normalized to the count at scale factor 1.05, which gives the maximum detections.

Fig. 3.3 plots the number of false positives against the scale factor; as the scale factor increases, false positives tend to decrease because fewer windows are processed, although there is an unexplained anomaly at scale factor 1.25. Fig. 3.4 shows that computation time decreases as the scale factor increases. To achieve under 1 s of computation time with few false positives, we selected a scale factor of 1.2. The computation time for frames in which no faces are detected forms the lower bound, and that for frames with detected faces forms the upper bound. The CDFs in Figs. 3.5 to 3.12 provide useful information for resource allocation and scheduling on embedded platforms, indicating when the computation of most frames can be expected to complete. Fig. 3.13 shows the energy consumption of the whole board with face detection running.
Figure 3.2: Face Detection Accuracy (true faces detected: normalized accuracy w.r.t. SF 1.05, plotted against scale factor)
Figure 3.3: Face Detection-False Positives (number of false positives vs. scale factor)
Figure 3.4: Face Detection-Average time taken (average time in ms vs. scale factor, for frames with faces, without faces, and both)
Figure 3.5: Face Detection-Computation Time (SF 1.05): CDF of per-frame time in ms, with and without faces, with 90%/95%/99% marks
Figure 3.6: Face Detection-Computation Time (SF 1.1): CDF of per-frame time in ms, with and without faces, with 90%/95%/99% marks
Figure 3.7: Face Detection-Computation Time (SF 1.15): CDF of per-frame time in ms, with and without faces, with 90%/95%/99% marks
Figure 3.8: Face Detection-Computation Time (SF 1.2): CDF of per-frame time in ms, with and without faces (including the mean for frames with faces), with 90%/95%/99% marks
Figure 3.9: Face Detection-Computation Time (SF 1.25): CDF of per-frame time in ms, with and without faces, with 90%/95%/99% marks
Figure 3.10: Face Detection-Computation Time (SF 1.3): CDF of per-frame time in ms, with and without faces, with 90%/95%/99% marks
Figure 3.11: Face Detection-Computation Time (SF 1.35): CDF of per-frame time in ms, with and without faces, with 90%/95%/99% marks
Figure 3.12: Face Detection-Computation Time (SF 1.4): CDF of per-frame time in ms, with and without faces, with 90%/95%/99% marks
Figure 3.13: Face Detection-Board Energy (energy consumed in J vs. scale factor)
Figure 3.14: Face Detection-Component Energy
Figure 3.15: Face Detection-distance (size of detected faces in pixels vs. distance in ft, for 1x, 2x and 4x upscaling)
Figure 3.16: Face Detection-upscaling (number of false positives vs. upscaling factor of 1x, 2x, 4x)
Figure 3.17: Face size variation with distance
Chapter 4
Hardware Software Codesign
4.1 Tools
4.1.1 SDSOC
The SDSoC (Software Defined System-on-Chip) development environment provides an embedded C/C++ application development experience for the Zynq All Programmable SoC. It includes the industry's first full-system optimizing C/C++ compiler, a system-level profiler, automated software acceleration in programmable logic, and automated system connectivity generation. It supports bare metal, Linux and FreeRTOS as target operating systems.
The SDSoC system compilers analyze a program to determine the data flow between software and hardware functions and generate an application-specific system-on-chip. High performance can be achieved by configuring each hardware function to run as an independent thread; the system compilers ensure synchronization between hardware and software threads and enable pipelined computation. The SDSoC system compilers invoke the Vivado HLS tool to compile the synthesizable C/C++ functions into programmable logic, generate a complete hardware system including DMAs, interconnects, hardware buffers and other IPs, and configure the FPGA by invoking the Vivado tools. SDSoC thus provides a C/C++ environment for complete embedded system design on the heterogeneous Zynq platform using hardware/software partitioning: it performs program analysis, task scheduling, and binding onto the processor and the configurable logic for the accelerators specified by the user. The SDSoC compiler and linker generate code for hardware and software that automatically orchestrates communication and cooperation among hardware and software components. The user can experiment with different accelerators by simply toggling the target (hardware or software) of each function. The tool invokes Vivado HLS for accelerator and interface synthesis, makes the connections in Vivado Design Suite, and generates the bitstream. Thereafter it generates object code for the processor using the GNU toolchain. It also generates the boot files (kernel image, device tree, boot image, etc.) to run the application from a ramdisk. The user can optimize the hardware using pragmas and can also choose the data-mover type and the PS port to be interfaced to the accelerator. Some important points are listed below, followed by a minimal allocation sketch:
• The user profiles the code and marks the hotspot functions as hardware targets; more than one accelerator is possible.
• The hardware is synthesized using Vivado HLS, so HLS guidelines should be followed. HLS pragmas or compiler directives can be used to optimize the hardware.
• The code is compiled by sdscc for C code and sds++ for C++ code.
• The SDSoC linker creates an SD card image to run the application in a Linux environment.
• SDSoC also provides pragmas to select specific data movers (AXI DMA in simple mode, AXI DMA in scatter-gather mode, etc.) and PS-PL interface ports (ACP, HP, GP, etc.).
• In Linux, memory allocation is always done in virtual address space and may be distributed across multiple pages in physical memory, whereas DMA and other hardware operate on physical addresses only. So for each memory allocation the elements must be mapped to physical space; scatter-gather DMA can handle such a list of pages, whereas simple DMA can handle only a single page.
• SDSoC provides a mechanism to allocate physically contiguous memory using sds_alloc and sds_free; the Linux kernel similarly has support for the CMA (contiguous memory allocator).
• Hence, in Linux, memory allocated by malloc must be handled by scatter-gather DMA, whereas simple DMA can handle memory allocated by sds_alloc.
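A hedged sketch of this allocation pattern (hypothetical function and buffer names; the pragma values are examples, not the configuration used in this work):

#include <stdint.h>
#include "sds_lib.h"   // sds_alloc / sds_free

// The data_mover pragma asks SDSoC to use the simple AXI DMA for these
// arrays, which is valid because sds_alloc returns physically contiguous
// memory; malloc'd buffers would instead require the scatter-gather DMA.
#pragma SDS data data_mover(in:AXIDMA_SIMPLE, out:AXIDMA_SIMPLE)
void accel(uint32_t in[1024], uint32_t out[1024]);

void accel(uint32_t in[1024], uint32_t out[1024]) {
    for (int i = 0; i < 1024; ++i)
        out[i] = in[i] + 1;   // placeholder hardware kernel
}

void run() {
    uint32_t* in  = (uint32_t*)sds_alloc(1024 * sizeof(uint32_t));
    uint32_t* out = (uint32_t*)sds_alloc(1024 * sizeof(uint32_t));
    accel(in, out);           // SDSoC inserts the DMA transfers around this call
    sds_free(in);
    sds_free(out);
}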
The advantage of the SDSoC tool is its automatic generation of the interconnect system according to the user-specified pragmas, along with the device drivers for a Linux-based system, sparing the developer from writing them by hand as required with plain Linaro-based systems. SDSoC also provides support for OpenCV 3.0, and many of its libraries are provided for the software-only version of an OpenCV application. Support for the ml and face modules of OpenCV is incomplete, so the face recognition program cannot be built within it. However, these libraries can be built separately using the SDSoC toolchain arm-xilinx-linux-gnueabi-g++. This can be done in two ways: using the SDSoC SDK itself, as explained in the user guide provided with the tool, or using the toolchain directly and cross-compiling manually, similar to cross-compiling for Linaro-based solutions. We tried the second method and were able to create the required static libraries, but executing programs linked against them produced neither output nor an error.

SDSoC comes with two files for each platform, one describing the software configuration and the other the hardware. Whenever a new platform is generated, these two files are also generated; the software file contains the paths to the required libraries. Whenever new libraries are added to the platform, this file must be modified to reflect the library paths. These platform files have the extension .pfm.
SDSoC also contains the Vivado HLS libraries for its hardware synthesis flow and thus has access to the HLS OpenCV libraries and functions provided by Xilinx. These functions can be used inside functions marked for hardware acceleration and are completely synthesizable. However, converting data structures between software OpenCV and the synthesizable OpenCV functions (and back) requires the interface functions provided by Xilinx, and the software OpenCV in SDSoC is version 3.0, which deprecates many of the data structures used by the 2.3-era HLS OpenCV. We therefore developed our own interface APIs for converting from the software representation to the hardware one, but these APIs were rudimentary and consumed so much time in conversion that they defeated the purpose. It was consequently decided to flatten the OpenCV face detection code and remove all dependencies on the OpenCV libraries: all templates were removed, and complex classes were replaced by arrays and corresponding structures. The flattened code was then converted into a more synthesizable form that could be partitioned into hardware for acceleration.
While building projects with the SDSoC SDK, all build steps are logged to a file, and the logged commands can also be run manually from the terminal. This can be used to measure the time taken in data transfer (sending and receiving) over the DMA generated by the tool, and to choose the most efficient configuration, by inserting our own timing statements into the intermediate programs around the DMA invocations such as cf_send and cf_receive.
4.1.2 VIVADO HLS
Vivado HLS comes with the Vivado IDE as a component tool. It takes a behavioural description written in C/C++/SystemC along with some constraints, synthesizes it, and produces an RTL description of the same design; essentially, it converts a behavioural description into a timed, cycle-accurate RTL description. With Vivado HLS, we can specify the data types (integer, fixed-point or floating-point), the abstraction of the algorithmic description, and the interfaces (FIFO, AXI4, AXI4-Lite, AXI4-Stream). It performs directive-driven, architecture-aware synthesis that yields a good quality of results. The accelerated designs can be verified using C/C++ test bench simulation and automatic Verilog or VHDL simulation with test bench generation. Vivado HLS can also export a synthesized RTL design as an IP core with a desired bus interface, which can then be added to a system using the IP integrator.
4.2 Profiling & Code Modification
Profiling is a key technique for ensuring an optimal match between target hardware and software by revealing where the software spends its time. As the main focus here is on accelerator synthesis, a coarse profiling granularity is used; with fine granularity, the bottleneck would become the communication between the processor and the hardware. Profiling yields two critical pieces of information: how often each function is invoked and how much time is spent in it. From the profile we obtain the candidate functions for hardware acceleration. The profilers used here are gprof and the perf tool. Both versions of the code, one using the OpenCV libraries and the other the flattened code, were profiled.
Figure 4.1: Face Detection Profiling (application level): face detection 96.41%, XML load 1.81%, image load 1.1%, others 0.68% of execution time
Figure 4.2: Face Detection Profiling (function level): predictOrdered 92.96%, setWindow 1.93%, integral 0.69%, detectSingleScale 0.55%, setImage 0.28% of execution time
Analysis of the code shows that predictOrdered, the function in which the most time is spent, is nested inside three loops: one loop increases the window size and the other two slide the window over the whole image. Inside this kernel, threshold values are computed for 1047 features divided into 20 stages. The function can exit from any of the 20 stages, and each feature consists of a tree with three leaves and two internal nodes.
4.3 Hardware Acceleration
We will be using SDSoC for hardware acceleration, with partitioning done according to the profiling results. The kernel consists of four nested loops. The outer two loops have fixed bounds, and the innermost loop always runs three times and can therefore be unrolled. The third loop is not fixed and can exit at any stage; it can be implemented as an IP that is invoked once per stage. The outermost two loops can be parallelized to work on different parts of the image concurrently. These acceleration steps need to be carried out in future work. A sketch of the loop structure is given below.
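A hedged C++ sketch of that loop structure (hypothetical names and stub bodies standing in for lookups into the loaded classifier data; not the OpenCV source; the per-stage feature loop is shown explicitly here):

// Cascade geometry: 20 stages, ~1047 features, each a tree with 3 leaves.
const int NUM_STAGES = 20;

inline int    numFeatures(int /*stage*/)        { return 52; }   // stub
inline double stageThreshold(int /*stage*/)     { return 0.0; }  // stub
inline double evalLeaf(int, int, int, int, int) { return 0.0; }  // stub

// Stage loop: may exit early, a candidate for a per-stage IP invocation.
// Innermost leaf loop: fixed trip count of 3, a candidate for unrolling.
bool windowPasses(int x, int y) {
    for (int stage = 0; stage < NUM_STAGES; ++stage) {
        double sum = 0.0;
        for (int f = 0; f < numFeatures(stage); ++f) {
            double node = 0.0;
            for (int leaf = 0; leaf < 3; ++leaf) {
#pragma HLS UNROLL
                node += evalLeaf(stage, f, leaf, x, y);
            }
            sum += node;
        }
        if (sum < stageThreshold(stage))
            return false;   // early exit: window rejected at this stage
    }
    return true;
}

// Outer two loops: fixed sliding-window traversal, candidates for
// parallelization over different parts of the image.
void slideWindows(int width, int height) {
    for (int y = 0; y + 20 <= height; ++y)
        for (int x = 0; x + 20 <= width; ++x)
            (void)windowPasses(x, y);
}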
Chapter 5
Conclusion & Future Work
5.1 Conclusion
A design space exploration was performed for the embedded implementation of face detection and recognition on the ZedBoard. The relationships between the scale factor and computation time, accuracy and energy consumption were explored: as the scale factor increases, accuracy, computation time and energy consumption all decrease. CDFs were extracted for different scale factors to assist a future controller in making scheduling and time-allocation decisions for this application. Face detection can be clubbed with upper-body detection to reduce computation time. A scale factor of 1.2 was chosen for face detection and a database size of 8 images/face for face recognition. Finally, predictOrdered was identified as the function to accelerate in hardware.
5.2 Future Work
The effect of varying the scale factor of both face detection and upper-body detection needs to be studied. Hardware acceleration using SDSoC will be a major piece of work to be carried out, as will the creation of a face recognition database for different image dimensions.
Bibliography
[1] OpenCV 3.1.0 documentation. Face Detection using Haar Cascades, 2015.
[2] Yi-Qing Wang. An Analysis of the Viola-Jones Face Detection Algorithm. Image Processing On Line (IPOL), 2014.