Automatic Attendance System using CNN

AUTOMATIC ATTENDANCE SYSTEM

By: Pinaki Ranjan Sarkar

Under the guidance of:

Dr. Gorthi R.K.S.S. Manyam & Dr. Deepak Mishra

OUTLINE

▪ Motivation

▪ Objective

▪ System Requirements

▪ Design Details

▪ Tried methods

▪ Inspiration

▪ Main design

▪ Status so far

▪ Future work

MOTIVATION

▪ Taking attendance in large classes is:

▪ Cumbersome

▪ Repetitive

▪ Consumes valuable class time

▪ What if we build an efficient face detection and recognition system for this task?

OBJECTIVES

▪ Automatic user identification via face detection and recognition.

▪ Develop and implement an efficient face detection and recognition system.

▪ End-to-end face recognition system using deep learning.

DIFFICULTIES

▪ Large pose variation

▪ Hidden faces & tiny faces

▪ Different illumination conditions, occlusions

SYSTEM REQUIREMENTS

▪ Hardware:

▪ A camera

▪ PC or Raspberry Pi

▪ Software:

▪ MATLAB 2013+

▪ Python 2.7

▪ Lasagne API

DESIGN DETAILS

[Figure: Database, Face Detection, and Face Recognition blocks, with the attendance output: Abhi - 1, Priya - 1, Ayushi - 0, Pinaki - 0, Akshay - 1, Sidd - 1]

All stages use a CNN!
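The final step of this design, turning recognized identities into a 1/0 attendance list, can be sketched as follows. This is only an illustration: the function name `mark_attendance` and the idea that the recognizer returns a plain list of names are assumptions, and the names are taken from the figure above.

```python
def mark_attendance(roster, recognized):
    """Return {name: 1 or 0} for every enrolled student.

    roster     -- all enrolled names (hypothetical database contents)
    recognized -- names returned by the recognition stage; duplicates
                  are harmless because a student is present if seen once
    """
    present = set(recognized)
    return {name: int(name in present) for name in roster}


roster = ["Abhi", "Priya", "Ayushi", "Pinaki", "Akshay", "Sidd"]
recognized = ["Sidd", "Abhi", "Akshay", "Priya", "Sidd"]

attendance = mark_attendance(roster, recognized)
for name in roster:
    print("%s - %d" % (name, attendance[name]))
```

Students who are never recognized (here Ayushi and Pinaki) are marked 0, which matches the output shown in the figure.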

GOING DEEP INTO FACE RECOGNITION

▪ Various methods are employed to recognize a person in the wild.

▪ Compared to traditional handcrafted features such as high-dimensional LBP, Active Appearance Model (AAM), Active Shape Model (ASM), Bayesian face, or Gaussian face, automatically learnt deep features based on personal identity are more advantageous.

▪ In most deep-learning-based face recognition methods, the inputs to the deep model are aligned face images.

TRIED METHODS

▪ Tal Hassner, Shai Harel, Eran Paz, Roee Enbar, "Effective Face Frontalization in Unconstrained Images", CVPR-2015

TRIED METHODS

▪ Zhu, Xiangyu, et al. "Face alignment across large poses: A 3D solution." CVPR-2016

TRIED METHODS

▪ I tried implementing several other papers, but they failed on faces with large pose variation.

▪ Instead of AAM or a 3D fitted model (3D frontalization does not show significant improvement over simple 2D alignment*), we used deep learning techniques to recognize a face using only personal identity cues.

* Banerjee, Sandipan, et al. "To Frontalize or Not To Frontalize: Do We Really Need Elaborate Pre-Processing to Improve Face Recognition Performance?" arXiv preprint arXiv:1610.04823 (2016).

INSPIRATION

▪ Our work is inspired by some of the state-of-the-art papers.

▪ DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR-2014

▪ FaceNet: A Unified Embedding for Face Recognition and Clustering, CVPR-2015

▪ DeepID3: Face Recognition with Very Deep Neural Networks, arXiv-2015

▪ Supervised Transformer Network for Efficient Face Detection, ECCV-2016

▪ Towards End-to-End Face Recognition through Alignment Learning, arXiv-2017

▪ Spatial transformer networks. NIPS-2015

▪ Finding Tiny Faces. arXiv-2016

MAIN DESIGN

▪ The complete architecture has two stages:

▪ Face Detection

▪ Face Recognition

ARCHITECTURE FOR DETECTION

▪ The authors of "Finding Tiny Faces" provide an in-depth analysis of image resolution, object scale, and spatial context for the purpose of finding small faces.

▪ A detailed study of the paper is still pending, as I found it only recently. I will briefly describe its architecture on the next slide.

ARCHITECTURE FOR DETECTION

WHERE IT FAILS?

▪ The proposed method works well for out-of-plane rotation, but its accuracy drops for 2D (in-plane) rotation.

▪ Some of the failures are shown on the next slide.

[Figure: 11/14 true detections, 1 false detection]

[Figure: 1/14 true detections, 1 false detection]

ARCHITECTURE FOR RECOGNITION

[Figure: recognition pipeline. An augmented 128 × 128 image enters the Spatial Transformer Network, where the Localization Network predicts the transform parameters and the Transformer outputs a 64 × 64 aligned face; the Recognition Network then extracts features.]

SPATIAL TRANSFORMER NETWORK

▪ Intuition behind STN

[Figure: sampling]


▪ According to the original DeepMind paper, the spatial transformer can be used to implement any parametrizable transformation, including translation, scaling, affine, and projective transformations.

▪ Suppose that for the i-th target point $p_i^t = (x_i^t, y_i^t, 1)$ in the output image, a grid generator generates its source coordinates $(x_i^s, y_i^s, 1)$ in the input image according to the transformation parameters.

Projective transformation equation
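A standard eight-parameter form of the projective map is sketched below. The parameter names $\theta_1, \dots, \theta_8$, the choice of fixing the last matrix entry to 1, and the intermediate coordinate $z_i^s$ are assumptions made for illustration; they are consistent with the eight transformation parameters mentioned later in these slides.

```latex
\begin{pmatrix} x_i^s \\ y_i^s \\ z_i^s \end{pmatrix}
=
\begin{pmatrix}
\theta_1 & \theta_2 & \theta_3 \\
\theta_4 & \theta_5 & \theta_6 \\
\theta_7 & \theta_8 & 1
\end{pmatrix}
\begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix},
\qquad
(x_i^s,\; y_i^s) \leftarrow \left( \frac{x_i^s}{z_i^s},\; \frac{y_i^s}{z_i^s} \right)
```

Setting $\theta_7 = \theta_8 = 0$ gives $z_i^s = 1$, and the map reduces to a six-parameter affine transformation.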


▪ Sampler: (Mathematical Formulation)


▪ We use the bilinear kernel so that:
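Following the formulation in the spatial transformer paper, the sampler can be written as below. Here $U_{nm}$ (the source pixel at row $n$, column $m$) and the kernel symbol $k$ are notation assumed for this sketch, and a single channel is shown; $V_i$ is the output value at the i-th target point and $H \times W$ is the source image size.

```latex
% Generic sampling with a kernel k
V_i = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}\, k(x_i^s - m)\, k(y_i^s - n)

% Bilinear kernel: k(d) = \max(0,\, 1 - |d|)
V_i = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}\,
      \max\bigl(0,\, 1 - |x_i^s - m|\bigr)\,
      \max\bigl(0,\, 1 - |y_i^s - n|\bigr)
```

With the bilinear kernel, only the four source pixels surrounding $(x_i^s, y_i^s)$ receive a non-zero weight.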



▪ So the overall transformer model will be:




This is equivalent to convolving a sampling kernel k with the source image of dimension H × W.
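A minimal sketch of the grid generator and bilinear sampler working together is given below. It is not the Lasagne implementation used in this project: it uses plain Python lists, pixel coordinates instead of normalized coordinates, an eight-parameter projective matrix whose last entry is fixed to 1, and treats pixels outside the source image as zero. All function names are hypothetical.

```python
import math


def bilinear_sample(image, xs, ys):
    """Sample a 2D list `image` (H rows, W columns) at real-valued (xs, ys).

    Computes sum_n sum_m U[n][m] * max(0, 1-|xs-m|) * max(0, 1-|ys-n|).
    Only the four pixels around (xs, ys) can have a non-zero weight,
    so the double sum is restricted to them.
    """
    height, width = len(image), len(image[0])
    x0, y0 = int(math.floor(xs)), int(math.floor(ys))
    value = 0.0
    for n in (y0, y0 + 1):
        for m in (x0, x0 + 1):
            if 0 <= n < height and 0 <= m < width:
                weight = max(0.0, 1 - abs(xs - m)) * max(0.0, 1 - abs(ys - n))
                value += image[n][m] * weight
    return value


def warp_projective(image, theta, out_h, out_w):
    """Grid generator plus sampler for an 8-parameter projective map."""
    t1, t2, t3, t4, t5, t6, t7, t8 = theta
    output = []
    for yt in range(out_h):
        row = []
        for xt in range(out_w):
            # Source coordinates of target pixel (xt, yt); z is assumed non-zero.
            z = t7 * xt + t8 * yt + 1.0
            xs = (t1 * xt + t2 * yt + t3) / z
            ys = (t4 * xt + t5 * yt + t6) / z
            row.append(bilinear_sample(image, xs, ys))
        output.append(row)
    return output


source = [[0.0, 10.0],
          [20.0, 30.0]]
identity = (1, 0, 0, 0, 1, 0, 0, 0)
half_pixel_shift = (1, 0, 0.5, 0, 1, 0, 0, 0)

print(warp_projective(source, identity, 2, 2))          # [[0.0, 10.0], [20.0, 30.0]]
print(warp_projective(source, half_pixel_shift, 2, 1))  # [[5.0], [25.0]]
```

The identity parameters reproduce the source image, while a half-pixel horizontal shift averages each pair of horizontal neighbours, which is the interpolation behaviour the bilinear kernel is meant to provide.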

▪ All the blocks should be differentiable.


During backward propagation, we need to calculate the gradient of $V_i$ with respect to each of the eight transformation parameters.
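As a sketch, assuming the bilinear sampler from the spatial transformer paper, the chain rule gives gradients of the following form. The notation $\theta_k$ for the k-th transformation parameter and $U_{nm}$ for the source pixel at row $n$, column $m$ is assumed here for illustration.

```latex
\frac{\partial V_i}{\partial \theta_k}
= \frac{\partial V_i}{\partial x_i^s}\,\frac{\partial x_i^s}{\partial \theta_k}
+ \frac{\partial V_i}{\partial y_i^s}\,\frac{\partial y_i^s}{\partial \theta_k},
\qquad k = 1, \dots, 8

\frac{\partial V_i}{\partial x_i^s}
= \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}\,
  \max\bigl(0,\, 1 - |y_i^s - n|\bigr)
  \begin{cases}
    0  & \text{if } |m - x_i^s| \ge 1 \\
    1  & \text{if } m \ge x_i^s \\
    -1 & \text{if } m < x_i^s
  \end{cases}
```

The expression for $\partial V_i / \partial y_i^s$ is analogous, with the roles of $x$ and $y$ (and of $m$ and $n$) exchanged.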


▪ The similarity transformation is defined here

in which α is the rotation angle, λ is the scaling factor, and $t_1$, $t_2$ are the horizontal and vertical translation displacements, respectively. Analogously, the gradients of $V_i$ with respect to α and λ are shown below:
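The similarity case can be sketched in a few lines of Python. The sign convention of the rotation (counter-clockwise for positive α) and all function names are assumptions made for this illustration; the derivatives below are the factors that the chain rule multiplies with the gradients of $V_i$ with respect to the source coordinates.

```python
import math


def similarity_map(xt, yt, alpha, lam, t1, t2):
    """Map target point (xt, yt) to source coordinates (xs, ys).

    xs = lam * (cos(alpha) * xt - sin(alpha) * yt) + t1
    ys = lam * (sin(alpha) * xt + cos(alpha) * yt) + t2
    """
    c, s = math.cos(alpha), math.sin(alpha)
    return lam * (c * xt - s * yt) + t1, lam * (s * xt + c * yt) + t2


def similarity_jacobian(xt, yt, alpha, lam):
    """Return ((dxs/dalpha, dys/dalpha), (dxs/dlam, dys/dlam))."""
    c, s = math.cos(alpha), math.sin(alpha)
    d_alpha = (lam * (-s * xt - c * yt), lam * (c * xt - s * yt))
    d_lam = (c * xt - s * yt, s * xt + c * yt)
    return d_alpha, d_lam


def chain_rule(dv_dxs, dv_dys, d_param):
    """dV/dparam = dV/dxs * dxs/dparam + dV/dys * dys/dparam."""
    return dv_dxs * d_param[0] + dv_dys * d_param[1]


# Example: gradient of V_i w.r.t. alpha and lambda for one target point,
# given (hypothetical) upstream gradients dV/dxs = 0.5 and dV/dys = -2.0.
d_alpha, d_lam = similarity_jacobian(3.0, 1.0, 0.2, 1.5)
grad_alpha = chain_rule(0.5, -2.0, d_alpha)
grad_lam = chain_rule(0.5, -2.0, d_lam)
print(grad_alpha, grad_lam)
```

The translation parameters need no special treatment: the derivative of the source x-coordinate with respect to $t_1$ and of the source y-coordinate with respect to $t_2$ is 1, and the cross terms are 0.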

STATUS SO FAR

▪ The STN is implemented and tested on the Labeled Faces in the Wild (LFW) dataset.

▪ Out of 5423 classes, we took only 1000 because of computational limitations.

▪ During training, we augmented the face data with random 2D affine transformations to increase the training set size.

▪ We had 15399 training images, 3501 testing images, and 2100 validation images.

▪ We introduced a CNN architecture to extract deep features from the transformed face.
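The random 2D affine augmentation mentioned above could be sketched as follows. The parameter ranges (rotation up to 15 degrees, scale between 0.9 and 1.1, shear up to 0.1, shifts up to 4 pixels) and the function name `random_affine` are hypothetical choices for illustration, not the values used in this project.

```python
import math
import random


def random_affine(rng, max_angle_deg=15.0, scale_range=(0.9, 1.1),
                  max_shear=0.1, max_shift=4.0):
    """Draw a random 2x3 affine matrix: rotation * shear * scale, plus shift."""
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    sx = rng.uniform(*scale_range)
    sy = rng.uniform(*scale_range)
    shear = rng.uniform(-max_shear, max_shear)
    tx = rng.uniform(-max_shift, max_shift)
    ty = rng.uniform(-max_shift, max_shift)
    c, s = math.cos(angle), math.sin(angle)
    # [[c, -s], [s, c]] times [[sx, shear * sy], [0, sy]]
    return [[c * sx, c * shear * sy - s * sy, tx],
            [s * sx, s * shear * sy + c * sy, ty]]


def apply_affine(matrix, x, y):
    """Apply a 2x3 affine matrix to the point (x, y)."""
    (a, b, tx), (c, d, ty) = matrix
    return a * x + b * y + tx, c * x + d * y + ty


rng = random.Random(0)  # fixed seed so the augmentation is repeatable
matrix = random_affine(rng)
print(apply_affine(matrix, 10.0, 20.0))
```

Each training face would be warped with a freshly drawn matrix. For reference, the counts on this slide add up to 15399 + 3501 + 2100 = 21000 images, roughly a 73/17/10 split between training, testing, and validation.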

STATUS SO FAR

▪ Output of the STN

STATUS SO FAR

[Figure: layer diagrams of the STN Architecture and the Recognition Architecture, each built from Conv, Pool & Actv, Dense, and Actv blocks]

FUTURE WORK

▪ Validate the architecture on real data (taken from a classroom).

▪ Without training a new CNN model, compare recognition accuracy with ImageNet-winning pre-trained models.

▪ Add 2D rotation-invariant face detection to the current model.

THANK YOU!
