
Mo Guo

For CS519

Face Recognition with

Deep Learning

Outline

1. Introduction

2. Related works

3. DeepFace

4. Alignment

5. Learning

6. Training

7. Results

Introduction

• Classical face recognition pipeline

• Shallow Learning

• Features are hand-picked: SIFT, LBP, HOG, etc.

• Works well on small datasets but fails on large ones

• Also fails under illumination variations and facial expressions

Face patterns lie on a complex nonlinear and non‐convex manifold in the high‐dimensional space.
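As an illustration of such a hand-picked shallow feature, here is a minimal Local Binary Pattern (LBP) sketch in plain NumPy (the function name and the toy image are ours, not from the slides):

```python
import numpy as np

def lbp_codes(img):
    """Basic 8-neighbor Local Binary Pattern codes for interior pixels.

    Each interior pixel is encoded as an 8-bit number: one bit per
    neighbor, set when that neighbor is >= the center pixel.
    """
    c = img[1:-1, 1:-1]  # center pixels
    # Shifted views of the image: neighbors in a fixed clockwise order.
    neighbors = [
        img[:-2, :-2], img[:-2, 1:-1], img[:-2, 2:],
        img[1:-1, 2:], img[2:, 2:], img[2:, 1:-1],
        img[2:, :-2], img[1:-1, :-2],
    ]
    codes = np.zeros(c.shape, dtype=np.int32)
    for bit, n in enumerate(neighbors):
        codes += (n >= c).astype(np.int32) << bit
    return codes.astype(np.uint8)

# A face descriptor is then the histogram of codes over image patches.
img = np.arange(25, dtype=np.uint8).reshape(5, 5)
hist, _ = np.histogram(lbp_codes(img), bins=256, range=(0, 256))
```

Descriptors like this are fixed in advance, which is exactly why they struggle with the variations mentioned above.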

Related Works

• Deep Learning

Convolutional Neural Networks (CNNs):

• DeepFace

• DeepID series

• FaceNet

• VGGFace

• etc.

Related Works

DeepFace (Taigman et al., 2014)

2D/3D face modeling and alignment using affine transformations

9-layer deep neural network

120 million parameters

Alignment (Frontalization)

(a) The detected face, with 6 initial fiducial points.

(b) The induced 2D-aligned crop.

(c) 67 fiducial points on the 2D-aligned crop with their corresponding Delaunay triangulation; triangles were added on the contour to avoid discontinuities.

(d) The reference 3D shape transformed to the 2D-aligned crop image-plane.

(e) Triangle visibility w.r.t. the fitted 3D-2D camera; darker triangles are less visible.

(f) The 67 fiducial points induced by the 3D model that are used to direct the piece-wise affine warping.

(g) The final frontalized crop.

(h) A new view generated by the 3D model.
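The 2D alignment step can be sketched as a least-squares affine fit mapping detected fiducial points onto canonical template positions (a minimal NumPy sketch; the point coordinates are invented for illustration):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src points onto dst.

    src, dst: (N, 2) arrays of corresponding fiducial points, N >= 3.
    Returns a 2x3 matrix A such that dst ~= [x, y, 1] @ A.T.
    """
    n = src.shape[0]
    X = np.hstack([src, np.ones((n, 1))])        # (N, 3) homogeneous points
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)  # (3, 2) solution
    return A.T                                   # (2, 3)

def apply_affine(A, pts):
    """Apply a 2x3 affine transform to (N, 2) points."""
    return np.hstack([pts, np.ones((pts.shape[0], 1))]) @ A.T

# Map three detected points onto canonical template positions.
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
dst = np.array([[2.0, 1.0], [4.0, 1.0], [2.0, 3.0]])  # scale 2, shift (2, 1)
A = fit_affine(src, dst)
```

The 3D frontalization then replaces this single global transform with a piece-wise affine warp, one transform per triangle of the fitted mesh.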

Deep Learning

• Input: 3D-aligned, 3-channel (RGB) face image

152x152 pixels

• 9-layer deep neural network architecture

• Trained with a softmax output to minimize the cross-entropy loss

• Uses stochastic gradient descent (SGD), Dropout, and ReLU

• Outputs a K-class prediction

Architecture

Layers 1-3:

• Convolution layers - extract low-level features (e.g. simple edges and texture)

• ReLU after each conv. layer, making the whole cascade produce highly non-linear and sparse features

• Max-pooling: makes the convolutional network more robust to local translations.
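The conv -> ReLU -> max-pool cascade of these layers can be sketched in plain NumPy (single channel, single filter; the toy image and edge kernel are ours):

```python
import numpy as np

def conv2d(img, kernel):
    """'Valid' 2D cross-correlation, as computed by a single conv filter."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Zero out negatives -> non-linear, sparse activations."""
    return np.maximum(x, 0.0)

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling on an (H, W) feature map."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, -1.0]])  # toy horizontal edge filter
pooled = max_pool_2x2(relu(conv2d(img, kernel)))
```

Taking the max over each 2x2 window is what makes the output insensitive to one-pixel local shifts.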

Architecture

Layers 4-6:

• Locally connected layers: apply a different (unshared) filter at each location of the feature map

• Similar to a conv. layer, but the weights are spatially dependent (no sharing), since different regions of an aligned face have different local statistics
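The only difference from a convolution is how the weights are indexed: every output position gets its own kernel. A minimal NumPy sketch (the shapes and names are ours):

```python
import numpy as np

def locally_connected(x, weights):
    """Locally connected layer on a 2D map: one unshared kernel per
    output position, unlike a convolution where the kernel is shared.

    x:       (H, W) input map
    weights: (H - k + 1, W - k + 1, k, k) -- a distinct kernel per location
    """
    oh, ow, k, _ = weights.shape
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * weights[i, j])
    return out
```

The price of dropping weight sharing is a much larger parameter count, which is affordable here only because alignment fixes which facial region each location sees.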

Architecture

Layer F7

• Fully connected; generates a 4096-d vector

• A sparse representation of the face descriptor

• 75% of outputs are zero

• Mainly due to ReLU and Dropout
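The sparsity is easy to check on toy data: ReLU alone zeroes roughly half of zero-mean random pre-activations, and the higher 75% figure reported for DeepFace comes from training with Dropout (this is a sketch on random numbers, not the trained network):

```python
import numpy as np

rng = np.random.default_rng(0)
pre_activations = rng.standard_normal(4096)  # stand-in for F7 pre-activations
features = np.maximum(pre_activations, 0.0)  # ReLU
sparsity = float(np.mean(features == 0.0))   # fraction of exact zeros
```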

Architecture

• F8 calculates class probabilities with softmax

• Softmax produces a distribution over the class labels

• Cross-entropy loss is computed for each training sample

• The loss is minimized with SGD and backpropagation

Layer F8

• Fully connected; generates a 4030-d vector (one output per identity class)
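The softmax and per-sample cross-entropy at F8 can be written out directly (a numerically stable sketch with a toy K = 3 instead of 4030; the logit values are invented):

```python
import numpy as np

def softmax(logits):
    """Stable softmax: subtract the max before exponentiating."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(probs, label):
    """Negative log-probability assigned to the correct class."""
    return -np.log(probs[label])

logits = np.array([2.0, 1.0, 0.1])  # toy F8 outputs for K = 3 classes
probs = softmax(logits)
loss = cross_entropy(probs, label=0)
```

Subtracting the maximum logit changes nothing mathematically but prevents overflow in the exponentials, which matters with thousands of classes.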

Training

• Trained on the SFC dataset: 4.4M faces (4030 identities, 800-1200 images per person)

• Focus on Labeled Faces in the Wild (LFW) evaluation

• Used SGD with momentum of 0.9

• Learning rate 0.01, decreased manually; final rate 0.0001

• Random weight initialization

• 15 epochs of training

• 3 days total on a GPU-based engine
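The update rule implied by these hyperparameters (momentum 0.9, learning rate 0.01) can be sketched on a toy quadratic objective (a sketch of the optimizer only, not the actual network training):

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update, using the slide's hyperparameters."""
    v = momentum * v - lr * grad  # velocity accumulates past gradients
    return w + v, v

# Minimize f(w) = ||w||^2 / 2, whose gradient is simply w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(500):
    w, v = sgd_momentum_step(w, v, grad=w)
```

The velocity term smooths the gradient direction across steps, which is why momentum tolerates the small fixed learning rates used here.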

Training on SFC

• Experimented with different depths of networks

• Variants: removed C3; removed L4 and L5; removed C3, L4, and L5

• Compared error rate to number of classes K

– Deeper is better

Results on LFW