Unconstrained Face Recognition: Deep Learning Approaches

Chun-Ting Huang

2016/7/22USC Multimedia Communication Lab 2

http://www.nytimes.com/2015/08/13/us/facial-recognition-software-moves-from-overseas-wars-to-local-police.html?_r=0

Why Face?

▪ Among the biometric traits evaluated, facial features scored the highest compatibility in a Machine Readable Travel Documents (MRTD) system

Hietmeyer, R.: Biometric identification promises fast and secure processing of airline passengers. ICAO J. 55(9), 10–11 (2000)

Outline

▪ Introduction

▪ Unconstrained face dataset

▪ Unconstrained face recognition with deep learning

▪ Papers from industry

▪ Papers from academia

▪ Discussion and conclusion

Introduction

Categorization

▪ A face recognition system operates in two modes

▪ Face verification (authentication)

▪ Face identification (recognition)

▪ Face verification

▪ One-to-one match

▪ Compares a query face image against a single enrollment face image

▪ Face identification

▪ One-to-many match

▪ Compares a query face against multiple faces in the enrollment database
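
The two modes can be illustrated with a minimal sketch. It assumes faces have already been converted to feature vectors and compares them with cosine similarity; the vectors, the function names `verify` and `identify`, and the 0.8 threshold are hypothetical and do not come from the slides.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(query, enrolled, threshold=0.8):
    """One-to-one match: accept if the query is close enough to one enrollment."""
    return cosine_similarity(query, enrolled) >= threshold


def identify(query, gallery):
    """One-to-many match: return the best-scoring identity in the gallery."""
    return max(gallery, key=lambda name: cosine_similarity(query, gallery[name]))


# Hypothetical 3-D feature vectors, for illustration only.
gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
query = [0.9, 0.1, 0.0]

is_alice = verify(query, gallery["alice"])  # one-to-one check
is_bob = verify(query, gallery["bob"])      # one-to-one check
best_match = identify(query, gallery)       # one-to-many search
```

Verification needs a decision threshold, while this closed-set identification sketch only ranks gallery entries; an open-set system would typically combine both.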

Face Recognition Processing Flow

Jain, Anil K., and Stan Z. Li. Handbook of Face Recognition. Vol. 1. New York: Springer, 2011

Face Subspace

Jain, Anil K., and Stan Z. Li. Handbook of Face Recognition. Vol. 1. New York: Springer, 2011

Frontal Face Recognition

Conventional Approaches

▪ Template matching

▪ PCA: M. Turk, A. Pentland, Eigenfaces for Recognition, Journal of Cognitive Neuroscience, Vol. 3, No. 1, Winter 1991

▪ LDA: Kamran Etemad and Rama Chellappa, "Discriminant analysis for recognition of human face images", JOSA A, 1997

▪ HOG: Dalal, Navneet, and Bill Triggs. "Histograms of oriented gradients for human detection." Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. Vol. 1. IEEE, 2005

▪ LBP: Ahonen, Timo and Hadid, Abdenour and Pietikainen, Matti, “Face description with local binary patterns: Application to face recognition”, Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2006

Frontal is NOT Enough

Facial Landmark Localization

▪ Model based approach

▪ ASM: T.F. Cootes and C.J. Taylor and D.H. Cooper and J. Graham (1995). "Active shape models - their training and application". Computer Vision and Image Understanding (61): 38–59

▪ AAM: T.F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. ECCV, 2:484–498, 1998

▪ Regression based approach

▪ Cascaded pose regression: P. Dollár, P. Welinder, and P. Perona. “Cascaded pose regression”. In CVPR. IEEE, 2010

▪ Explicit shape regression: X. Cao, Y. Wei, F. Wen, and J. Sun. “Face alignment by explicit shape regression”. In CVPR. IEEE, 2012

Explicit Shape Regression

[Figure: explicit shape regression on image 𝐼 over stages t = 0, 1, 2, …, 10. The shape is initialized from the face detector; the diagram also labels an affine transform and a transform back.]

Unconstrained Face Dataset

Labeled Faces in the Wild

▪ Contains 13,233 images

▪ Covers 5,749 people

▪ 1,680 people have two or more images

▪ Proposed in 2007 (University of Massachusetts, Amherst, Technical Report 07-49)

▪ Photos were collected from the internet

▪ Aligned versions of the faces are also provided, produced with three alignment methods

Gary B. Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. University of Massachusetts, Amherst, Technical Report 07-49, October, 2007.

LFW: Performance (Image-Restricted)

LFW: Performance (Image-Unrestricted)

YouTube Faces Database

▪ Lior Wolf, Tal Hassner and Itay Maoz, Face Recognition in Unconstrained Videos with Matched Background Similarity. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2011

▪ 3,425 videos of 1,595 people

YTF: Performance (Image-Restricted)

YTF: Performance (Image-Restricted)

▪ EER - the error rate at the ROC operating point where the false positive and false negative rates are equal
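
A minimal sketch of how the EER could be estimated from similarity scores, assuming a higher score means "same person". The score lists and function names are illustrative assumptions; practical evaluations usually interpolate the ROC curve rather than scan only the observed scores.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false positive rate, false negative rate) at a score threshold."""
    fpr = sum(s >= threshold for s in impostor) / len(impostor)
    fnr = sum(s < threshold for s in genuine) / len(genuine)
    return fpr, fnr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by trying every observed score as the threshold."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        fpr, fnr = error_rates(genuine, impostor, t)
        gap = abs(fpr - fnr)
        if best is None or gap < best[0]:
            # Keep the threshold where the two error rates are closest.
            best = (gap, (fpr + fnr) / 2, t)
    return best[1], best[2]


# Hypothetical scores: same-person pairs and different-person pairs.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]

eer, threshold = equal_error_rate(genuine, impostor)
```

The "100%-EER" figure reported in benchmark tables is then simply `100 * (1 - eer)`.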

Parkhi, Omkar M., Andrea Vedaldi, and Andrew Zisserman. "Deep face recognition." Proceedings of the British Machine Vision Conference (BMVC), 2015.

IARPA Janus Benchmark A (IJB-A)

▪ Klare et al. Pushing the Frontiers of Unconstrained Face Detection and Recognition: IARPA Janus Benchmark A, CVPR, June 2015

▪ All faces are manually labeled with bounding boxes and fiducial landmarks

▪ Amazon Mechanical Turk (AMT)

▪ LFW is not fully unconstrained:

▪ A commodity face detector was used to detect all faces

▪ This restricts the pose variation, occlusion, and illumination conditions

▪ Three landmarks: two eyes, and base of nose

▪ Geographic distribution

IJB-A Labeled Information

▪ 10-fold gallery / probe image set

▪ 17,000 images for training (333 subjects)

▪ Gallery set: 3000 images (167 subjects)

▪ Probe set: 13,700 images (including non-gallery subjects)

▪ (x, y) coordinates of the eyes and nose base

▪ Face yaw angle (if applicable)

▪ Observation labeling: FOREHEAD_VISIBLE, EYES_VISIBLE, NOSE_MOUTH_VISIBLE, INDOOR, GENDER, SKIN_TONE (6 levels), AGE (5 levels), FACIAL_HAIR

Pose Variant

IJB-A Released Benchmark (1/29/2016)

Unconstrained Face Recognition with Deep Learning

Facebook: DeepFace

▪ DeepFace: Closing the Gap to Human-Level Performance in Face Verification

▪ Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1701-1708

▪ Claimed contributions

▪ Facial alignment with 3D modeling

▪ Advances LFW benchmark performance

▪ Reaches near-human performance

▪ Advances YTF benchmark performance

3D Facial Alignment

▪ Detected face provided with 6 initial fiducial points

▪ 2D-aligned crop

▪ 67 fiducial points with their Delaunay triangulation

▪ 3D shape transform

▪ Triangle visibility w.r.t. the fitted 3D-2D camera

▪ Affine warping

▪ Final frontalized crop

DeepFace Architecture

DeepFace: Performance

▪ Results on the Labeled Faces in the Wild (LFW) and YouTube Faces (YTF) databases

Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1701-1708

DeepID

▪ Sun, Yi, Xiaogang Wang, and Xiaoou Tang. "Deep learning face representation from predicting 10,000 classes." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014

60 patches

DeepID

DeepID Performance (1)

160-dimensional feature

DeepID Performance (2)

o: outside dataset

u: unrestricted protocol

r: restricted protocol

Google: FaceNet

▪ Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "FaceNet: A unified embedding for face recognition and clustering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015

FaceNet

▪ Objective: learn a Euclidean embedding per image with a DNN

▪ Map the face images to a compact Euclidean space

▪ Distance in the embedding space measures face similarity (smaller distance, more similar faces)

▪ Approach: DNN with triplet loss

Triplet Loss

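▪ Embedding: $f(x) \in \mathbb{R}^d$

▪ Input images: $x_i^a$ (anchor), $x_i^p$ (positive), and $x_i^n$ (negative)

▪ $\alpha$ is a margin enforced between positive and negative pairs: $\|f(x_i^a) - f(x_i^p)\|_2^2 + \alpha < \|f(x_i^a) - f(x_i^n)\|_2^2$

▪ Corresponding loss function: $L = \sum_{i=1}^{N} \left[ \|f(x_i^a) - f(x_i^p)\|_2^2 - \|f(x_i^a) - f(x_i^n)\|_2^2 + \alpha \right]_+$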
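
A minimal sketch of a FaceNet-style triplet loss, assuming the usual hinge form: for each triplet, the squared anchor-positive distance minus the squared anchor-negative distance plus the margin, clipped at zero and summed. The 2-D embeddings and the margin value `alpha=0.2` are illustrative assumptions, not values from the slides.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchors, positives, negatives, alpha=0.2):
    """Sum of hinge terms [d(a, p) - d(a, n) + alpha]_+ over all triplets."""
    total = 0.0
    for a, p, n in zip(anchors, positives, negatives):
        total += max(0.0, squared_distance(a, p) - squared_distance(a, n) + alpha)
    return total


# Hypothetical 2-D embeddings for two triplets.
anchors = [[0.0, 0.0], [0.0, 0.0]]
positives = [[0.1, 0.0], [0.5, 0.0]]
negatives = [[1.0, 0.0], [0.6, 0.0]]

# Triplet 1 already satisfies the margin and contributes 0.
# Triplet 2 violates it: 0.25 - 0.36 + 0.2 = 0.09.
loss = triplet_loss(anchors, positives, negatives, alpha=0.2)
```

Only triplets that violate the margin produce a gradient, which is why the choice of triplets matters for convergence.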

Triplet Selection

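▪ To achieve fast convergence with the triplet loss function

▪ Select $x_i^p$ such that $\arg\max_{x_i^p} \|f(x_i^a) - f(x_i^p)\|_2^2$ (hard positive)

▪ Select $x_i^n$ such that $\arg\min_{x_i^n} \|f(x_i^a) - f(x_i^n)\|_2^2$ (hard negative)

▪ The training set was sampled with

▪ 40 faces per identity in each mini-batch as positive exemplars

▪ Randomly sampled negative faces are added

▪ To avoid converging to bad local minima, negatives are chosen such that

▪ $\|f(x_i^a) - f(x_i^p)\|_2^2 < \|f(x_i^a) - f(x_i^n)\|_2^2$ (semi-hard)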
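
A minimal sketch of semi-hard negative selection, under stated assumptions: a negative counts as semi-hard when it is farther from the anchor than the positive but still inside the margin, and the closest such negative is picked. The tie-breaking rule, the margin `alpha=0.2`, and the function name are hypothetical choices made for this illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def pick_semi_hard_negative(anchor, positive, negatives, alpha=0.2):
    """Return the index of the closest semi-hard negative, or None if none exists."""
    d_ap = squared_distance(anchor, positive)
    semi_hard = []
    for index, negative in enumerate(negatives):
        d_an = squared_distance(anchor, negative)
        # Farther than the positive, but not yet beyond the margin.
        if d_ap < d_an < d_ap + alpha:
            semi_hard.append((d_an, index))
    if not semi_hard:
        return None
    return min(semi_hard)[1]


anchor = [0.0, 0.0]
positive = [0.5, 0.0]   # squared distance 0.25
negatives = [
    [0.3, 0.0],         # 0.09: closer than the positive (hard), skipped
    [0.6, 0.0],         # 0.36: semi-hard
    [0.65, 0.0],        # 0.4225: semi-hard, but farther than the one above
    [2.0, 0.0],         # 4.0: already beyond the margin (easy), skipped
]

chosen = pick_semi_hard_negative(anchor, positive, negatives)
```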

Deep Convolutional Networks

▪ CNN is trained using Stochastic Gradient Descent (SGD) with standard backpropagation

▪ Two types of architectures

▪ Zeiler & Fergus architecture

▪ GoogLeNet style Inception model

▪ Trained on a CPU cluster for 1000 to 2000 hours

▪ 100M-200M training face thumbnails covering about 8M identities

▪ Input sizes range from 96x96 to 224x224 pixels

Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." Computer Vision–ECCV 2014. Springer International Publishing, 2014. 818-833.

Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

Network Details

Performance

▪ Validation rate VAL (true accepts / same-identity pairs) on a hold-out test set of 1M images

▪ VAL as a function of the output (embedding) dimension
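
A minimal sketch of the validation rate at a distance threshold. It assumes a pair is accepted when its embedding distance is at most the threshold; the false accept rate (FAR) is included as the companion metric, although it is not listed on the slide. The distances are made-up values.

```python
def val_and_far(same_pair_distances, diff_pair_distances, threshold):
    """Return (VAL, FAR) for pairs accepted when distance <= threshold."""
    true_accepts = sum(d <= threshold for d in same_pair_distances)
    false_accepts = sum(d <= threshold for d in diff_pair_distances)
    val = true_accepts / len(same_pair_distances)   # true accepts / same-identity pairs
    far = false_accepts / len(diff_pair_distances)  # false accepts / different-identity pairs
    return val, far


# Hypothetical embedding distances for same-identity and different-identity pairs.
same = [0.2, 0.5, 0.9, 1.3]
diff = [0.8, 1.5, 1.9, 2.4]

val, far = val_and_far(same, diff, threshold=1.0)
```

Sweeping the threshold trades VAL against FAR, so a VAL figure is only meaningful together with the FAR at which it was measured.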

Sensitivity to Image Quality

Deep Face Recognition

▪ Parkhi, Omkar M., Andrea Vedaldi, and Andrew Zisserman. "Deep face recognition." Proceedings of the British Machine Vision Conference (BMVC), 2015.

▪ Achieved similar performance on the LFW and YTF datasets

▪ With fewer training images and identities

▪ 2.6M images collected from Google Images and Bing with keyword “actor”

▪ Same triplet loss strategy as FaceNet

Fine-tuned with VGG Model

▪ The “Very Deep” Architecture

▪ Different from previous architectures proposed

▪ Network Details:

▪ 3 x 3 Convolution Kernels (Very small)

▪ Conv. Stride 1 px.

▪ ReLU non-linearity

▪ No local contrast normalisation

▪ 3 Fully connected layers

[Figure: network architecture. image → Conv-64 ×2 → maxpool → Conv-128 ×2 → maxpool → Conv-256 ×2 → maxpool → Conv-512 ×3 → maxpool → Conv-512 ×3 → maxpool → fc-4096 → fc-4096 → fc-2622 → Softmax]
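
A minimal sketch that traces feature-map sizes through the convolutional blocks labeled in the diagram on this slide. The 3 x 3 kernels and stride 1 come from the slide; the padding of 1, the 2 x 2 max pooling with stride 2, and the 224 x 224 input are assumptions borrowed from the usual VGG setup.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution (padding=1 is an assumption)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, window=2, stride=2):
    """Spatial size after max pooling (2 x 2, stride 2 is an assumption)."""
    return (size - window) // stride + 1


# Blocks as (number of conv layers, channels), read off the diagram labels.
BLOCKS = [(2, 64), (2, 128), (2, 256), (3, 512), (3, 512)]


def trace_feature_maps(input_size=224):
    """Return (channels, height, width) after each conv block and its max pool."""
    size, shapes = input_size, []
    for num_convs, channels in BLOCKS:
        for _ in range(num_convs):
            size = conv_output_size(size)  # 3 x 3, stride 1, pad 1 keeps the size
        size = pool_output_size(size)      # each pool halves the resolution
        shapes.append((channels, size, size))
    return shapes


shapes = trace_feature_maps()
channels, height, width = shapes[-1]
fc_inputs = channels * height * width  # values fed to the first fc-4096 layer
```

Under these assumptions the resolution is reduced only by the five pooling layers, from 224 down to 7.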

Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).

Training

▪ MatConvNet toolbox

▪ Nvidia CuDNN bindings

▪ Multi-GPU training (approx. 3.5x speedup)

▪ Nvidia Titan Black

▪ 7 days of training

▪ Stochastic Gradient Descent with backpropagation

▪ Accumulator Descent for large batch sizes

▪ Batch size: 256

▪ Incremental FC layer training

▪ 2,622-way multi-class criterion (softmax)
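
The slides do not define "Accumulator Descent". One plausible reading, stated here purely as an assumption, is gradient accumulation: gradients from several small chunks are summed before a single parameter update, so a large batch can be processed without fitting in memory at once. The toy one-parameter model and function names below are hypothetical.

```python
def gradient(w, batch):
    """Mean gradient of 0.5 * (w * x - y) ** 2 over a batch of (x, y) pairs."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def accumulated_gradient(w, batch, chunk_size):
    """Mean gradient over the whole batch, computed chunk by chunk."""
    total, count = 0.0, 0
    for start in range(0, len(batch), chunk_size):
        chunk = batch[start:start + chunk_size]
        # Weight each chunk's mean gradient by its size before summing.
        total += gradient(w, chunk) * len(chunk)
        count += len(chunk)
    return total / count


# Toy data for y = 2x; the "large batch" has 8 samples, processed 2 at a time.
data = [(float(x), 2.0 * x) for x in range(1, 9)]

full = gradient(0.0, data)
accum = accumulated_gradient(0.0, data, chunk_size=2)
```

Both routes give the same update direction; accumulation only changes how much data must be held in memory at one time.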

Vedaldi, Andrea, and Karel Lenc. "MatConvNet: Convolutional neural networks for MATLAB." Proceedings of the 23rd Annual ACM Conference on Multimedia. ACM, 2015.

Performance on LFW

No. | Method | # Training Images | # Networks | Accuracy
1 | Fisher Vector Faces | - | - | 93.10
2 | DeepFace | 4 M | 3 | 97.35
3 | DeepFace Fusion | 500 M | 5 | 98.37
4 | DeepID-2,3 | Full | 200 | 99.47
5 | FaceNet | 200 M | 1 | 98.87
6 | FaceNet + Alignment | 200 M | 1 | 99.63
7 | VGG Face | 2.6 M | 1 | 98.95

Performance on YTF

No. | Method | # Training Images | # Networks | 100%-EER | Accuracy
1 | Video Fisher Vector Faces | - | - | 87.7 | 93.10
2 | DeepFace | 4 M | 1 | 91.4 | 91.4
4 | DeepID-2,2+,3 | | 200 | - | 93.2
5 | FaceNet + Alignment | 200 M | 1 | - | 95.1
7 | VGG Face | 2.6 M | 1 | 97.4 | 97.3

Lightened CNN

▪ Wu, Xiang, Ran He, and Zhenan Sun. "A Lightened CNN for Deep Face Representation." arXiv preprint arXiv:1511.02683 (2015).

▪ Obtains performance competitive with previous models

▪ Composed of two networks

▪ New activation function: Max-Feature-Map (MFM) to replace ReLU

Max-Feature-Map
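
A minimal sketch of the Max-Feature-Map idea, assuming the common formulation in which 2n feature maps are split into two halves and the element-wise maximum of map k and map k + n is kept, leaving n maps. Each feature map is flattened to a plain list here for brevity, and the values are made up.

```python
def max_feature_map(feature_maps):
    """Element-wise max over the two halves of a list of feature maps."""
    if len(feature_maps) % 2 != 0:
        raise ValueError("MFM expects an even number of feature maps")
    half = len(feature_maps) // 2
    return [
        [max(a, b) for a, b in zip(first, second)]
        for first, second in zip(feature_maps[:half], feature_maps[half:])
    ]


# Four hypothetical feature maps, each flattened to three values.
maps = [
    [1.0, -2.0, 3.0],
    [0.5, 0.5, 0.5],
    [-1.0, 4.0, 2.0],
    [0.0, 1.0, -3.0],
]

out = max_feature_map(maps)  # map 0 is paired with map 2, map 1 with map 3
```

Unlike ReLU, which thresholds each value against zero, this activation halves the number of channels by letting paired maps compete.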

Performance

▪ On LFW:

▪ On YTF:

Deep Learning Applications Other than Recognition

Incorrect Alignment

Liu, Ziwei, et al. "Deep learning face attributes in the wild." Proceedings of the IEEE International Conference on Computer Vision. 2015.

Deep Learning Face Attributes

Details of the Networks

▪ Applied AlexNet directly for LNet

▪ Pre-trained with ImageNet 1000 object categories

▪ Fine-tuning LNet using attribute tags

Face Localization Performance (LNet)

Face Attributes Visualization

Attribute Accuracy

Discussion and Conclusion

LFW Survey

▪ Labeled Faces in the Wild: A Survey: Erik Learned-Miller, Gary Huang, Aruni RoyChowdhury, Haoxiang Li, Gang Hua

▪ The future of face recognition

▪ Verification versus identification

▪ Not uncommon that two random individuals have large differences in appearance

▪ The more people in a gallery, the greater the chance that two individuals have similar appearance

▪ New face datasets

▪ IJB-A

▪ CASIA

▪ FaceScrub

▪ MegaFace
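
The verification-versus-identification point above can be made concrete with a small calculation. It assumes, purely for illustration, that each gallery subject independently resembles the probe with a fixed probability `p_similar`; real look-alike rates are neither independent nor known from the slides.

```python
def lookalike_probability(gallery_size, p_similar):
    """Probability that at least one gallery subject resembles the probe."""
    return 1.0 - (1.0 - p_similar) ** gallery_size


# Hypothetical per-subject look-alike probability of 0.1 percent.
p_similar = 0.001
gallery_sizes = (1, 100, 10_000)
probabilities = [lookalike_probability(n, p_similar) for n in gallery_sizes]

for n, p in zip(gallery_sizes, probabilities):
    print(f"gallery of {n:>6}: chance of a look-alike = {p:.4f}")
```

With one enrolled face (verification) a confusable look-alike is rare, but under this assumption it becomes almost certain once the gallery holds thousands of people, which is why identification is the harder task.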

Discussion

▪ Unconstrained face recognition is a competitive field

▪ Target dataset: IJB-A

▪ Testing different approaches (with source code / trained models)

▪ Currently checking the effectiveness of the Lightened CNN

▪ Facial attributes may serve as auxiliary information

Large-scale CelebFaces Attributes (CelebA) Dataset

▪ S. Yang, P. Luo, C. C. Loy, and X. Tang, "From Facial Parts Responses to Face Detection: A Deep Learning Approach", in IEEE International Conference on Computer Vision (ICCV), 2015

▪ 10,177 identities

▪ 202,599 face images

▪ 5 landmark locations and 40 binary attribute annotations per image

▪ Available for download

▪ 1.34 GB for 202,599 aligned and cropped face images

▪ Aligned by a similarity transformation based on the two eye locations, then resized to 218*178

▪ 9.8 GB for 202,599 original web face images
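
A minimal sketch of how a similarity transformation can be derived from two eye locations. Points are treated as complex numbers, so the map z -> a*z + b covers uniform scale, rotation, and translation. The source and target eye coordinates are hypothetical; the slides do not state where the eyes land in the aligned images.

```python
def similarity_from_eyes(left_eye, right_eye, target_left, target_right):
    """Return complex (a, b) so that z -> a * z + b maps the eyes onto the targets."""
    s1, s2 = complex(*left_eye), complex(*right_eye)
    t1, t2 = complex(*target_left), complex(*target_right)
    a = (t2 - t1) / (s2 - s1)  # uniform scale and rotation
    b = t1 - a * s1            # translation
    return a, b


def apply_transform(a, b, point):
    """Apply the similarity transform to an (x, y) point."""
    z = a * complex(*point) + b
    return (z.real, z.imag)


# Hypothetical eye locations in the original image and in the aligned crop.
left_eye, right_eye = (100, 100), (200, 100)
target_left, target_right = (60, 110), (120, 110)

a, b = similarity_from_eyes(left_eye, right_eye, target_left, target_right)
scale = abs(a)  # ratio of target to source inter-eye distance
mapped_left = apply_transform(a, b, left_eye)
mapped_right = apply_transform(a, b, right_eye)
```

Two point correspondences determine a similarity transform exactly, which is why the eye pair alone is enough for this kind of alignment.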

Deep Face Dreams

Representative Image Neuron Inversion

Mahendran, Aravindh, and Andrea Vedaldi. "Understanding deep image representations by inverting them." Computer Vision and Pattern Recognition (CVPR), 2015

Questions?