Unconstrained Face Recognition: Deep Learning Approaches

Chun-Ting Huang

2016/7/22 USC Multimedia Communication Lab

http://www.nytimes.com/2015/08/13/us/facial-recognition-software-moves-from-overseas-wars-to-local-police.html?_r=0

Why Face?

▪ Facial features scored highest compatibility in a Machine Readable Travel Documents (MRTD) system

Hietmeyer, R.: Biometric identification promises fast and secure processing of airline passengers. ICAO J. 55(9), 10–11 (2000)

Outline

▪ Introduction

▪ Unconstrained face dataset

▪ Unconstrained face recognition with deep learning

▪ Papers from industry

▪ Papers from academia

▪ Discussion and conclusion

Introduction

Categorization

▪ A face recognition system operates in two modes

▪ Face verification (authentication)

▪ Face identification (recognition)

▪ Face verification

▪ One-to-one match

▪ Between a query face image and an enrollment face image

▪ Face identification

▪ One-to-many match

▪ Between a query face and multiple faces in the enrollment database
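
The two modes can be sketched in a few lines. This is an illustration only: the 3-D feature vectors, the cosine similarity measure, the threshold of 0.8, and the names `verify` and `identify` are assumptions made for the example, not part of any system described in these slides.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def verify(query, enrolled, threshold=0.8):
    """One-to-one match: accept if the pair is similar enough."""
    return cosine_similarity(query, enrolled) >= threshold

def identify(query, gallery):
    """One-to-many match: return the best-matching enrolled identity."""
    return max(gallery, key=lambda name: cosine_similarity(query, gallery[name]))

# Toy 3-D "features" (hypothetical); real systems use learned face descriptors.
gallery = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.1]}
query = [0.9, 0.1, 0.2]

is_alice = verify(query, gallery["alice"])   # verification against one claim
best_match = identify(query, gallery)        # identification over the gallery
```

Verification answers a yes/no question about one claimed identity, while identification ranks every enrolled identity, so its cost and its error rate both depend on the gallery size.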

Face Recognition Processing Flow

Jain, Anil K., and Stan Z. Li. Handbook of face recognition. Vol. 1. New York: springer, 2011

Face Subspace

Jain, Anil K., and Stan Z. Li. Handbook of face recognition. Vol. 1. New York: springer, 2011

Frontal Face Recognition

Conventional Approaches

▪ Template matching

▪ PCA: M. Turk, A. Pentland, Eigenfaces for Recognition, Journal of Cognitive Neuroscience, Vol. 3, No. 1, Win. 1991

▪ LDA: Kamran Etemad and Rama Chellappa, "Discriminant analysis for recognition of human face images", JOSA A, 1997

▪ HOG: Dalal, Navneet, and Bill Triggs. "Histograms of oriented gradients for human detection." Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. Vol. 1. IEEE, 2005

▪ LBP: Ahonen, Timo and Hadid, Abdenour and Pietikainen, Matti, “Face description with local binary patterns: Application to face recognition”, Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2006
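
As one example of these hand-crafted descriptors, a basic 3x3 local binary pattern can be sketched as follows. The clockwise neighbor order starting at the top-left pixel and the `>=` comparison are conventions chosen for this sketch; published variants differ in both, and the function names are hypothetical.

```python
def lbp_code(patch):
    """8-bit LBP code of the center pixel of a 3x3 patch (list of 3 rows)."""
    center = patch[1][1]
    # Neighbors visited clockwise, starting at the top-left pixel (assumed order).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a 2-D list."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
code = lbp_code(patch)          # bits 0, 4, 5, 6, 7 are set
hist = lbp_histogram(patch)     # a 3x3 image has a single interior pixel
```

A face descriptor in this style is typically built by concatenating such histograms over a grid of image regions.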

Frontal is NOT Enough

Facial Landmark Localization

▪ Model based approach

▪ ASM: T.F. Cootes and C.J. Taylor and D.H. Cooper and J. Graham (1995). "Active shape models - their training and application". Computer Vision and Image Understanding (61): 38–59

▪ AAM: T.F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. ECCV, 2:484–498, 1998

▪ Regression based approach

▪ Cascaded pose regression: P. Dollár, P. Welinder, and P. Perona. “Cascaded pose regression”. In CVPR. IEEE, 2010

▪ Explicit shape regression: X. Cao, Y. Wei, F. Wen, and J. Sun. “Face alignment by explicit shape regression”. In CVPR. IEEE, 2012

Explicit Shape Regression

[Figure: shape estimates at stages t = 0, 1, 2, …, 10 for image 𝐼; labels: initialized, detector, affine transform, transform]

Unconstrained Face Dataset

Labeled Faces in the Wild

▪ Contains 13233 images

▪ Consists of 5749 people

▪ 1680 people with two or more images

▪ Proposed in ICCV 2007

▪ Photos were collected from the internet

▪ Also provides aligned faces produced with three alignment methods

Gary B. Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. University of Massachusetts, Amherst, Technical Report 07-49, October, 2007.

LFW: Performance (Image-Restricted)

LFW: Performance (Image-Unrestricted)

YouTube Faces Database

▪ Lior Wolf, Tal Hassner and Itay Maoz, Face Recognition in Unconstrained Videos with Matched Background Similarity. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2011

▪ 3425 videos of 1595 people

YTF: Performance (Image-Restricted)

▪ EER - the error rate at the ROC operating point where the false positive and false negative rates are equal

Parkhi, Omkar M., Andrea Vedaldi, and Andrew Zisserman. "Deep face recognition." Proceedings of the British Machine Vision Conference (BMVC), 2015.
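
The EER definition above can be made concrete with a small threshold sweep. This is a rough sketch: the scores are made up, higher scores are assumed to mean "more similar", and `equal_error_rate` is a hypothetical helper that picks the observed threshold where the two error rates are closest, without interpolating the ROC curve.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER from similarity scores (higher = more similar).

    Sweeps every observed score as a threshold and returns the error rate
    at the point where false positive and false negative rates are closest.
    """
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        # False negative: a same-person pair scored below the threshold.
        fnr = sum(s < t for s in genuine) / len(genuine)
        # False positive: a different-person pair scored at or above it.
        fpr = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(fpr - fnr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (fpr + fnr) / 2
    return best_eer

genuine = [0.9, 0.8, 0.7, 0.4]    # same-identity pair scores (made up)
impostor = [0.6, 0.3, 0.2, 0.1]   # different-identity pair scores (made up)
eer = equal_error_rate(genuine, impostor)
```

Benchmark tables often report 100% minus EER, so that larger numbers are better.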

IARPA Janus benchmark A

▪ Klare et al. Pushing the Frontiers of Unconstrained Face Detection and Recognition: IARPA Janus Benchmark A, CVPR, June 2015

▪ All faces labeled manually with bounding boxes and fiducial landmarks

▪ Amazon Mechanical Turk (AMT)

▪ LFW is not fully unconstrained:

▪ A commodity face detector was used to detect all faces

▪ Limited in pose variation, occlusion, and illumination conditions

▪ Three landmarks: two eyes, and base of nose

▪ Geographic distribution

IJB-A Labeled Information

▪ 10-fold gallery / probe image set

▪ 17,000 images for training (333 subjects)

▪ Gallery set: 3000 images (167 subjects)

▪ Probe set: 13,700 images (includes non-gallery subjects)

▪ X Y coordinates of eyes and nose base

▪ Face yaw angle (if applicable)

▪ Observation labeling: FOREHEAD_VISIBLE, EYES_VISIBLE, NOSE_MOUTH_VISIBLE, INDOOR, GENDER, SKIN_TONE (6 levels), AGE (5 levels), FACIAL_HAIR

Pose Variant

IJB-A Released Benchmark (1/29/2016)

Unconstrained Face Recognition With Deep Learning

Facebook: DeepFace

▪ DeepFace: Closing the Gap to Human-Level Performance in Face Verification

▪ Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1701-1708

▪ Claimed contributions

▪ Facial alignment with 3D modeling

▪ Advance LFW benchmark performance

▪ Reaching near-human performance

▪ Advance YTF benchmark performance

3D Facial Alignment

▪ Detected face provided with 6 initial fiducial points

▪ 2D-aligned crop

▪ 67 fiducial points with their corresponding Delaunay triangulation

▪ 3D shape transform

▪ Triangle visibility w.r.t. the fitted 3D-2D camera

▪ Affine warping

▪ Final frontalized crop

DeepFace Architecture

DeepFace: Performance

▪ Results on the Labeled Faces in the Wild (LFW) and YouTube Faces (YTF) databases

DeepID

▪ Sun, Yi, Xiaogang Wang, and Xiaoou Tang. "Deep learning face representation from predicting 10,000 classes." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014

60 patches

DeepID

DeepID Performance (1)

160-dimensional feature

DeepID Performance (2)

o: outside dataset

u: unrestricted protocol

r: restricted protocol

Google: FaceNet

▪ Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "Facenet: A unified embedding for face recognition and clustering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015

FaceNet

▪ Objective - learning a Euclidean embedding per image with DNN

▪ Map the face images to a compact Euclidean space

▪ Distance in space = Face Similarity

▪ Approach – DNN with triplet loss

Triplet Loss

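▪ Embedding: $f(x) \in \mathbb{R}^d$

▪ Input images: $x_i^a$ (anchor), $x_i^p$ (positive), and $x_i^n$ (negative)

▪ $\alpha$ is a margin enforced between positive and negative pairs

▪ Corresponding loss function: $L = \sum_i \left[ \|f(x_i^a) - f(x_i^p)\|_2^2 - \|f(x_i^a) - f(x_i^n)\|_2^2 + \alpha \right]_+$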
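
A minimal sketch of the per-triplet loss, assuming the hinge form from the FaceNet paper: the squared anchor-positive distance minus the squared anchor-negative distance plus the margin, clipped at zero. The 2-D embeddings and the margin value 0.2 are illustrative choices for this example.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Hinge on d(a, p) - d(a, n) + alpha for one triplet of embeddings."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + alpha)

anchor = [0.0, 1.0]
positive = [0.0, 0.8]   # same identity, close to the anchor
easy_neg = [1.0, 0.0]   # different identity, already far away
hard_neg = [0.0, 0.7]   # different identity, almost as close as the positive

loss_easy = triplet_loss(anchor, positive, easy_neg)   # margin satisfied, no loss
loss_hard = triplet_loss(anchor, positive, hard_neg)   # margin violated
```

Triplets that already satisfy the margin contribute nothing to the gradient, which is why the choice of triplets matters for convergence.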

Triplet Selection

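▪ To achieve fast convergence of the triplet loss

▪ Select $x_i^p$ as $\arg\max_{x_i^p} \|f(x_i^a) - f(x_i^p)\|_2^2$ (hard positive)

▪ Select $x_i^n$ as $\arg\min_{x_i^n} \|f(x_i^a) - f(x_i^n)\|_2^2$ (hard negative)

▪ Training set sampled with

▪ 40 faces per identity in each mini-batch as positive exemplars

▪ Randomly sampled negative faces are added

▪ To avoid converging to bad local minima

▪ Select $x_i^n$ such that $\|f(x_i^a) - f(x_i^p)\|_2^2 < \|f(x_i^a) - f(x_i^n)\|_2^2$ (semi-hard)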
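
A sketch of semi-hard negative selection within one mini-batch follows. Here a semi-hard negative is taken to be one that lies farther from the anchor than the positive but still inside the margin. The fallback to the closest negative when no semi-hard one exists, the margin 0.2, and the name `pick_negative` are assumptions of this sketch.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def pick_negative(anchor, positive, negatives, alpha=0.2):
    """Return the index of a semi-hard negative, or the hardest one if none exists."""
    d_ap = squared_distance(anchor, positive)
    d_an = [squared_distance(anchor, n) for n in negatives]
    # Semi-hard: farther from the anchor than the positive, but inside the margin.
    semi_hard = [i for i, d in enumerate(d_an) if d_ap < d < d_ap + alpha]
    if semi_hard:
        return min(semi_hard, key=lambda i: d_an[i])
    # Fallback (assumption of this sketch): the closest negative overall.
    return min(range(len(negatives)), key=lambda i: d_an[i])

anchor, positive = [0.0, 0.0], [0.3, 0.0]
negatives = [
    [0.1, 0.0],   # closer than the positive: hardest negative
    [0.4, 0.0],   # just beyond the positive: semi-hard
    [2.0, 0.0],   # far away: easy, gives zero loss
]
chosen = pick_negative(anchor, positive, negatives)
```

Skipping the very hardest negatives early in training is the slide's remedy for collapsing into bad local minima.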

Deep Convolutional Networks

▪ CNN is trained using Stochastic Gradient Descent (SGD) with standard backpropagation

▪ Two types of architectures

▪ Zeiler & Fergus architecture

▪ GoogLeNet style Inception model

▪ Trained on a CPU cluster for 1000 to 2000 hours

▪ 100M-200M training face thumbnails covering about 8M identities

▪ Input sizes range from 96x96 to 224x224 pixels

Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." Computer vision–ECCV 2014. Springer International Publishing, 2014. 818-833.

Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

Network Details

Performance

▪ Validation rate VAL (true accepts / same identity pairs) on 1M hold-out test set

▪ VAL as a function of the output (embedding) dimension
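
The validation rate can be computed from pairwise distances as sketched below. The distances are made up. The companion false accept rate (FAR, false accepts over different-identity pairs) is not on the slide; it is included because VAL is normally reported at a fixed FAR.

```python
def val_and_far(same_dists, diff_dists, threshold):
    """VAL and FAR at a squared-distance threshold.

    A pair is accepted as "same identity" when its distance is <= threshold.
    """
    true_accepts = sum(d <= threshold for d in same_dists)
    false_accepts = sum(d <= threshold for d in diff_dists)
    return true_accepts / len(same_dists), false_accepts / len(diff_dists)

same_dists = [0.2, 0.5, 0.9, 1.4]   # distances of same-identity pairs (made up)
diff_dists = [0.8, 1.6, 1.9, 2.3]   # distances of different-identity pairs (made up)
val, far = val_and_far(same_dists, diff_dists, threshold=1.0)
```

Lowering the threshold reduces FAR at the cost of VAL, so a single VAL figure is only meaningful together with the FAR it was measured at.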

Sensitivity to Image Quality

Deep Face Recognition

▪ Parkhi, Omkar M., Andrea Vedaldi, and Andrew Zisserman. "Deep face recognition." Proceedings of the British Machine Vision Conference (BMVC), 2015.

▪ Achieved similar performance on the LFW and YTF datasets

▪ With fewer training images and identities

▪ 2.6M images collected from Google Images and Bing with keyword “actor”

▪ Same triplet loss strategy as FaceNet

Fine-tuned with VGG Model

▪ The “Very Deep” Architecture

▪ Different from previous architectures proposed

▪ Network Details:

▪ 3 x 3 Convolution Kernels (Very small)

▪ Conv. Stride 1 px.

▪ ReLU non-linearity

▪ No local contrast normalisation

▪ 3 Fully connected layers
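
One way to see why very small 3 x 3 kernels with stride 1 are attractive is to compare a stack of them with a single larger kernel covering the same area. The arithmetic below is a sketch: biases are ignored, the channel count of 512 is taken from the layer labels, and the helper names are hypothetical.

```python
def receptive_field(num_layers, kernel=3, stride=1):
    """Receptive field of a stack of identical convolution layers."""
    field, jump = 1, 1
    for _ in range(num_layers):
        field += (kernel - 1) * jump
        jump *= stride
    return field

def conv_weights(num_layers, kernel, channels):
    """Weight count (biases ignored) with `channels` inputs and outputs per layer."""
    return num_layers * kernel * kernel * channels * channels

rf_three_3x3 = receptive_field(3)                # same coverage as one 7x7 layer
w_three_3x3 = conv_weights(3, 3, channels=512)   # three stacked 3x3 layers
w_one_7x7 = conv_weights(1, 7, channels=512)     # one 7x7 layer
```

Three stacked 3 x 3 layers see a 7 x 7 window with roughly 55% of the weights of a single 7 x 7 layer, and they interleave two extra non-linearities.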

[Figure: layer diagram of the network, with blocks labeled Conv-64, Conv-128, Conv-256, Conv-512, maxpool, fc-4096, fc-2622, and Softmax]

Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).

Training

• MatConvNet Toolbox

• Nvidia CuDNN bindings

• Multi GPU Training (approx 3.5x speedup)

• Nvidia Titan Black

• 7 days of training

• Stochastic Gradient Descent with back prop.

• Accumulator Descent for large batch sizes

• Batch Size: 256

• Incremental FC layer training

• 2622-way multi-class criterion (softmax)

Vedaldi, Andrea, and Karel Lenc. "MatConvNet: Convolutional neural networks for MATLAB." Proceedings of the 23rd Annual ACM Conference on Multimedia Conference. ACM, 2015.

Performance on LFW

No. | Method | # Training Images | # Networks | Accuracy (%)

1 | Fisher Vector Faces | - | - | 93.10

2 | DeepFace | 4 M | 3 | 97.35

3 | DeepFace Fusion | 500 M | 5 | 98.37

4 | DeepID-2,3 | Full | 200 | 99.47

5 | FaceNet | 200 M | 1 | 98.87

6 | FaceNet + Alignment | 200 M | 1 | 99.63

7 | VGG Face | 2.6 M | 1 | 98.95

Performance on YTF

No. | Method | # Training Images | # Networks | 100%-EER | Accuracy (%)

1 | Video Fisher Vector | - | - | 87.7 | 93.10

2 | DeepFace | 4 M | 1 | 91.4 | 91.4

4 | DeepID-2,2+,3 | | 200 | - | 93.2

5 | FaceNet + Alignment | 200 M | 1 | - | 95.1

7 | VGG Face | 2.6 M | 1 | 97.4 | 97.3

Lightened CNN

▪ Wu, Xiang, Ran He, and Zhenan Sun. "A Lightened CNN for Deep Face Representation." arXiv preprint arXiv:1511.02683 (2015).

▪ Obtained performance competitive with previous models

▪ Composed of two networks

▪ New activation function: Max-Feature-Map (MFM) to replace ReLU

Max-Feature-Map
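
A minimal sketch of the Max-Feature-Map idea: a layer producing 2n feature maps is reduced to n maps by taking an element-wise maximum over pairs of maps. Pairing map k with map k + n follows my reading of the Lightened CNN paper and should be treated as an assumption, as should the list-of-lists representation.

```python
def max_feature_map(channels):
    """Element-wise max over the two halves of a channel list.

    `channels` holds 2n feature maps (each a 2-D list); the result has n maps.
    """
    if len(channels) % 2 != 0:
        raise ValueError("MFM needs an even number of feature maps")
    n = len(channels) // 2
    return [
        [[max(a, b) for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(channels[k], channels[k + n])]
        for k in range(n)
    ]

feature_maps = [
    [[1, -2], [3, 0]],    # map 0
    [[5, 5], [-1, -1]],   # map 1
    [[0, 4], [-3, 2]],    # map 2, paired with map 0
    [[6, 1], [-2, 7]],    # map 3, paired with map 1
]
out = max_feature_map(feature_maps)
```

Unlike ReLU, which thresholds each value against zero, this keeps the stronger of two competing responses and halves the number of channels, which is where the "lightened" model size comes from.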

Performance

▪ On LFW:

▪ On YTF:

Deep Learning Applications Other than Recognition

Incorrect Alignment

Liu, Ziwei, et al. "Deep learning face attributes in the wild." Proceedings of the IEEE International Conference on Computer Vision. 2015.

Deep Learning Face Attributes

Details of the Networks

▪ Applied AlexNet directly for LNet

▪ Pre-trained with ImageNet 1000 object categories

▪ Fine-tuning LNet using attribute tags

Face Localization Performance (LNet)

Face localization performance (LNet)

Face Attributes Visualization

Attribute Accuracy

Discussion and Conclusion

LFW Survey

▪ Labeled Faces in the Wild: A Survey: Erik Learned-Miller, Gary Huang, Aruni RoyChowdhury, Haoxiang Li, Gang Hua

▪ The future of face recognition

▪ Verification versus identification

▪ Not uncommon that two random individuals have large differences in appearance

▪ The more people in a gallery, the greater the chance that two individuals have similar appearance

▪ New face datasets

▪ IJB-A

▪ CASIA

▪ FaceScrub

▪ MegaFace
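
The verification-versus-identification point above can be put into numbers. Assuming each comparison against a different person is falsely accepted independently with the same probability (a simplification), the chance of at least one false match grows quickly with gallery size. The false accept rate of 0.001 is an arbitrary illustrative value.

```python
def prob_any_false_match(far, gallery_size):
    """Chance that at least one of `gallery_size` impostor comparisons is accepted."""
    return 1.0 - (1.0 - far) ** gallery_size

p_verification = prob_any_false_match(0.001, 1)        # one-to-one match
p_small_gallery = prob_any_false_match(0.001, 100)     # roughly 0.095
p_large_gallery = prob_any_false_match(0.001, 10000)   # close to 1
```

A verification accuracy that looks excellent on pairs can therefore still be far from sufficient for identification against a large gallery.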

Discussion

▪ Unconstrained face recognition is a competitive field

▪ Target dataset: IJB-A

▪ Testing different approaches (with source code / trained models)

▪ Working on checking the effectiveness of lightened CNN

▪ Facial attributes may serve an auxiliary purpose

Large-scale CelebFaces Attributes (CelebA) Dataset

▪ S. Yang, P. Luo, C. C. Loy, and X. Tang, "From Facial Parts Responses to Face Detection: A Deep Learning Approach", in IEEE International Conference on Computer Vision (ICCV), 2015

▪ 10,177 identities

▪ 202,599 face images

▪ 5 landmark locations and 40 binary attribute annotations per image

▪ Available for download

▪ 1.34 GB for 202,599 aligned and cropped face images

▪ Aligned by a similarity transformation based on the two eye locations, then resized to 218*178

▪ 9.8 GB for 202,599 original web face images
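
Aligning by a similarity transformation from two eye locations can be sketched with complex arithmetic: a rotation, uniform scale, and translation is exactly a map z -> a*z + b, and two point correspondences determine a and b. The canonical eye positions below are hypothetical values chosen for a crop 178 pixels wide and 218 high; they are not the ones used to build CelebA.

```python
def similarity_from_eyes(left_eye, right_eye, target_left, target_right):
    """Similarity transform (rotation, uniform scale, translation) as z -> a*z + b."""
    s1, s2 = complex(*left_eye), complex(*right_eye)
    d1, d2 = complex(*target_left), complex(*target_right)
    a = (d2 - d1) / (s2 - s1)   # rotation and scale
    b = d1 - a * s1             # translation
    return a, b

def apply(transform, point):
    """Map an (x, y) point through the transform."""
    a, b = transform
    z = a * complex(*point) + b
    return (z.real, z.imag)

# Hypothetical canonical eye positions inside the aligned crop.
TARGET_LEFT, TARGET_RIGHT = (69.0, 111.0), (109.0, 111.0)

t = similarity_from_eyes((100.0, 200.0), (180.0, 200.0), TARGET_LEFT, TARGET_RIGHT)
scale = abs(t[0])                            # eyes 80 px apart map to 40 px apart
mapped_left = apply(t, (100.0, 200.0))
mapped_right = apply(t, (180.0, 200.0))
```

Applying the same transform to every pixel coordinate produces the aligned crop. Because only two points are used, the face shape is preserved, with no shear or non-uniform stretch.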

Large-scale CelebFaces Attributes (CelebA) Dataset

Deep Face Dreams

[Figures: pairs of "Representative Image" and "Neuron Inversion" panels]

Mahendran, Aravindh, and Andrea Vedaldi. "Understanding deep image representations by inverting them." Computer Vision and Pattern Recognition (CVPR), 2015

Questions?
