3D Dynamic Facial Sequences Analysis for face recognition and emotion detection
3D DYNAMIC FACIAL SEQUENCES ANALYSIS
FOR FACE RECOGNITION AND EMOTION
DETECTION
PhD Candidate: Taleb ALASHKAR
Supervisor: Prof. Mohamed DAOUDI
Co-Supervisor: Dr. Boulbaba BEN AMOR
Taleb ALASHKAR PhD Defense 2-Nov-2015
WHY FACE ANALYSIS?
Identity Recognition
Facial Expressions (Angry, Surprised, Happy)
Age Estimation
Physical State Monitoring (Fatigue, Pain)
WHY 3D FACE?
3D is robust to illumination and pose variations, unlike 2D
WHY 3D DYNAMIC?
3D static vs. 3D dynamic
MOTIVATION AND CHALLENGES
Motivation to 4D (3D+t) Face Analysis
Robustness to illumination changes and pose variations
Availability of cost-effective (Kinect-like) and high
resolution (Di4D) 3D dynamic sensors
Richness in shape and deformation
Challenges of 4D Face Analysis
Noisy data (from acquisition and sensor accuracy)
Missing data (single-view scanners)
Volume of data (sequence of 3D meshes)
Low-resolution frames (Kinect-like sensors)
Goal: a compact spatio-temporal representation, robust to noise and missing data, which allows 4D face analysis
THESIS CONTRIBUTION
Input
3D Sequences
Subspace
Representation
Trajectory
Representation
Dictionary
Representation
(I) 4D Face Recognition (II) 4D Spontaneous Emotion Detection
Matrix manifold
Applications
OUTLINE
4D Face Recognition
State of the art
4D face recognition framework
Experiments and results
4D Spontaneous Emotion Detection
State of the art
Trajectories on Grassmann manifold
Spontaneous emotional state detection from depth video
Spontaneous pain detection from 4D high resolution video
Conclusion and Future Work
FACE RECOGNITION FROM 4D DATA
State of the Art
- Frame Set (Lie et al., 2013): Kinect input; low resolution; robust to illumination/FE; ignores temporal information; complex enrollment
- Super Resolution (Berretti et al., 2014): one Kinect frame at a time; low resolution; assumes a constant expression; ignores temporal information; requires 3D frame alignment
- Spatio-Temporal (Sun et al., 2010): 3D high-resolution scanner; outperforms 2D video/3D static; space-time representation; time consuming (tracking, model adaptation, conformal mapping, ST-HMM)
4D FACE RECOGNITION
4D Face Recognition Approach
Data processing: mean curvature computation over time, then curvature-map extraction
Modeling: identity subspace modeling (k-SVD); each subsequence spans a subspace = Span{…}
Classification: Grassmann dictionary learning on the training data, then Grassmann sparse coding (SRC) at test time
4D FACE RECOGNITION
Mean curvature H = (K1 + K2)/2, where K1 and K2 are the two principal curvatures at each vertex.
Spatial Feature Extraction
Capture the local facial shape
Invariant to the scale, rotation and mesh resolution
Ability to capture non-rigid facial deformations
4D FACE RECOGNITION
Spatio-Temporal Subspace Representation
3D dynamic original data → curvature maps → reshape into an n×m matrix → k-SVD → subspace representation (k < m), a point on a matrix manifold
Why subspace representation?
- Compact low-dimensional representation
- Robust against noise and missing data
- Availability of geometric statistical tools
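The subspace step above can be sketched in a few lines. The thesis builds the representation with k-SVD, so the plain truncated SVD below (and the toy `frames` data, with the hypothetical name `subspace_of_window`) is only a simplified, illustrative stand-in that extracts the dominant k-dimensional subspace of a window:

```python
import numpy as np

def subspace_of_window(frames, k):
    """Stack vectorized curvature maps into an n x m matrix and keep the
    span of its k dominant left singular vectors: a point on the
    Grassmann manifold G(k, n). Plain truncated SVD stands in here for
    the k-SVD step used in the thesis."""
    X = np.column_stack(frames)                  # n x m data matrix
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]                              # n x k orthonormal basis

# toy example: 6 frames of a 100-vertex curvature map, k = 3 < m = 6
rng = np.random.default_rng(0)
frames = [rng.standard_normal(100) for _ in range(6)]
B = subspace_of_window(frames, 3)
print(B.shape)                                   # (100, 3)
```

The returned basis has orthonormal columns, which is exactly what the Grassmann distances on the following slides require.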
4D FACE RECOGNITION
Matrix Manifolds [1]:
Stiefel manifold: the set of n×k matrices with orthonormal columns, with a defined distance.
Grassmann manifold: all possible k-dimensional subspaces of an n-dimensional space. It is the quotient space of the Stiefel manifold under the equivalence constraint:
X = Y if Span(X) = Span(Y), i.e., there exists an orthogonal k×k matrix R in SO(k) such that X = Y·R.
[1]. P.A. Absil et al., “Optimization Algorithms on Matrix Manifolds”, 2008.
4D FACE RECOGNITION
Grassmann Manifold Geometry
Non-linear manifold
A tangent space can be defined at any point on the manifold.
Algorithmic tools to compute the Log and Exp maps functions.
Distances on Grassmann [1]
Canonical correlations / principal angles [1]: the cosines of the principal angles θ1, ..., θk between Span(X) and Span(Y) are the singular values of XᵀY.
Geodesic distance: d(X, Y) = ||θ||₂, the 2-norm of the principal-angle vector.
[1]. Hamm et al., ICML, 2008.
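These distances are straightforward to compute. The sketch below (with hypothetical helper names) follows the definitions above: singular values of XᵀY as cosines of the principal angles, geodesic distance as the 2-norm of the angle vector:

```python
import numpy as np

def principal_angles(X, Y):
    """Principal angles between Span(X) and Span(Y), given orthonormal
    n x k bases: the singular values of X^T Y are their cosines."""
    s = np.linalg.svd(X.T @ Y, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def geodesic_distance(X, Y):
    """Grassmann geodesic distance: the 2-norm of the principal angles."""
    return np.linalg.norm(principal_angles(X, Y))

# orthogonal 2-D subspaces of R^4: every principal angle is pi/2
X = np.eye(4)[:, :2]
Y = np.eye(4)[:, 2:]
print(geodesic_distance(X, Y))   # sqrt(2) * pi / 2, about 2.2214
```

Clipping the singular values guards against values marginally above 1 from floating-point rounding before `arccos`.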
4D FACE RECOGNITION
Statistical Analysis on Grassmann manifold:
[Figure: Log/Exp maps between manifold points p1, p2, p3 and tangent vectors v1, v2, v3 in the tangent space Tµ at the mean µ]
Intrinsic methods
Grassmann Nearest Neighbor (GNN) Classification
Training:
1. Compute the Karcher mean [1] of the subspaces of every subject in the training data (gallery).
Testing:
1. Compare the probe with the mean of each class using a distance defined on the Grassmann manifold.
2. The closest mean to the probe gives the target subject.
[1]. H. Karcher, Comm. Pure Appl. Math., 1977.
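The GNN training/testing steps can be sketched as follows. The log/exp maps use the standard Grassmann formulas; the function names, the fixed iteration count, and the QR re-orthonormalization are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def geodesic(X, Y):
    """Geodesic distance: 2-norm of the principal angles."""
    s = np.clip(np.linalg.svd(X.T @ Y, compute_uv=False), -1.0, 1.0)
    return np.linalg.norm(np.arccos(s))

def log_map(X, Y):
    """Tangent vector at Span(X) pointing to Span(Y) (standard formula)."""
    ytx = Y.T @ X
    Bt = np.linalg.solve(ytx, Y.T - ytx @ X.T)   # assumes Y^T X invertible
    U, S, Vt = np.linalg.svd(Bt.T, full_matrices=False)
    return U @ np.diag(np.arctan(S)) @ Vt

def exp_map(X, H):
    """Point reached from Span(X) along the geodesic with direction H."""
    U, S, Vt = np.linalg.svd(H, full_matrices=False)
    Y = X @ Vt.T @ np.diag(np.cos(S)) @ Vt + U @ np.diag(np.sin(S)) @ Vt
    Q, _ = np.linalg.qr(Y)                       # re-orthonormalize
    return Q

def karcher_mean(points, iters=20):
    """Average the tangent vectors at the current estimate, shoot back
    with the exp map, and iterate (Karcher's fixed-point scheme)."""
    mu = points[0]
    for _ in range(iters):
        H = sum(log_map(mu, P) for P in points) / len(points)
        mu = exp_map(mu, H)
    return mu

def gnn_classify(probe, class_means):
    """GNN: return the label of the closest class mean."""
    return min(class_means, key=lambda c: geodesic(probe, class_means[c]))
```

For two points the scheme lands on the geodesic midpoint after one step, so the mean is equidistant from both, which is the property the nearest-mean classifier exploits.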
4D FACE RECOGNITION
Statistical Analysis on Grassmann manifold:
Extrinsic methods [1]
Grassmann manifold embedding into a linear space via the projection mapping X → XXᵀ
Less computation time than intrinsic methods
[1]. Harandi et al., CVPR, 2013.
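A minimal sketch of the extrinsic route, assuming the usual XXᵀ projection mapping and the chordal (Frobenius) distance it induces; the helper names are illustrative:

```python
import numpy as np

def embed(X):
    """Projection mapping: Span(X) -> X X^T, a symmetric matrix living
    in a linear (Euclidean) space."""
    return X @ X.T

def chordal_distance(X, Y):
    """Frobenius distance between the embedded projection matrices."""
    return np.linalg.norm(embed(X) - embed(Y), "fro") / np.sqrt(2)

# orthogonal 2-D subspaces of R^4
X = np.eye(4)[:, :2]
Y = np.eye(4)[:, 2:]
print(chordal_distance(X, Y))   # sqrt(2), about 1.4142
```

Once embedded, ordinary linear tools (kernels, classifiers) apply directly, which is why the extrinsic route is cheaper than repeated log/exp computations.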
4D FACE RECOGNITION
Sparse Coding and Dictionary Learning
Suitable for data with sparse structure
Learning over-complete rich dictionary
Robustness against noise and missing data
Efficient Sparse Representation Classifier (SRC) [1]
[1]. Wright et al., PAMI, 2009.
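A compact SRC sketch. The classifier of Wright et al. solves an ℓ1-regularized coding problem; the ISTA loop below is a simple stand-in for that solver, and the parameter values (`lam`, `iters`) are illustrative assumptions:

```python
import numpy as np

def ista(D, y, lam=0.01, iters=500):
    """Solve min_a 0.5 ||y - D a||^2 + lam ||a||_1 by iterative
    soft-thresholding (a simple stand-in for the l1 solver)."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        g = a - D.T @ (D @ a - y) / L        # gradient step
        a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # shrinkage
    return a

def src_classify(D, labels, y):
    """SRC: code y over the whole dictionary, then pick the class whose
    atoms alone give the smallest reconstruction residual."""
    a = ista(D, y)
    labels = np.asarray(labels)
    best, best_res = None, np.inf
    for c in np.unique(labels):
        a_c = np.where(labels == c, a, 0.0)  # keep only class-c coefficients
        res = np.linalg.norm(y - D @ a_c)
        if res < best_res:
            best, best_res = c, res
    return best
```

Classifying by class-wise residual rather than by the largest coefficient is what makes SRC robust to noise and missing data: corrupted dimensions inflate all residuals roughly equally.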
4D FACE RECOGNITION
Experimental Results
Database
BU-4DFE database [1]
101 subjects / 6 expressions (one sequence each) per subject
About 100 frames per sequence
Experimental Protocol (Sun et al., 2010)
60 subjects / sub-sequences of size w=6 / shifting step 3
Expression Dependent (ED): ½ of each expression for training, ½ for testing
Expression Independent (EI): 1 expression for training, 5 for testing
[1]. Yin et al., FG, 2008.
I. Grassmann Nearest Neighbor (GNN) classifier (w=6)
ED performance is better than EI performance
GNN relies on the mean of each class (a statistical summary)
4D FACE RECOGNITION
Experimental Results
II. Grassmann Sparse Representation (GSR) classifier (w=6, EI)
Considering the face dynamics improves the recognition performance of the Expression-Independent dictionary representation (+3.1%)
4D FACE RECOGNITION
GSR outperforms GGDA (a variant of GDA)
GSR is below Sun et al. (by 10%), but
- it is computationally much less expensive
- it is landmark-free
Experimental Results
III. Grassmann sparse representation (GSR) classifier
GGDA is a variant of Grassmann Discriminant Analysis proposed in [1]. Harandi et al., CVPR, 2011.
ST-HMM is the 4D FR approach proposed in [2]. Sun et al., IEEE T-Cybernetics, 2010.
Robustness to the temporal evolution (neutral-to-apex or apex-to-neutral)
4D FACE RECOGNITION
Experimental Results
IV. Grassmann Sparse Representation (GSR) classifier, Expression Independent
Training with 1 vs. training with 5 expressions (+9.2%)
- Richness of the learned dictionary
- The sparse representation (code) of a new observation can be recovered efficiently from the available atoms (except for surprise)
OUTLINE
4D Face Recognition
State of the art
4D face recognition framework
Experiments and results
4D Spontaneous Emotion Detection
State of the art
Trajectories on Grassmann manifold
Spontaneous emotional state detection from depth video
Spontaneous pain detection from 4D high resolution video
Conclusion and Future Work
4D SPONTANEOUS EMOTION DETECTION
Objectives
Proposing an early detection framework for spontaneous emotions from 3D dynamic sequences in a continuous emotion space.
Challenges:
Detecting the spontaneous emotion of interest
Detecting the emotion as early as possible
3D (depth / high-resolution) video
3D (depth/high resolution) video
Arousal-valence chart
4D SPONTANEOUS EMOTION DETECTION
State of the Art
3D facial deformation vs. 3D feature tracking
- Non-rigid deformation (Ben Amor et al., 2014): global deformation, subtle changes; nose tip dependence; acted FE
- Parameterization of facial deformation (Sandbach et al., 2011): global deformation, automatic; time consuming; acted FE
- Local spatial feature tracking (Berretti et al., 2012): robust to noise, real time; landmarks tracking; acted FE
- Landmarks tracking (Xue et al., 2015): robust to noise, fast performance; landmarks tracking; acted FE
4D SPONTANEOUS EMOTION DETECTION
Trajectory analysis on matrix manifolds
- Divide the 3D video into subsequences
- Represent each subsequence as a subspace
- Obtain time-parameterized trajectories on the matrix manifold
- Compute the temporal evolution along the trajectory
- Apply the SO-SVM early event classifier
4D SPONTANEOUS EMOTION DETECTION
Spontaneous emotion detection from depth video
- Upper part of the body vs. the face only
Face vs. Face + Upper Part
- Depth vs. 2D video data
4D SPONTANEOUS EMOTION DETECTION
Spontaneous emotion detection from depth video
Depth video represented as a trajectory on the Grassmann manifold.
Geodesic distances between successive subspaces of the trajectory are computed.
The Geometric Motion History (GMH) gives the temporal evolution of the depth video.
SO-SVM [1] early event detection is applied to the GMH signals.
[1]. Hoai and de la Torre, IJCV, 2014.
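The GMH construction can be sketched as follows, reusing the window/subspace machinery from the recognition part (w=6 frames with step 3, as in that protocol); frame vectorization and the choice of k are illustrative assumptions:

```python
import numpy as np

def window_subspace(X, k):
    """Dominant k-dim subspace of an n x w window of vectorized frames."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]

def geodesic(X, Y):
    """Geodesic distance: 2-norm of the principal angles."""
    s = np.clip(np.linalg.svd(X.T @ Y, compute_uv=False), -1.0, 1.0)
    return np.linalg.norm(np.arccos(s))

def geometric_motion_history(frames, w=6, step=3, k=3):
    """Slide a window of w frames with the given step, map each window
    to a subspace (one point of the Grassmann trajectory), and record
    the geodesic distance between successive subspaces: the GMH signal."""
    subs = [window_subspace(np.column_stack(frames[i:i + w]), k)
            for i in range(0, len(frames) - w + 1, step)]
    return np.array([geodesic(subs[i], subs[i + 1])
                     for i in range(len(subs) - 1)])
```

A video whose windows keep spanning the same subspace (no facial motion) yields a flat, near-zero GMH; deformation onsets show up as bumps, which is what the SO-SVM detector fires on.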
4D SPONTANEOUS EMOTION DETECTION
Experimental Analysis
Database:
The experiments are conducted on the Cam3D Kinect database [1], which contains depth videos of spontaneous emotions.
Protocol:
Two emotions are detected (Happiness vs. others and Thinking/Unsure vs. others).
The targeted videos are divided into two halves, for training and testing.
Each emotion of interest is concatenated with two randomly chosen other emotions, to obtain 100 samples for training and testing.
[1]. Mahmoud et al., ACII, 2011.
4D SPONTANEOUS EMOTION DETECTION
Evaluation Criteria
True Positive (TP) Rate: the fraction of time series for which the detector fires during the event of interest.
False Positive (FP) Rate: the fraction of time series for which the detector fires before the event of interest starts.
I. ROC Curve: TPR plotted against FPR as the detection threshold varies, summarized by the Area Under the ROC Curve (AUC).
II. AMOC curve: evaluates the timeliness of the detection.
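A minimal sketch of the ROC/AUC computation over detector scores: sorting the scores sweeps the threshold implicitly, and TPR is integrated over FPR with the trapezoid rule. The per-time-series firing semantics above are abstracted into one score per sample here, which is an illustrative simplification:

```python
import numpy as np

def roc_auc(scores, labels):
    """Sort detections by score (implicit threshold sweep), trace the
    cumulative TPR and FPR, and integrate TPR over FPR (trapezoid rule)
    to obtain the Area Under the ROC Curve."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    y = np.asarray(labels)[order]
    tpr = np.concatenate([[0.0], np.cumsum(y) / max(y.sum(), 1)])
    fpr = np.concatenate([[0.0], np.cumsum(1 - y) / max((1 - y).sum(), 1)])
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2))

# perfectly separated scores give AUC = 1.0
print(roc_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))   # 1.0
```

An AUC of 0.5 corresponds to a chance-level detector, which is the reference point for the numbers reported on the following slides.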
4D SPONTANEOUS EMOTION DETECTION
Experimental Analysis
I. Grassmann vs. Stiefel manifold
Happiness detection
experiment
Thinking/unsure detection
experiment
4D SPONTANEOUS EMOTION DETECTION
Physical pain detection from high resolution 4D-faces
Detecting spontaneous pain among other facial expressions.
3D dynamic high-resolution data is available.
Early detection of pain using the SO-SVM framework.
4D SPONTANEOUS EMOTION DETECTION
Physical pain detection from high resolution 4D-faces
Depth-based Grassmann trajectories
Trajectory representation of the 3D video.
Velocity vectors are computed between successive subspaces.
A Local Deformation Histogram (LDH) is computed.
The LDHs are concatenated.
The beginning and the end of the pain are defined.
SO-SVM early detection is applied.
[Figure: 1st component of the velocity]
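The velocity/LDH step can be sketched as follows. The velocity is the log map between successive subspaces; binning its per-row (per-vertex) magnitudes is an assumed reading of the LDH descriptor, and the bin count is illustrative:

```python
import numpy as np

def log_map(X, Y):
    """Tangent (velocity) vector at Span(X) toward Span(Y)."""
    ytx = Y.T @ X
    Bt = np.linalg.solve(ytx, Y.T - ytx @ X.T)   # assumes Y^T X invertible
    U, S, Vt = np.linalg.svd(Bt.T, full_matrices=False)
    return U @ np.diag(np.arctan(S)) @ Vt

def local_deformation_histogram(X, Y, bins=16):
    """Bin the per-row (per-vertex) magnitudes of the velocity vector
    between two successive subspaces: local motion statistics instead
    of one global distance per pair."""
    V = log_map(X, Y)
    mags = np.linalg.norm(V, axis=1)             # one magnitude per vertex
    hist, _ = np.histogram(mags, bins=bins,
                           range=(0.0, mags.max() + 1e-12))
    return hist / hist.sum()
```

Keeping a histogram of local magnitudes, rather than a single geodesic distance, preserves where the deformation happens on the face, which is the intuition behind LDH outperforming the global-distance variant.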
4D SPONTANEOUS EMOTION DETECTION
Physical pain detection from high resolution 4D-faces
3D landmark-based Grassmann trajectories (Baseline)
The 3D physical pain video is divided into subsequences.
The facial landmark coordinates (x, y, z) are used as the facial descriptor (83 landmarks).
Every subsequence is represented as a subspace.
The geodesic distance between successive subspaces is quantified to build the GMH.
The beginning and the end of the pain are defined.
SO-SVM early detection is applied.
4D SPONTANEOUS EMOTION DETECTION
Experimental Analysis
Database:
BP4D-Spontaneous database [1]
41 subjects / 8 tasks
AU annotations
Protocol:
28 physical pain videos are used.
14 for training and 14 for testing.
2-fold cross-validation is applied.
The beginning and the end of the pain are determined according to an action-unit activation formula [1, 2].
[1]. Zhang et al., IVC, 2014.
[2]. Prkachin et al., Pain, 2008.
AU4: Brow Lowering
AU6: Cheek raising
AU7: Tightening of eyelids
AU9: Wrinkling of nose
AU10: Raising of upper lip
4D SPONTANEOUS EMOTION DETECTION
Experimental Analysis
I. Effect of the smoothing and pose normalization
[ROC curves, geodesic distance / norm of the velocity: AUC = 0.75, 0.70, 0.63 and AUC = 0.78, 0.74, 0.70]
- Increasing the derivation step improves the results
- Normalizing the head pose improves the results
4D SPONTANEOUS EMOTION DETECTION
Experimental Analysis
II. Landmarks vs. Depth method
[ROC curves: AUC = 0.80 vs. AUC = 0.78]
4D SPONTANEOUS EMOTION DETECTION
Experimental Analysis
III. Local Deformation Histogram (LDH) vs. Distances
[ROC curves, distances (AUC = 0.80) vs. LDH (AUC = 0.84)]
OUTLINE
4D Face Recognition
State of the art
4D face recognition framework
Experiments and results
4D Spontaneous Emotion Detection
State of the art
Trajectories on Grassmann manifold
Spontaneous emotional state detection from depth video
Spontaneous pain detection from 4D high resolution video
Conclusion and Future Work
CONCLUSION AND FUTURE WORK
Conclusions
Common geometric framework with two different representations
4D Face Recognition
- Efficient subspace representation for 4D data
- Exploiting the shape and its dynamics improves the results
- Enriching the dictionary improves the results
4D Emotion Detection
- Modeling 3D videos as time-parameterized curves (trajectories) on the Grassmann manifold
- The upper part of the body outperforms the face alone
- The local approach (LDH) outperforms the global approach (distances)
- Coupling geometric features (velocities) with advanced ML techniques (early event detection)
Limitations
- Lack of frame-to-frame vertex-level dense correspondence
- The texture channel, although available, is not considered
- Limited number of subjects in the databases / lack of spontaneous databases
Perspectives and Future Work
- Dense non-rigid registration/tracking
- Investigating higher-order derivatives along the trajectories
PUBLICATION LIST
Submitted Journal
1. T. Alashkar, B. Ben Amor, M. Daoudi and S. Berretti, “Analyzing Trajectories on Grassmann Manifolds for Spontaneous Emotion Detection”, submitted to IEEE Transactions on Affective Computing, Sep. 2015.
2. T. Alashkar, B. Ben Amor, M. Daoudi and S. Berretti, “Modeling Shape Dynamics on Grassmann Manifolds for 4D Face Recognition”, in preparation.
International Conferences and Workshops
1. T. Alashkar, B. Ben Amor, S. Berretti and M. Daoudi, “Analyzing Trajectories on Grassmann Manifold for Early Emotion Detection from Depth Videos”, in FG 2015.
2. T. Alashkar, B. Ben Amor, M. Daoudi and S. Berretti, “A 3D Dynamic Database for Unconstrained Face Recognition”, in 3D Body Scanning Technologies International Conference, 2014.
3. T. Alashkar, B. Ben Amor, M. Daoudi and S. Berretti, “A Grassmannian Framework for Face Recognition of 3D Dynamic Sequences with Challenging Conditions”, in NORDIA Workshop at ECCV 2014.
National Conference
1. T. Alashkar, B. Ben Amor, S. Berretti and M. Daoudi, “Analyse des trajectoires sur une Grassmannienne pour la détection d’émotions dans des vidéos de profondeur”, in ORASIS 2015.