3D Dynamic Facial Sequences Analysis for face recognition and emotion detection
3D DYNAMIC FACIAL SEQUENCES ANALYSIS
FOR FACE RECOGNITION AND EMOTION
DETECTION
PhD Candidate: Taleb ALASHKAR
Supervisor: Prof. Mohamed DAOUDI
Co-Supervisor: Dr. Boulbaba BEN AMOR
Taleb ALASHKAR PhD Defense 2-Nov-2015
WHY FACE ANALYSIS?
Identity Recognition
Facial Expressions (Angry, Surprised, Happy)
Age Estimation
Physical State Monitoring (Fatigue, Pain)
WHY 3D FACE?
3D is robust to illumination and pose variations, unlike 2D
WHY 3D DYNAMIC?
3D static vs. 3D dynamic
MOTIVATION AND CHALLENGES
Motivation to 4D (3D+t) Face Analysis
Robustness to illumination changes and pose variations
Availability of cost-effective (Kinect-like) and high
resolution (Di4D) 3D dynamic sensors
Richness in shape and deformation
Challenges of 4D Face Analysis
Noisy data (from acquisition and sensor accuracy)
Missing data (single-view scanners)
Volume of data (sequence of 3D meshes)
Low-resolution frames (Kinect-like sensors)
Goal: a compact spatio-temporal representation, robust to noise and missing data, which allows 4D face analysis
THESIS CONTRIBUTION
Input
3D Sequences
Subspace
Representation
Trajectory
Representation
Dictionary
Representation
(I) 4D Face Recognition (II) 4D Spontaneous Emotion Detection
Matrix manifold
Applications
OUTLINE
4D Face Recognition
State of the art
4D face recognition framework
Experiments and results
4D Spontaneous Emotion Detection
State of the art
Trajectories on Grassmann manifold
Spontaneous emotional state detection from depth video
Spontaneous pain detection from 4D high resolution video
Conclusion and Future Work
FACE RECOGNITION FROM 4D DATA
State of the Art
- Frame Set (Lie et al., 2013): Kinect input; low resolution; robust to illumination/FE; ignores temporal information; complex enrollment
- Super Resolution (Berretti et al., 2014): one Kinect frame at a time; low resolution; assumes a constant expression; ignores temporal information; requires 3D frame alignment
- Spatio-Temporal (Sun et al., 2010): 3D high-resolution scanner; outperforms 2D video/3D static; space-time representation; time consuming (tracking, model adaptation, conformal mapping, ST-HMM)
4D FACE RECOGNITION
4D Face Recognition Approach
Data processing: mean curvature computation over time, then curvature-map extraction
Modeling: identity subspace modeling (k-SVD); each subsequence spans a subspace = Span{…}
Classification: Grassmann dictionary learning on the training data, then Grassmann sparse coding (SRC) at test time
4D FACE RECOGNITION
Mean curvature H = (K1 + K2)/2, where K1 and K2 are the two principal curvatures at each vertex.
Spatial Feature Extraction
Capture the local facial shape
Invariant to the scale, rotation and mesh resolution
Ability to capture non-rigid facial deformations
4D FACE RECOGNITION
Spatio-Temporal Subspace Representation
3D dynamic original data → curvature maps → reshape into an n×m matrix → k-SVD → subspace representation (k < m), a point on a matrix manifold
Why subspace representation?
- Compact low-dimensional representation
- Robust against noise and missing data
- Availability of geometric statistical tools
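The subspace step above can be sketched in a few lines. The thesis builds the representation with k-SVD, so the plain truncated SVD below (and the toy `frames` data, with the hypothetical name `subspace_of_window`) is only a simplified, illustrative stand-in that extracts the dominant k-dimensional subspace of a window:

```python
import numpy as np

def subspace_of_window(frames, k):
    """Stack vectorized curvature maps into an n x m matrix and keep the
    span of its k dominant left singular vectors: a point on the
    Grassmann manifold G(k, n). Plain truncated SVD stands in here for
    the k-SVD step used in the thesis."""
    X = np.column_stack(frames)                  # n x m data matrix
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]                              # n x k orthonormal basis

# toy example: 6 frames of a 100-vertex curvature map, k = 3 < m = 6
rng = np.random.default_rng(0)
frames = [rng.standard_normal(100) for _ in range(6)]
B = subspace_of_window(frames, 3)
print(B.shape)                                   # (100, 3)
```

The returned basis has orthonormal columns, which is exactly what the Grassmann distances on the following slides require.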
4D FACE RECOGNITION
Matrix Manifolds [1]:
Stiefel manifold: the set of n×k matrices with orthonormal columns, with a defined distance.
Grassmann manifold: all possible k-dimensional subspaces of an n-dimensional space. It is the quotient space of the Stiefel manifold under the equivalence constraint:
X = Y if Span(X) = Span(Y), i.e., there exists an orthogonal k×k matrix R in SO(k) such that X = Y·R.
[1]. P.A. Absil et al., “Optimization Algorithms on Matrix Manifolds”, 2008.
4D FACE RECOGNITION
Grassmann Manifold Geometry
Non-linear manifold
A tangent space can be defined at any point on the manifold.
Algorithmic tools to compute the Log and Exp maps functions.
Distances on Grassmann [1]
Canonical correlations / principal angles [1]: the cosines of the principal angles θ1, ..., θk between Span(X) and Span(Y) are the singular values of XᵀY.
Geodesic distance: d(X, Y) = ||θ||₂, the 2-norm of the principal-angle vector.
[1]. Hamm et al., ICML, 2008.
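These distances are straightforward to compute. The sketch below (with hypothetical helper names) follows the definitions above: singular values of XᵀY as cosines of the principal angles, geodesic distance as the 2-norm of the angle vector:

```python
import numpy as np

def principal_angles(X, Y):
    """Principal angles between Span(X) and Span(Y), given orthonormal
    n x k bases: the singular values of X^T Y are their cosines."""
    s = np.linalg.svd(X.T @ Y, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def geodesic_distance(X, Y):
    """Grassmann geodesic distance: the 2-norm of the principal angles."""
    return np.linalg.norm(principal_angles(X, Y))

# orthogonal 2-D subspaces of R^4: every principal angle is pi/2
X = np.eye(4)[:, :2]
Y = np.eye(4)[:, 2:]
print(geodesic_distance(X, Y))   # sqrt(2) * pi / 2, about 2.2214
```

Clipping the singular values guards against values marginally above 1 from floating-point rounding before `arccos`.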
4D FACE RECOGNITION
Statistical Analysis on Grassmann manifold:
[Figure: Log/Exp maps between manifold points p1, p2, p3 and tangent vectors v1, v2, v3 in the tangent space Tµ at the mean µ]
Intrinsic methods
Grassmann Nearest Neighbor (GNN) Classification
Training:
1. Compute the Karcher mean [1] of the subspaces of every subject in the training data (gallery).
Testing:
1. Compare the probe with the mean of each class using a distance defined on the Grassmann manifold.
2. The closest mean to the probe gives the target subject.
[1]. H. Karcher, Comm. Pure Appl. Math., 1977.
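The GNN training/testing steps can be sketched as follows. The log/exp maps use the standard Grassmann formulas; the function names, the fixed iteration count, and the QR re-orthonormalization are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def geodesic(X, Y):
    """Geodesic distance: 2-norm of the principal angles."""
    s = np.clip(np.linalg.svd(X.T @ Y, compute_uv=False), -1.0, 1.0)
    return np.linalg.norm(np.arccos(s))

def log_map(X, Y):
    """Tangent vector at Span(X) pointing to Span(Y) (standard formula)."""
    ytx = Y.T @ X
    Bt = np.linalg.solve(ytx, Y.T - ytx @ X.T)   # assumes Y^T X invertible
    U, S, Vt = np.linalg.svd(Bt.T, full_matrices=False)
    return U @ np.diag(np.arctan(S)) @ Vt

def exp_map(X, H):
    """Point reached from Span(X) along the geodesic with direction H."""
    U, S, Vt = np.linalg.svd(H, full_matrices=False)
    Y = X @ Vt.T @ np.diag(np.cos(S)) @ Vt + U @ np.diag(np.sin(S)) @ Vt
    Q, _ = np.linalg.qr(Y)                       # re-orthonormalize
    return Q

def karcher_mean(points, iters=20):
    """Average the tangent vectors at the current estimate, shoot back
    with the exp map, and iterate (Karcher's fixed-point scheme)."""
    mu = points[0]
    for _ in range(iters):
        H = sum(log_map(mu, P) for P in points) / len(points)
        mu = exp_map(mu, H)
    return mu

def gnn_classify(probe, class_means):
    """GNN: return the label of the closest class mean."""
    return min(class_means, key=lambda c: geodesic(probe, class_means[c]))
```

For two points the scheme lands on the geodesic midpoint after one step, so the mean is equidistant from both, which is the property the nearest-mean classifier exploits.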
4D FACE RECOGNITION
Statistical Analysis on Grassmann manifold:
Extrinsic methods [1]
Grassmann manifold embedding into a linear space via the projection mapping X → XXᵀ
Less computation time than intrinsic methods
[1]. Harandi et al., CVPR, 2013.
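A minimal sketch of the extrinsic route, assuming the usual XXᵀ projection mapping and the chordal (Frobenius) distance it induces; the helper names are illustrative:

```python
import numpy as np

def embed(X):
    """Projection mapping: Span(X) -> X X^T, a symmetric matrix living
    in a linear (Euclidean) space."""
    return X @ X.T

def chordal_distance(X, Y):
    """Frobenius distance between the embedded projection matrices."""
    return np.linalg.norm(embed(X) - embed(Y), "fro") / np.sqrt(2)

# orthogonal 2-D subspaces of R^4
X = np.eye(4)[:, :2]
Y = np.eye(4)[:, 2:]
print(chordal_distance(X, Y))   # sqrt(2), about 1.4142
```

Once embedded, ordinary linear tools (kernels, classifiers) apply directly, which is why the extrinsic route is cheaper than repeated log/exp computations.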
4D FACE RECOGNITION
Sparse Coding and Dictionary Learning
Suitable for data with sparse structure
Learning over-complete rich dictionary
Robustness against noise and missing data
Efficient Sparse Representation Classifier (SRC) [1]
[1]. Wright et al., PAMI, 2009.
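A compact SRC sketch. The classifier of Wright et al. solves an ℓ1-regularized coding problem; the ISTA loop below is a simple stand-in for that solver, and the parameter values (`lam`, `iters`) are illustrative assumptions:

```python
import numpy as np

def ista(D, y, lam=0.01, iters=500):
    """Solve min_a 0.5 ||y - D a||^2 + lam ||a||_1 by iterative
    soft-thresholding (a simple stand-in for the l1 solver)."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        g = a - D.T @ (D @ a - y) / L        # gradient step
        a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # shrinkage
    return a

def src_classify(D, labels, y):
    """SRC: code y over the whole dictionary, then pick the class whose
    atoms alone give the smallest reconstruction residual."""
    a = ista(D, y)
    labels = np.asarray(labels)
    best, best_res = None, np.inf
    for c in np.unique(labels):
        a_c = np.where(labels == c, a, 0.0)  # keep only class-c coefficients
        res = np.linalg.norm(y - D @ a_c)
        if res < best_res:
            best, best_res = c, res
    return best
```

Classifying by class-wise residual rather than by the largest coefficient is what makes SRC robust to noise and missing data: corrupted dimensions inflate all residuals roughly equally.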
4D FACE RECOGNITION
Experimental Results
Database
BU-4DFE database [1]
101 subjects / 6 expressions (one sequence each) per subject
About 100 frames per sequence
Experimental Protocol (Sun et al., 2010)
60 subjects / sub-sequences of size w=6 / shifting step 3
Expression Dependent (ED): ½ of each expression for training, ½ for testing
Expression Independent (EI): 1 expression for training, 5 for testing
[1]. Yin et al., FG, 2008.
I. Grassmann Nearest Neighbor (GNN) classifier (w=6)
ED performance is better than EI performance
GNN relies on the mean of each class (a statistical summary)
4D FACE RECOGNITION
Experimental Results
II. Grassmann Sparse Representation (GSR) classifier (w=6, EI)
Considering the face dynamics improves the recognition performance of the Expression-Independent dictionary representation (+3.1%)
4D FACE RECOGNITION
GSR outperforms GGDA (a variant of GDA)
GSR is below Sun et al. (by 10%), but
- it is computationally much less expensive
- it is landmark-free
Experimental Results
III. Grassmann sparse representation (GSR) classifier
GGDA is a variant of Grassmann Discriminant Analysis proposed in [1]. Harandi et al., CVPR, 2011.
ST-HMM is the 4D FR approach proposed in [2]. Sun et al., IEEE T-Cybernetics, 2010.
Robustness to the temporal evolution (neutral-to-apex or apex-to-neutral)
4D FACE RECOGNITION
Experimental Results
IV. Grassmann Sparse Representation (GSR) classifier, Expression Independent
Training with 1 vs. training with 5 expressions (+9.2%)
- Richness of the learned dictionary
- The sparse representation (code) of a new observation can be recovered efficiently from the available atoms (except for surprise)
OUTLINE
4D Face Recognition
State of the art
4D face recognition framework
Experiments and results
4D Spontaneous Emotion Detection
State of the art
Trajectories on Grassmann manifold
Spontaneous emotional state detection from depth video
Spontaneous pain detection from 4D high resolution video
Conclusion and Future Work
4D SPONTANEOUS EMOTION DETECTION
Objectives
Proposing an early detection framework for spontaneous emotions from 3D dynamic sequences in a continuous emotion space.
Challenges:
Detecting the spontaneous emotion of interest
Detecting the emotion as early as possible
3D (depth / high-resolution) video
3D (depth/high resolution) video
Arousal-valence chart
4D SPONTANEOUS EMOTION DETECTION
State of the Art
3D facial deformation vs. 3D feature tracking
- Non-rigid deformation (Ben Amor et al., 2014): global deformation, subtle changes; nose tip dependence; acted FE
- Parameterization of facial deformation (Sandbach et al., 2011): global deformation, automatic; time consuming; acted FE
- Local spatial feature tracking (Berretti et al., 2012): robust to noise, real time; landmarks tracking; acted FE
- Landmarks tracking (Xue et al., 2015): robust to noise, fast performance; landmarks tracking; acted FE
4D SPONTANEOUS EMOTION DETECTION
Trajectory analysis on matrix manifolds
- Divide the 3D video into subsequences
- Represent each subsequence as a subspace
- Obtain time-parameterized trajectories on the matrix manifold
- Compute the temporal evolution along the trajectory
- Apply the SO-SVM early event classifier
4D SPONTANEOUS EMOTION DETECTION
Spontaneous emotion detection from depth video
- Upper part of the body vs. the face only
Face vs. Face + Upper Part
- Depth vs. 2D video data
4D SPONTANEOUS EMOTION DETECTION
Spontaneous emotion detection from depth video
Depth video represented as a trajectory on the Grassmann manifold.
Geodesic distances between successive subspaces of the trajectory are computed.
The Geometric Motion History (GMH) gives the temporal evolution of the depth video.
SO-SVM [1] early event detection is applied to the GMH signals.
[1]. Hoai and de la Torre, IJCV, 2014.
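The GMH construction can be sketched as follows, reusing the window/subspace machinery from the recognition part (w=6 frames with step 3, as in that protocol); frame vectorization and the choice of k are illustrative assumptions:

```python
import numpy as np

def window_subspace(X, k):
    """Dominant k-dim subspace of an n x w window of vectorized frames."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]

def geodesic(X, Y):
    """Geodesic distance: 2-norm of the principal angles."""
    s = np.clip(np.linalg.svd(X.T @ Y, compute_uv=False), -1.0, 1.0)
    return np.linalg.norm(np.arccos(s))

def geometric_motion_history(frames, w=6, step=3, k=3):
    """Slide a window of w frames with the given step, map each window
    to a subspace (one point of the Grassmann trajectory), and record
    the geodesic distance between successive subspaces: the GMH signal."""
    subs = [window_subspace(np.column_stack(frames[i:i + w]), k)
            for i in range(0, len(frames) - w + 1, step)]
    return np.array([geodesic(subs[i], subs[i + 1])
                     for i in range(len(subs) - 1)])
```

A video whose windows keep spanning the same subspace (no facial motion) yields a flat, near-zero GMH; deformation onsets show up as bumps, which is what the SO-SVM detector fires on.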
4D SPONTANEOUS EMOTION DETECTION
Experimental Analysis
Database:
The experiments are conducted on the Cam3D Kinect database [1], which contains depth videos of spontaneous emotions.
Protocol:
Two emotions are detected (Happiness vs. others and Thinking/Unsure vs. others).
The targeted videos are divided into two halves, for training and testing.
Each emotion of interest is concatenated with two randomly chosen other emotions, to obtain 100 samples for training and testing.
[1]. Mahmoud et al., ACII, 2011.
4D SPONTANEOUS EMOTION DETECTION
Evaluation Criteria
True Positive (TP) Rate: the fraction of time series for which the detector fires during the event of interest.
False Positive (FP) Rate: the fraction of time series for which the detector fires before the event of interest starts.
I. ROC Curve: TPR plotted against FPR as the detection threshold varies, summarized by the Area Under the ROC Curve (AUC).
II. AMOC curve: evaluates the timeliness of the detection.
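A minimal sketch of the ROC/AUC computation over detector scores: sorting the scores sweeps the threshold implicitly, and TPR is integrated over FPR with the trapezoid rule. The per-time-series firing semantics above are abstracted into one score per sample here, which is an illustrative simplification:

```python
import numpy as np

def roc_auc(scores, labels):
    """Sort detections by score (implicit threshold sweep), trace the
    cumulative TPR and FPR, and integrate TPR over FPR (trapezoid rule)
    to obtain the Area Under the ROC Curve."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    y = np.asarray(labels)[order]
    tpr = np.concatenate([[0.0], np.cumsum(y) / max(y.sum(), 1)])
    fpr = np.concatenate([[0.0], np.cumsum(1 - y) / max((1 - y).sum(), 1)])
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2))

# perfectly separated scores give AUC = 1.0
print(roc_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))   # 1.0
```

An AUC of 0.5 corresponds to a chance-level detector, which is the reference point for the numbers reported on the following slides.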
4D SPONTANEOUS EMOTION DETECTION
Experimental Analysis
I. Grassmann vs. Stiefel manifold
Happiness detection
experiment
Thinking/unsure detection
experiment
4D SPONTANEOUS EMOTION DETECTION
Physical pain detection from high resolution 4D-faces
Detecting spontaneous pain among other facial expressions.
3D dynamic high-resolution data is available.
Early detection of pain using the SO-SVM framework.
4D SPONTANEOUS EMOTION DETECTION
Physical pain detection from high resolution 4D-faces
Depth-based Grassmann trajectories
Trajectory representation of the 3D video.
Velocity vectors are computed between successive subspaces.
A Local Deformation Histogram (LDH) is computed.
The LDHs are concatenated.
The beginning and the end of the pain are defined.
SO-SVM early detection is applied.
[Figure: 1st component of the velocity]
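The velocity/LDH step can be sketched as follows. The velocity is the log map between successive subspaces; binning its per-row (per-vertex) magnitudes is an assumed reading of the LDH descriptor, and the bin count is illustrative:

```python
import numpy as np

def log_map(X, Y):
    """Tangent (velocity) vector at Span(X) toward Span(Y)."""
    ytx = Y.T @ X
    Bt = np.linalg.solve(ytx, Y.T - ytx @ X.T)   # assumes Y^T X invertible
    U, S, Vt = np.linalg.svd(Bt.T, full_matrices=False)
    return U @ np.diag(np.arctan(S)) @ Vt

def local_deformation_histogram(X, Y, bins=16):
    """Bin the per-row (per-vertex) magnitudes of the velocity vector
    between two successive subspaces: local motion statistics instead
    of one global distance per pair."""
    V = log_map(X, Y)
    mags = np.linalg.norm(V, axis=1)             # one magnitude per vertex
    hist, _ = np.histogram(mags, bins=bins,
                           range=(0.0, mags.max() + 1e-12))
    return hist / hist.sum()
```

Keeping a histogram of local magnitudes, rather than a single geodesic distance, preserves where the deformation happens on the face, which is the intuition behind LDH outperforming the global-distance variant.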
4D SPONTANEOUS EMOTION DETECTION
Physical pain detection from high resolution 4D-faces
3D landmark-based Grassmann trajectories (Baseline)
The 3D physical pain video is divided into subsequences.
The facial landmark coordinates (x, y, z) are used as the facial descriptor (83 landmarks).
Every subsequence is represented as a subspace.
The geodesic distance between successive subspaces is quantified to build the GMH.
The beginning and the end of the pain are defined.
SO-SVM early detection is applied.
4D SPONTANEOUS EMOTION DETECTION
Experimental Analysis
Database:
BP4D-Spontaneous database [1]
41 subjects / 8 tasks
AU annotations
Protocol:
28 physical pain videos are used.
14 for training and 14 for testing.
2-fold cross-validation is applied.
The beginning and the end of the pain are determined according to an action-unit activation formula [1, 2].
[1]. Zhang et al., IVC, 2014.
[2]. Prkachin et al., Pain, 2008.
AU4: Brow Lowering
AU6: Cheek raising
AU7: Tightening of eyelids
AU9: Wrinkling of nose
AU10: Raising of upper lip
4D SPONTANEOUS EMOTION DETECTION
Experimental Analysis
I. Effect of the smoothing and pose normalization
[ROC curves, geodesic distance / norm of the velocity: AUC = 0.75, 0.70, 0.63 and AUC = 0.78, 0.74, 0.70]
- Increasing the derivation step improves the results
- Normalizing the head pose improves the results
4D SPONTANEOUS EMOTION DETECTION
Experimental Analysis
II. Landmarks vs. Depth method
[ROC curves: AUC = 0.80 vs. AUC = 0.78]
4D SPONTANEOUS EMOTION DETECTION
Experimental Analysis
III. Local Deformation Histogram (LDH) vs. Distances
[ROC curves, distances (AUC = 0.80) vs. LDH (AUC = 0.84)]
OUTLINE
4D Face Recognition
State of the art
4D face recognition framework
Experiments and results
4D Spontaneous Emotion Detection
State of the art
Trajectories on Grassmann manifold
Spontaneous emotional state detection from depth video
Spontaneous pain detection from 4D high resolution video
Conclusion and Future Work
CONCLUSION AND FUTURE WORK
Conclusions
Common geometric framework with two different representations
4D Face Recognition
- Efficient subspace representation for 4D data
- Exploiting the shape and its dynamics improves the results
- Enriching the dictionary improves the results
4D Emotion Detection
- Modeling 3D videos as time-parameterized curves (trajectories) on the Grassmann manifold
- The upper part of the body outperforms the face alone
- The local approach (LDH) outperforms the global approach (distances)
- Coupling geometric features (velocities) with advanced ML techniques (early event detection)
Limitations
- Lack of frame-to-frame vertex-level dense correspondence
- The texture channel, although available, is not considered
- Limited number of subjects in the databases / lack of spontaneous databases
Perspectives and Future Work
- Dense non-rigid registration/tracking
- Investigating higher-order derivatives along the trajectories
PUBLICATION LIST
Submitted Journal
1. T. Alashkar, B. Ben Amor, M. Daoudi and S. Berretti, “Analyzing Trajectories on Grassmann Manifolds for Spontaneous Emotion Detection”, submitted to IEEE Transactions on Affective Computing, Sep. 2015.
2. T. Alashkar, B. Ben Amor, M. Daoudi and S. Berretti, “Modeling Shape Dynamics on Grassmann Manifolds for 4D Face Recognition”, in preparation.
International Conferences and Workshops
1. T. Alashkar, B. Ben Amor, S. Berretti and M. Daoudi, “Analyzing Trajectories on Grassmann Manifold for Early Emotion Detection from Depth Videos”, in FG 2015.
2. T. Alashkar, B. Ben Amor, M. Daoudi and S. Berretti, “A 3D Dynamic Database for Unconstrained Face Recognition”, in 3D Body Scanning Technologies International Conference, 2014.
3. T. Alashkar, B. Ben Amor, M. Daoudi and S. Berretti, “A Grassmannian Framework for Face Recognition of 3D Dynamic Sequences with Challenging Conditions”, in NORDIA Workshop at ECCV 2014.
National Conference
1. T. Alashkar, B. Ben Amor, S. Berretti and M. Daoudi, “Analyse des trajectoires sur une Grassmannienne pour la détection d’émotions dans des vidéos de profondeur”, in ORASIS 2015.