Is Your Machine Emotionally Intelligent?
Outline
Mindreading Machines
Rana El Kaliouby and Peter Robinson
Contact: [email protected]
www.cl.cam.ac.uk/~re227
RTV4HCI 2004
Mindreading Machines
Mindreading is the ability to attribute mental states to others from non-verbal communication cues (e.g. Baron-Cohen et al., 1996)
It is essential to understanding behaviour and predicting actions in a social environment
Automated Mindreading System
Input: video of the face
Output: a mental state class
For intention-aware, context-aware technology
More faces?!! What's new?
Inspiring progress on automated facial expression analysis
Mostly on automated facial action analysis, or on recognition of the set of basic emotions
Beyond the basic emotions: complex (cognitive) mental states
Complex Mental States
• Agreement (convinced, committed, encouraging, persuaded, agreeing, decided, sure, willing)
• Concentrating (absorbed, concentrating)
• Disagreement (contradictory, disagreeing, discouraging, disapproving)
• Interested (interested, asking)
• Thinking (brooding, calculating, thinking, thoughtful, fantasizing, choosing)
• Unsure (confused, baffled, puzzled, unsure, undecided)
Complex Mental States
Why?
• They are more frequent (Rozin and Cohen, Emotion 2003)
• They play a more important role than basic emotions in predicting people's intentions and actions (Baron-Cohen, 2001)
Do we need a different approach? (El Kaliouby, Robinson and Keates, HCII 2003)
• Multiple asynchronous information sources
• Multi-level temporal abstractions (real time at 29 fps)
The Mindreading DVD
• 412 mental states (or emotions)
• 6 videos per mental state
• 24 groups
• Meta-groups
• Fine shades of the same mental state
• Posed
• Most comprehensive labelled dataset available
• Resolution: 320x240
• Duration: 6-8 sec
• Variables:
  • 20 different actors
  • Varying head and body pose
  • Videos do not start with a neutral expression
• http://www.jkp.com/mindreading/
Mindreading DVD (Jessica Kingsley Publishers), courtesy of the Autism Research Centre, University of Cambridge
Automated Mindreading System
Overview
Facial feature extraction
Feature point tracking
Head pose estimation
Head & facial action unit recognition
Head & facial display recognition
Mental state inference
Hmm … Let me think about this
Head Pose Estimation
Uses expression-invariant feature points to estimate pitch (50°), yaw (50°) and roll (30°). Output: head action units
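The slide gives the angle ranges but not how the angles become head action units. The sketch below is minimal and rests on one assumption: a head action unit is emitted whenever an angle changes between consecutive frames by more than a threshold. The 2° threshold, the sign conventions and the symbol names (`head_up`, `turn_left`, ...) are hypothetical.

```python
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (pitch, yaw, roll) in degrees

def head_action_units(prev: Pose, curr: Pose, threshold: float = 2.0) -> List[str]:
    """Map the change in head pose between two frames to head action unit symbols.

    Hypothetical conventions: positive pitch = up, positive yaw = left,
    positive roll = tilt left. Symbol names are illustrative only.
    """
    d_pitch, d_yaw, d_roll = (c - p for c, p in zip(curr, prev))
    symbols: List[str] = []
    for delta, (pos_name, neg_name) in (
        (d_pitch, ("head_up", "head_down")),
        (d_yaw, ("turn_left", "turn_right")),
        (d_roll, ("tilt_left", "tilt_right")),
    ):
        # Emit a symbol only when the change exceeds the (hypothetical) threshold.
        if delta > threshold:
            symbols.append(pos_name)
        elif delta < -threshold:
            symbols.append(neg_name)
    return symbols

print(head_action_units((0.0, 0.0, 0.0), (5.0, -3.0, 1.0)))  # ['head_up', 'turn_right']
```

The sketch uses frame-to-frame changes rather than absolute angles, so the symbols do not depend on the resting head pose. That choice is also an assumption.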
Facial Feature Extraction
Colour, shape and motion analysis (Tian et al., 2001)
Output: facial action units (with intensity)
[Figure panels: Aperture, Teeth]
Facial and Head Display Recognition
[Figure: nose tip displacements against frame number for a "convinced" video]
A head nod is an alternating series of head up and head down movements.
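The nod definition above can be illustrated with a rule-based sketch. This is a simplification: the system recognises displays with HMMs, not with a rule like this one. The 2-pixel threshold, the sign convention (positive displacement = up) and the minimum of two alternations are all hypothetical.

```python
from typing import List, Sequence

def vertical_symbols(displacements: Sequence[float], threshold: float = 2.0) -> List[str]:
    """Label each frame 'up', 'down' or 'none' from the nose tip's vertical
    displacement (assumed: pixels, positive = up)."""
    symbols = []
    for d in displacements:
        if d > threshold:
            symbols.append("up")
        elif d < -threshold:
            symbols.append("down")
        else:
            symbols.append("none")
    return symbols

def is_nod(symbols: Sequence[str], min_alternations: int = 2) -> bool:
    """True if the movement symbols alternate up/down at least `min_alternations` times."""
    moves = [s for s in symbols if s != "none"]
    # Collapse runs of the same direction: up, up, down -> up, down
    collapsed = [m for i, m in enumerate(moves) if i == 0 or m != moves[i - 1]]
    return len(collapsed) - 1 >= min_alternations

print(is_nod(vertical_symbols([3, 4, -3, -5, 4, 0])))  # True
```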
[Figure: lip corner pull, plotted as % increase in distance against frame number, marking the onset of a smile and the smile peak]
Facial and Head Display Recognition
Left-right HMMs:
• 4 states, 3 symbols for nod, shake and mouth displays
• 2 states, 7 symbols for tilt and turn displays
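The sketch below shows, minimally, how one left-right HMM per display could score a symbol sequence with the scaled forward algorithm; the display is then taken to be the model with the highest likelihood. The symbol alphabet (up/down/none), the stay probability of 0.5 and all emission probabilities are hypothetical illustration values, not trained parameters from the talk.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], List[List[float]], List[List[float]]]  # (pi, A, B)

def left_right_transitions(n_states: int, stay: float) -> List[List[float]]:
    """Left-right topology: each state either stays or advances to the next one."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i == n_states - 1:
            A[i][i] = 1.0  # final state is absorbing
        else:
            A[i][i] = stay
            A[i][i + 1] = 1.0 - stay
    return A

def forward_log_likelihood(model: Model, obs: Sequence[int]) -> float:
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    pi, A, B = model
    n = len(pi)
    log_lik = 0.0
    alpha: List[float] = []
    for t, o in enumerate(obs):
        if t == 0:
            alpha = [pi[i] * B[i][o] for i in range(n)]
        else:
            alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][o]
                     for i in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_lik += math.log(scale)
    return log_lik

def classify_display(models: Dict[str, Model], obs: Sequence[int]) -> str:
    """Pick the display whose HMM gives the sequence the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(models[name], obs))

# Hypothetical 3-symbol alphabet: 0 = head up, 1 = head down, 2 = no vertical motion.
UP, DOWN, NULL = 0, 1, 2
nod: Model = (
    [1.0, 0.0, 0.0, 0.0],
    left_right_transitions(4, stay=0.5),
    [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],
)
still: Model = (
    [1.0, 0.0, 0.0, 0.0],
    left_right_transitions(4, stay=0.5),
    [[0.1, 0.1, 0.8]] * 4,
)
print(classify_display({"nod": nod, "still": still}, [UP, DOWN, UP, DOWN]))  # nod
```

The same `left_right_transitions(2, ...)` call would give the 2-state topology listed for tilt and turn; only the emission table would need 7 columns.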
Facial and Head Display Recognition
Training: maximum likelihood training with 30-50 samples for each head/facial display
Facial and Head Display Recognition
Evaluation (12900 frames)
Live demos:
• Royal Institution of Great Britain (Feb 2004)
• CVPR
Complex Mental State Inference
Complex Mental State Inference
Dynamic Bayesian Networks (DBNs): one model per mental state
[Figure: two time slices of a DBN. The hidden mental state C_t is linked to C_{t+1}, and each slice has observed display nodes (Nod, Teeth, Smile, Turn).]
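The sketch below does minimal forward filtering for one such per-mental-state model. It rests on three assumptions: the hidden state C_t is binary (present or absent), the displays are conditionally independent given C_t, and the probabilities are hand-picked. None of these simplifications or numbers come from the talk.

```python
from typing import Dict, List, Sequence, Tuple

def filter_mental_state(prior: float,
                        trans: Tuple[float, float],
                        p_display: Dict[str, Tuple[float, float]],
                        evidence: Sequence[Dict[str, bool]]) -> List[float]:
    """Forward filtering of P(C_t = true | evidence up to t) for one mental-state model.

    prior:     P(C_0 = true)
    trans:     (P(C_t=true | C_{t-1}=true), P(C_t=true | C_{t-1}=false))
    p_display: display -> (P(shown | C=true), P(shown | C=false))
    evidence:  per slice, display -> observed True/False; unobserved displays omitted
    """
    p_tt, p_ft = trans
    beliefs: List[float] = []
    belief = prior
    for t, observed in enumerate(evidence):
        # Predict: propagate the previous belief through the transition model.
        pred = prior if t == 0 else belief * p_tt + (1.0 - belief) * p_ft
        # Update: multiply in each observed display (conditional independence assumed).
        lik_true, lik_false = 1.0, 1.0
        for display, shown in observed.items():
            p_t, p_f = p_display[display]
            lik_true *= p_t if shown else 1.0 - p_t
            lik_false *= p_f if shown else 1.0 - p_f
        num = pred * lik_true
        belief = num / (num + (1.0 - pred) * lik_false)
        beliefs.append(belief)
    return beliefs

# Hypothetical "agreement" model: nods are likely, shakes unlikely.
agreement_displays = {"nod": (0.8, 0.2), "shake": (0.1, 0.4)}
print(filter_mental_state(0.5, (0.9, 0.1), agreement_displays,
                          [{"nod": True}, {"nod": True, "shake": False}]))
```

Because the previous belief enters through `trans`, a repeated nod raises the agreement probability more than a single one does. This mirrors the note that previous states affect current ones.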
Complex Mental State Inference
Parameter learning:
• Maximum likelihood estimates
• Around 270 samples per class
• Discriminative power of display d for each mental state m:
[Figure: discriminative power of each display, one panel per mental state: Interested, Thinking, Concentrating, Agreement, Disagreement, Unsure]
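For discrete nodes, the maximum likelihood estimates are simply relative frequencies, so parameter learning can be sketched as counting. The transcript does not preserve the discriminative-power formula. The measure below, P(d | m) - P(d | not m), is therefore an assumption, and so is the toy data.

```python
from typing import Dict, List, Set, Tuple

Sample = Tuple[str, Set[str]]  # (mental state label, displays observed in the sample)

def ml_display_probs(samples: List[Sample], state: str,
                     displays: List[str]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """ML (relative-frequency) estimates of P(d | m) and P(d | not m)."""
    pos = [shown for label, shown in samples if label == state]
    neg = [shown for label, shown in samples if label != state]
    if not pos or not neg:
        raise ValueError("need samples both with and without the mental state")
    p_pos = {d: sum(d in s for s in pos) / len(pos) for d in displays}
    p_neg = {d: sum(d in s for s in neg) / len(neg) for d in displays}
    return p_pos, p_neg

def discriminative_power(samples: List[Sample], state: str,
                         displays: List[str]) -> Dict[str, float]:
    """ASSUMED measure: P(d | m) - P(d | not m); positive values favour m."""
    p_pos, p_neg = ml_display_probs(samples, state, displays)
    return {d: p_pos[d] - p_neg[d] for d in displays}

toy = [("agreement", {"nod", "smile"}), ("agreement", {"nod"}),
       ("disagreement", {"shake"}), ("disagreement", {"shake", "nod"})]
print(discriminative_power(toy, "agreement", ["nod", "smile", "shake"]))
# {'nod': 0.5, 'smile': 0.5, 'shake': -1.0}
```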
Complex Mental State Inference
Structure learning: for each mental state, find the optimal subset of displays by sequential backward elimination
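The following is a sketch of sequential backward elimination over displays. The `score` function is a hypothetical stand-in; in practice it might be recognition accuracy on training videos. The greedy rule here is also an assumption: it removes a display whenever doing so does not lower the score.

```python
from typing import Callable, FrozenSet, Iterable

def sequential_backward_elimination(displays: Iterable[str],
                                    score: Callable[[FrozenSet[str]], float],
                                    min_size: int = 1) -> FrozenSet[str]:
    """Start from all displays; repeatedly remove the display whose removal
    scores best, and stop as soon as every removal would lower the score."""
    current = frozenset(displays)
    best = score(current)
    while len(current) > min_size:
        # Score every subset with exactly one display removed (ties broken by name).
        cand_score, removed = max((score(current - {d}), d) for d in current)
        if cand_score < best:
            break
        current, best = current - {removed}, cand_score
    return current

# Hypothetical score: reward informative displays, penalise uninformative ones.
useful = {"nod", "smile"}
toy_score = lambda s: len(s & useful) - 0.1 * len(s - useful)
print(sorted(sequential_backward_elimination(["nod", "smile", "shake", "tilt"], toy_score)))
# ['nod', 'smile']
```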
Complex Mental State Inference
Inference:
• Junction tree inference
• Forward-backward algorithm
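In a DBN with a single hidden node per slice, exact inference over an unrolled sequence reduces to the forward-backward recursion; a junction tree built on that chain passes equivalent messages. The minimal sketch below assumes a binary mental state and per-slice evidence likelihoods that have already been computed. All numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

def forward_backward(prior: float,
                     trans: Tuple[float, float],
                     likelihoods: Sequence[Tuple[float, float]]) -> List[float]:
    """Smoothed P(C_t = true | all evidence) for a binary hidden chain.

    prior:       P(C_0 = true)
    trans:       (P(C_t=true | C_{t-1}=true), P(C_t=true | C_{t-1}=false))
    likelihoods: per slice (P(evidence_t | C_t=true), P(evidence_t | C_t=false)),
                 assumed not both zero
    """
    p_tt, p_ft = trans
    # A[i][j] = P(C_t = j | C_{t-1} = i), with index 0 = true, 1 = false
    A = [[p_tt, 1.0 - p_tt], [p_ft, 1.0 - p_ft]]
    # Forward pass: normalised filtering distributions.
    alphas: List[List[float]] = []
    for t, lik in enumerate(likelihoods):
        if t == 0:
            pred = [prior, 1.0 - prior]
        else:
            prev = alphas[-1]
            pred = [prev[0] * A[0][j] + prev[1] * A[1][j] for j in range(2)]
        unnorm = [pred[j] * lik[j] for j in range(2)]
        z = unnorm[0] + unnorm[1]
        alphas.append([u / z for u in unnorm])
    # Backward pass: combine each alpha with the normalised backward message.
    posteriors = [0.0] * len(likelihoods)
    beta = [1.0, 1.0]
    for t in range(len(likelihoods) - 1, -1, -1):
        g = [alphas[t][j] * beta[j] for j in range(2)]
        posteriors[t] = g[0] / (g[0] + g[1])
        lik = likelihoods[t]
        beta = [sum(A[i][j] * lik[j] * beta[j] for j in range(2)) for i in range(2)]
        s = beta[0] + beta[1]
        beta = [b / s for b in beta]
    return posteriors

# No evidence at t=0, strong evidence at t=1: smoothing pulls t=0 upwards too.
print(forward_backward(0.5, (0.9, 0.1), [(1.0, 1.0), (0.9, 0.1)]))
```

Unlike pure filtering, the smoothed estimate at slice t also uses evidence from later slices. In the example, the t=0 posterior rises from the 0.5 prior to about 0.82.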
Tying it all together: Multi-level Temporal Abstraction
[Figure, animated over frames 5-105: four levels of abstraction, from frames, to facial/head action units, to facial/head displays (HMM), to mental states (DBN). Facial/head action unit symbols accumulate in a symbol sliding window that feeds the display HMMs; the HMM evidence accumulates in a second sliding window that feeds DBN inference.]
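The two sliding windows can be sketched as chained generators. `display_model` and `mental_model` are hypothetical stand-ins for the HMMs and the DBN. The window sizes and step are illustrative parameters, not the system's settings.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")
V = TypeVar("V")

def sliding_windows(stream: Iterable[T], size: int, step: int = 1) -> Iterator[List[T]]:
    """Yield each full window of `size` consecutive items, advancing by `step`."""
    window: Deque[T] = deque(maxlen=size)
    since_last = 0
    for item in stream:
        window.append(item)  # the oldest item drops out automatically
        since_last += 1
        if len(window) == size and since_last >= step:
            yield list(window)
            since_last = 0

def multi_level(au_symbols: Iterable[T],
                display_model: Callable[[List[T]], U],
                mental_model: Callable[[List[U]], V],
                symbol_window: int, evidence_window: int, step: int = 1) -> List[V]:
    """AU symbols -> display evidence (one per symbol window)
    -> mental state inference (one per evidence window)."""
    displays = (display_model(w) for w in sliding_windows(au_symbols, symbol_window, step))
    return [mental_model(w) for w in sliding_windows(displays, evidence_window, step)]

# Toy stand-ins: "display evidence" = sum of a window, "inference" = max of a window.
print(multi_level(range(20), sum, max, symbol_window=3, evidence_window=2))
```

Because the display stream is a generator, each mental state inference can run as soon as its evidence window fills. No later frames are needed, which suits a live video feed.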
Experimental Evaluation
Demo: test yourself!!
Can you guess which group the video belongs to?
• Agreement
• Concentrating
• Disagreement
• Interested
• Thinking
• Unsure
[Video clip]
Demo: test the mindreading machine!!
Objective: calculate a probability for each mental state
Given: evidence nodes, colour coded by likelihood
Notes:
• Previous states affect current ones
• Likelihoods do not add up to 100% (the mental states are not mutually exclusive)
[Demo figure: DBN snapshot at time t. Unsure: 52% at t-1, 43% at t. Thinking: 82% at t-1, 86% at t. Observed display nodes include Open, Nod, Smile, Pucker, Turn, Shake and Tilt, colour coded on a 0-100% likelihood legend.]
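The likelihoods need not sum to 100% because there is one model per mental state, and each model only weighs "m" against "not m". The single-slice sketch below makes this concrete. It omits the temporal links, and its priors and display probabilities are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical per-state models: prior P(m) and, per display, (P(d | m), P(d | not m)).
MODELS: Dict[str, Tuple[float, Dict[str, Tuple[float, float]]]] = {
    "thinking": (0.5, {"pucker": (0.7, 0.2), "turn": (0.6, 0.3)}),
    "unsure": (0.5, {"pucker": (0.6, 0.3), "turn": (0.5, 0.4)}),
}

def state_probability(prior: float, p_display: Dict[str, Tuple[float, float]],
                      shown: Set[str]) -> float:
    """Bayes' rule in a binary model (m vs. not m): every modelled display is
    observed as present or absent, conditionally independent given m."""
    lik_m, lik_not = prior, 1.0 - prior
    for d, (p_m, p_not) in p_display.items():
        lik_m *= p_m if d in shown else 1.0 - p_m
        lik_not *= p_not if d in shown else 1.0 - p_not
    return lik_m / (lik_m + lik_not)

# Each model is evaluated independently on the same evidence.
probs = {m: state_probability(prior, pd, {"pucker"}) for m, (prior, pd) in MODELS.items()}
print(probs, sum(probs.values()))
```

With a single pucker observed, both "thinking" and "unsure" come out above 0.5. Their sum therefore exceeds 1, as in the demo.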
Demo: The results
This was a video of choosing (thinking group)
[Figure: mental state likelihood for each inference instance (1-23), plotted for Agreement, Disagreement, Thinking, Comprehending, Concentrating, Unsure and Interested]
Experimental Results
Group          Mental state    #Videos  %Correct
Agreement      Assertive       3        66.7
               Committed       5        100
               Convinced       4        100
               Decided         4        50
               Encouraging     3        100
               Sure            4        100
               Willing         2        100
               Total           25       88.1
Concentrating  Absorbed        4        100
               Concentrating   6        100
               Total           10       100
Disagreement   Contradictory   3        100
               Disapproving    5        40
               Discouraging    5        100
               Total           13       80
Interested     Asking          5        80
               Interested      5        100
               Total           10       90
Thinking       Brooding        3        66.7
               Calculating     4        75
               Choosing        5        100
               Fantasising     4        100
               Thinking        2        100
               Total           18       88.9
Unsure         Baffled         6        100
               Confused        6        83.3
               Puzzled         6        83.3
               Undecided       6        83.3
               Unsure          6        100
               Total           30       90
Overall recognition rate       106      89.5
• Leave-5-out cross-validation
• 106 videos (12900 frames)
• Of the 24 groups, six (24 mental states) are currently supported
• That leaves another 12 complex groups:
  • Priorities: bored, comprehending
  • Not really interested in: flirtatious, sneaky!
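The transcript does not say how the leave-5-out folds were formed. The sketch below assumes disjoint folds of five videos, drawn after a seeded random shuffle; both the shuffling and the seed are hypothetical.

```python
import random
from typing import Iterator, List, Tuple

def leave_k_out(n_items: int, k: int = 5,
                seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each item is held out in exactly one
    fold of at most k items."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)  # hypothetical: random, seeded fold assignment
    for start in range(0, n_items, k):
        test = sorted(order[start:start + k])
        held_out = set(test)
        train = [i for i in range(n_items) if i not in held_out]
        yield train, test

folds = list(leave_k_out(106, k=5))
print(len(folds))  # 22: 21 folds of 5 videos and one of 1
```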
Recognition rate: 89.5% (averaged over the six groups)
False positive rate for class m (the percentage of files misclassified as m):
• Highest for agreement (16%)
• Lowest for thinking (0%)
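The sketch below computes both measures on toy labels. The slide gives only the numerator of the false positive rate, so the denominator used here (files whose true class is not m) is an assumption.

```python
from typing import Dict, Sequence

def recognition_rate(true: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of files whose predicted class matches the label."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)

def false_positive_rates(true: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """For each class m: share of files NOT labelled m that were classified as m
    (assumed denominator; the slide does not state it)."""
    rates: Dict[str, float] = {}
    for m in sorted(set(true) | set(pred)):
        others = [p for t, p in zip(true, pred) if t != m]
        rates[m] = sum(p == m for p in others) / len(others) if others else 0.0
    return rates

labels = ["agreement", "agreement", "thinking", "unsure"]  # toy data
guesses = ["agreement", "agreement", "agreement", "unsure"]
print(recognition_rate(labels, guesses))  # 0.75
print(false_positive_rates(labels, guesses))
# {'agreement': 0.5, 'thinking': 0.0, 'unsure': 0.0}
```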
Interesting results:
• Mental state overlap?
• Onset
Open questions: how well does this generalise to spontaneous (unlabelled) datasets?
Temporal Smoothing vs. Sliding window size
[Figure: HMM evidence sliding window (size=10) feeding DBN inference over frames 5-105, across the levels frame, facial/head symbol, facial/head display (HMM) and mental state (DBN)]
The effect of the sliding window size (sw) on temporal smoothing:
• Large sw: over-smoothing
• Small sw: spiky inferences
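A moving average can stand in for DBN inference over the evidence window to illustrate this trade-off. It is an analogy, not the system's actual computation. Total variation measures how spiky the output is, and detection lag measures how long each window size takes to respond to a sustained display. The toy evidence sequence is hypothetical.

```python
from typing import List, Optional, Sequence

def windowed_means(xs: Sequence[float], window: int) -> List[float]:
    """Mean over each full trailing window; output j covers xs[j : j + window]."""
    return [sum(xs[j:j + window]) / window for j in range(len(xs) - window + 1)]

def total_variation(ys: Sequence[float]) -> float:
    """Sum of absolute changes between consecutive outputs (a 'spikiness' score)."""
    return sum(abs(b - a) for a, b in zip(ys, ys[1:]))

def detection_lag(xs: Sequence[float], window: int, change_at: int,
                  threshold: float = 0.5) -> Optional[int]:
    """Frames after `change_at` until the windowed mean first exceeds `threshold`."""
    for j, y in enumerate(windowed_means(xs, window)):
        end = j + window - 1  # last frame covered by this window
        if end >= change_at and y > threshold:
            return end - change_at
    return None

# Toy evidence: an isolated spike every 4th frame, then a sustained display from frame 20.
evidence = [0.0, 0.0, 0.0, 1.0] * 5 + [1.0] * 20
for sw in (1, 4, 8):
    ys = windowed_means(evidence, sw)
    print(sw, total_variation(ys), detection_lag(evidence, sw, change_at=20))
```

On this toy input, sw = 1 tracks every spike (highest total variation) but responds immediately. sw = 4 and sw = 8 both suppress the spikes, but sw = 8 responds two frames after the change, one frame later than sw = 4.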
Conclusions
1. Mindreading machines:
   • Beyond the basic emotions!
   • Fully automated
   • Supports moderate out-of-plane head motion
2. We tried only one possible approach ...
3. Applications
To do:
• Cluster analysis to measure similarity/distance between the classes
• Try the CVPR demo data
BIG thank you :)
A BIG thank you to everyone who took part in our demo (especially those who did some acting)