Emotion Recognition and Cognitive
-
Upload
ramrekha-akshay -
Category
Documents
-
view
221 -
download
0
Transcript of Emotion Recognition and Cognitive
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 172
From imagination to impact
Julien Epps Fang Chen and Bo Yin
Using Information to Drive Decisions
Emotion Recognition and CognitiveLoad Measurement from Speech
NICTA Copyright 20101
fangchennictacomau
The material is mainly coming from the tutorial ofAPSIPA Annual Summit and Conference
December 14th 2010
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 272
Overview
bull Emotion and Mental State Recognition Systems
bull Emotions and Cognitive Loadndash What are they and how can they be measured
bull Feature Extraction
ndash
bull Normalisation Modelling and Classification
bull Speech Databases and their Design
bull Applications of Emotion Recognition and Cognitive Load
Measurement
bull Summary
2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 372
Emotion and Mental State
3
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 472
Emotion Recognition Some History
bull The Measurement of Emotion
ndash Whately-Smith 1922
bull Emotion in speech (science)ndash Scherer 1980s-present
bull Emotion in synthetic speech
ndash 1990sbull Emotion recognition
ndash Dellaert et al 1996 ndash prosodic features
bull Applications of emotion recognition
ndash Petrushin 1999 Lee et al 2001 ndash call centres
bull Affective computing
ndash Picard 2000
bull 2000 ~10 papers per year4
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 572
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 672
Cognitive Load Some History
bull Working memory
ndash Miller et al 1960
ndash Wickens 1980s Baddeley 1990s onwards
bull Cognitive load in learning
ndash Sweller (mid 80s)
bull Working memory and languagendash Gathercole 1993
bull Cognitive load in speech (science)
ndash Berthold and Jameson 1999
bull Cognitive load classification from speech
ndash Yin et al 2008
6
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 772
Emotion Recognition Key Components
bull Typical acoustic based system
speech featureextraction
featurenormalisation
voicingdetection
7
emotion
modelling
classification
regression
recognizedemotion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 872
Emotion Recognition Different Approaches
bull Acoustic vs prosodic vs linguistic
bull Detailed spectral vs broad spectral features
bull Direct vs adapted models
bull Static (utterance) vs dynamic (framemulti-frame)
bull Categorical vs ordinal vs regression-basedclassification
bull Recognition vs detection
8
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 972
Emotion Recognition Key Current Challenges
bull Basis for emotion mental states in physiology and
hence speech productionndash Shared with expressive speech synthesis
ndash Difficult to reverse-engineer spoken emotion
ndash Use of wide range of physiological sensors may be helpful
bull EGG glottal cameras vocal tract x-ray movies etcbull EEG GSR HRV respiration
ndash Complex emotion dependency varies between different contexts
(eg some words more emotive than others)
9
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072
Emotion Recognition Key Current Challenges
bull Characterizing emotion mental states using robust
featuresndash Lots of features in literature
ndash Why should a given feature be not be useful
ndash Are some features more sensitive to some emotionsmental states
an w y
ndash How should temporal information best be used
10
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172
Emotion Recognition Key Current Challenges
bull Recognition of emotion in naturally occurring
speechndash Much initial work on emotion recognition done on acted speech
ndash Initial work on cognitive load classification done on carefully
controlled experimental data
ndash ea emot ons meta states ne acte or exper menta ata
ndash Taking emotion recognition from the lab to the field
11
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272
Emotion Recognition Key Current Challenges
bull Understanding application areasndash Lots of vision statements about affective computing
bull Detail still coming
ndash Understanding likely limits of automatic approaches
bull Match approaches with suitable applications
ndash
range of different situations
ndash New applications still emerging
12
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372
13
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472
Emotions and CL Psychology
bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution
bull Also a signalling system between people
bull Important in learning behaviours
ndash Results of studies are complex to interpret as an engineer
lsquo rsquondash
ndash Cultural variation exists
ndash Emotions can often be discretely described only as lsquofull-blownrsquo
emotions
bull Cowie et al 2001
bull Pragmaticsndash Emotions of interest are dictated by application and availability of
labelled data usually single-culture often fully blown
14
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572
Emotions and CL Psychology
bull Emotion representations and structuresndash Affective circumplex
arousal
valence
excited
happy
sad
nervous
tense
ndash Arousal alertnessresponsiveness to stimuli
ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional
bull Reducing to two dimensions can lead to different structures
ndash Cowie et al 2001
15
depressed calm
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672
Emotions and CL Psychology
bull One choice FeelTracendash For real time
ndash Cowie et al 2000
16
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772
Emotions and CL Measuring Emotion
bull Measuring emotionsndash Self-reports of subjective experience
bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list
ndash Real time methods
-
bull eg arousal-affect emotion space cursor
ndash Cued review
bull Subjects watch video of themselves performing the task rate emotions
experienced post-hoc
ndash Observer ratings
ndash Behavioural measures (this presentation)
ndash Physiological measures (more later)
17
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash NASA Task Load Index (Hart and Staveland 1988)
18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 272
Overview
bull Emotion and Mental State Recognition Systems
bull Emotions and Cognitive Loadndash What are they and how can they be measured
bull Feature Extraction
ndash
bull Normalisation Modelling and Classification
bull Speech Databases and their Design
bull Applications of Emotion Recognition and Cognitive Load
Measurement
bull Summary
2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 372
Emotion and Mental State
3
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 472
Emotion Recognition Some History
bull The Measurement of Emotion
ndash Whately-Smith 1922
bull Emotion in speech (science)ndash Scherer 1980s-present
bull Emotion in synthetic speech
ndash 1990sbull Emotion recognition
ndash Dellaert et al 1996 ndash prosodic features
bull Applications of emotion recognition
ndash Petrushin 1999 Lee et al 2001 ndash call centres
bull Affective computing
ndash Picard 2000
bull 2000 ~10 papers per year4
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 572
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 672
Cognitive Load Some History
bull Working memory
ndash Miller et al 1960
ndash Wickens 1980s Baddeley 1990s onwards
bull Cognitive load in learning
ndash Sweller (mid 80s)
bull Working memory and languagendash Gathercole 1993
bull Cognitive load in speech (science)
ndash Berthold and Jameson 1999
bull Cognitive load classification from speech
ndash Yin et al 2008
6
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 772
Emotion Recognition Key Components
bull Typical acoustic based system
speech featureextraction
featurenormalisation
voicingdetection
7
emotion
modelling
classification
regression
recognizedemotion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 872
Emotion Recognition Different Approaches
bull Acoustic vs prosodic vs linguistic
bull Detailed spectral vs broad spectral features
bull Direct vs adapted models
bull Static (utterance) vs dynamic (framemulti-frame)
bull Categorical vs ordinal vs regression-basedclassification
bull Recognition vs detection
8
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 972
Emotion Recognition Key Current Challenges
bull Basis for emotion mental states in physiology and
hence speech productionndash Shared with expressive speech synthesis
ndash Difficult to reverse-engineer spoken emotion
ndash Use of wide range of physiological sensors may be helpful
bull EGG glottal cameras vocal tract x-ray movies etcbull EEG GSR HRV respiration
ndash Complex emotion dependency varies between different contexts
(eg some words more emotive than others)
9
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072
Emotion Recognition Key Current Challenges
bull Characterizing emotion mental states using robust
featuresndash Lots of features in literature
ndash Why should a given feature be not be useful
ndash Are some features more sensitive to some emotionsmental states
an w y
ndash How should temporal information best be used
10
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172
Emotion Recognition Key Current Challenges
bull Recognition of emotion in naturally occurring
speechndash Much initial work on emotion recognition done on acted speech
ndash Initial work on cognitive load classification done on carefully
controlled experimental data
ndash ea emot ons meta states ne acte or exper menta ata
ndash Taking emotion recognition from the lab to the field
11
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272
Emotion Recognition Key Current Challenges
bull Understanding application areasndash Lots of vision statements about affective computing
bull Detail still coming
ndash Understanding likely limits of automatic approaches
bull Match approaches with suitable applications
ndash
range of different situations
ndash New applications still emerging
12
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372
13
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472
Emotions and CL Psychology
bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution
bull Also a signalling system between people
bull Important in learning behaviours
ndash Results of studies are complex to interpret as an engineer
lsquo rsquondash
ndash Cultural variation exists
ndash Emotions can often be discretely described only as lsquofull-blownrsquo
emotions
bull Cowie et al 2001
bull Pragmaticsndash Emotions of interest are dictated by application and availability of
labelled data usually single-culture often fully blown
14
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572
Emotions and CL Psychology
bull Emotion representations and structuresndash Affective circumplex
arousal
valence
excited
happy
sad
nervous
tense
ndash Arousal alertnessresponsiveness to stimuli
ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional
bull Reducing to two dimensions can lead to different structures
ndash Cowie et al 2001
15
depressed calm
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672
Emotions and CL Psychology
bull One choice FeelTracendash For real time
ndash Cowie et al 2000
16
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772
Emotions and CL Measuring Emotion
bull Measuring emotionsndash Self-reports of subjective experience
bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list
ndash Real time methods
-
bull eg arousal-affect emotion space cursor
ndash Cued review
bull Subjects watch video of themselves performing the task rate emotions
experienced post-hoc
ndash Observer ratings
ndash Behavioural measures (this presentation)
ndash Physiological measures (more later)
17
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash NASA Task Load Index (Hart and Staveland 1988)
18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 372
Emotion and Mental State
3
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 472
Emotion Recognition Some History
bull The Measurement of Emotion
ndash Whately-Smith 1922
bull Emotion in speech (science)ndash Scherer 1980s-present
bull Emotion in synthetic speech
ndash 1990sbull Emotion recognition
ndash Dellaert et al 1996 ndash prosodic features
bull Applications of emotion recognition
ndash Petrushin 1999 Lee et al 2001 ndash call centres
bull Affective computing
ndash Picard 2000
bull 2000 ~10 papers per year4
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 572
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 672
Cognitive Load Some History
bull Working memory
ndash Miller et al 1960
ndash Wickens 1980s Baddeley 1990s onwards
bull Cognitive load in learning
ndash Sweller (mid 80s)
bull Working memory and languagendash Gathercole 1993
bull Cognitive load in speech (science)
ndash Berthold and Jameson 1999
bull Cognitive load classification from speech
ndash Yin et al 2008
6
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 772
Emotion Recognition Key Components
bull Typical acoustic based system
speech featureextraction
featurenormalisation
voicingdetection
7
emotion
modelling
classification
regression
recognizedemotion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 872
Emotion Recognition Different Approaches
bull Acoustic vs prosodic vs linguistic
bull Detailed spectral vs broad spectral features
bull Direct vs adapted models
bull Static (utterance) vs dynamic (framemulti-frame)
bull Categorical vs ordinal vs regression-basedclassification
bull Recognition vs detection
8
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 972
Emotion Recognition Key Current Challenges
bull Basis for emotion mental states in physiology and
hence speech productionndash Shared with expressive speech synthesis
ndash Difficult to reverse-engineer spoken emotion
ndash Use of wide range of physiological sensors may be helpful
bull EGG glottal cameras vocal tract x-ray movies etcbull EEG GSR HRV respiration
ndash Complex emotion dependency varies between different contexts
(eg some words more emotive than others)
9
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072
Emotion Recognition Key Current Challenges
bull Characterizing emotion mental states using robust
featuresndash Lots of features in literature
ndash Why should a given feature be not be useful
ndash Are some features more sensitive to some emotionsmental states
an w y
ndash How should temporal information best be used
10
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172
Emotion Recognition Key Current Challenges
bull Recognition of emotion in naturally occurring
speechndash Much initial work on emotion recognition done on acted speech
ndash Initial work on cognitive load classification done on carefully
controlled experimental data
ndash ea emot ons meta states ne acte or exper menta ata
ndash Taking emotion recognition from the lab to the field
11
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272
Emotion Recognition Key Current Challenges
bull Understanding application areasndash Lots of vision statements about affective computing
bull Detail still coming
ndash Understanding likely limits of automatic approaches
bull Match approaches with suitable applications
ndash
range of different situations
ndash New applications still emerging
12
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372
13
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472
Emotions and CL Psychology
bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution
bull Also a signalling system between people
bull Important in learning behaviours
ndash Results of studies are complex to interpret as an engineer
lsquo rsquondash
ndash Cultural variation exists
ndash Emotions can often be discretely described only as lsquofull-blownrsquo
emotions
bull Cowie et al 2001
bull Pragmaticsndash Emotions of interest are dictated by application and availability of
labelled data usually single-culture often fully blown
14
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572
Emotions and CL Psychology
bull Emotion representations and structuresndash Affective circumplex
arousal
valence
excited
happy
sad
nervous
tense
ndash Arousal alertnessresponsiveness to stimuli
ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional
bull Reducing to two dimensions can lead to different structures
ndash Cowie et al 2001
15
depressed calm
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672
Emotions and CL Psychology
bull One choice FeelTracendash For real time
ndash Cowie et al 2000
16
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772
Emotions and CL Measuring Emotion
bull Measuring emotionsndash Self-reports of subjective experience
bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list
ndash Real time methods
-
bull eg arousal-affect emotion space cursor
ndash Cued review
bull Subjects watch video of themselves performing the task rate emotions
experienced post-hoc
ndash Observer ratings
ndash Behavioural measures (this presentation)
ndash Physiological measures (more later)
17
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash NASA Task Load Index (Hart and Staveland 1988)
18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 472
Emotion Recognition Some History
bull The Measurement of Emotion
ndash Whately-Smith 1922
bull Emotion in speech (science)ndash Scherer 1980s-present
bull Emotion in synthetic speech
ndash 1990sbull Emotion recognition
ndash Dellaert et al 1996 ndash prosodic features
bull Applications of emotion recognition
ndash Petrushin 1999 Lee et al 2001 ndash call centres
bull Affective computing
ndash Picard 2000
bull 2000 ~10 papers per year4
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 572
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 672
Cognitive Load Some History
bull Working memory
ndash Miller et al 1960
ndash Wickens 1980s Baddeley 1990s onwards
bull Cognitive load in learning
ndash Sweller (mid 80s)
bull Working memory and languagendash Gathercole 1993
bull Cognitive load in speech (science)
ndash Berthold and Jameson 1999
bull Cognitive load classification from speech
ndash Yin et al 2008
6
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 772
Emotion Recognition Key Components
bull Typical acoustic based system
speech featureextraction
featurenormalisation
voicingdetection
7
emotion
modelling
classification
regression
recognizedemotion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 872
Emotion Recognition Different Approaches
bull Acoustic vs prosodic vs linguistic
bull Detailed spectral vs broad spectral features
bull Direct vs adapted models
bull Static (utterance) vs dynamic (framemulti-frame)
bull Categorical vs ordinal vs regression-basedclassification
bull Recognition vs detection
8
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 972
Emotion Recognition Key Current Challenges
bull Basis for emotion mental states in physiology and
hence speech productionndash Shared with expressive speech synthesis
ndash Difficult to reverse-engineer spoken emotion
ndash Use of wide range of physiological sensors may be helpful
bull EGG glottal cameras vocal tract x-ray movies etcbull EEG GSR HRV respiration
ndash Complex emotion dependency varies between different contexts
(eg some words more emotive than others)
9
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072
Emotion Recognition Key Current Challenges
bull Characterizing emotion mental states using robust
featuresndash Lots of features in literature
ndash Why should a given feature be not be useful
ndash Are some features more sensitive to some emotionsmental states
an w y
ndash How should temporal information best be used
10
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172
Emotion Recognition Key Current Challenges
bull Recognition of emotion in naturally occurring
speechndash Much initial work on emotion recognition done on acted speech
ndash Initial work on cognitive load classification done on carefully
controlled experimental data
ndash ea emot ons meta states ne acte or exper menta ata
ndash Taking emotion recognition from the lab to the field
11
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272
Emotion Recognition Key Current Challenges
bull Understanding application areasndash Lots of vision statements about affective computing
bull Detail still coming
ndash Understanding likely limits of automatic approaches
bull Match approaches with suitable applications
ndash
range of different situations
ndash New applications still emerging
12
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372
13
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472
Emotions and CL Psychology
bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution
bull Also a signalling system between people
bull Important in learning behaviours
ndash Results of studies are complex to interpret as an engineer
lsquo rsquondash
ndash Cultural variation exists
ndash Emotions can often be discretely described only as lsquofull-blownrsquo
emotions
bull Cowie et al 2001
bull Pragmaticsndash Emotions of interest are dictated by application and availability of
labelled data usually single-culture often fully blown
14
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572
Emotions and CL Psychology
bull Emotion representations and structuresndash Affective circumplex
arousal
valence
excited
happy
sad
nervous
tense
ndash Arousal alertnessresponsiveness to stimuli
ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional
bull Reducing to two dimensions can lead to different structures
ndash Cowie et al 2001
15
depressed calm
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672
Emotions and CL Psychology
bull One choice FeelTracendash For real time
ndash Cowie et al 2000
16
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772
Emotions and CL Measuring Emotion
bull Measuring emotionsndash Self-reports of subjective experience
bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list
ndash Real time methods
-
bull eg arousal-affect emotion space cursor
ndash Cued review
bull Subjects watch video of themselves performing the task rate emotions
experienced post-hoc
ndash Observer ratings
ndash Behavioural measures (this presentation)
ndash Physiological measures (more later)
17
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash NASA Task Load Index (Hart and Staveland 1988)
18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 572
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 672
Cognitive Load Some History
bull Working memory
ndash Miller et al 1960
ndash Wickens 1980s Baddeley 1990s onwards
bull Cognitive load in learning
ndash Sweller (mid 80s)
bull Working memory and languagendash Gathercole 1993
bull Cognitive load in speech (science)
ndash Berthold and Jameson 1999
bull Cognitive load classification from speech
ndash Yin et al 2008
6
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 772
Emotion Recognition Key Components
bull Typical acoustic based system
speech featureextraction
featurenormalisation
voicingdetection
7
emotion
modelling
classification
regression
recognizedemotion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 872
Emotion Recognition Different Approaches
bull Acoustic vs prosodic vs linguistic
bull Detailed spectral vs broad spectral features
bull Direct vs adapted models
bull Static (utterance) vs dynamic (framemulti-frame)
bull Categorical vs ordinal vs regression-basedclassification
bull Recognition vs detection
8
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 972
Emotion Recognition Key Current Challenges
bull Basis for emotion mental states in physiology and
hence speech productionndash Shared with expressive speech synthesis
ndash Difficult to reverse-engineer spoken emotion
ndash Use of wide range of physiological sensors may be helpful
bull EGG glottal cameras vocal tract x-ray movies etcbull EEG GSR HRV respiration
ndash Complex emotion dependency varies between different contexts
(eg some words more emotive than others)
9
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072
Emotion Recognition Key Current Challenges
bull Characterizing emotion mental states using robust
featuresndash Lots of features in literature
ndash Why should a given feature be not be useful
ndash Are some features more sensitive to some emotionsmental states
an w y
ndash How should temporal information best be used
10
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172
Emotion Recognition Key Current Challenges
bull Recognition of emotion in naturally occurring
speechndash Much initial work on emotion recognition done on acted speech
ndash Initial work on cognitive load classification done on carefully
controlled experimental data
ndash ea emot ons meta states ne acte or exper menta ata
ndash Taking emotion recognition from the lab to the field
11
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272
Emotion Recognition Key Current Challenges
bull Understanding application areasndash Lots of vision statements about affective computing
bull Detail still coming
ndash Understanding likely limits of automatic approaches
bull Match approaches with suitable applications
ndash
range of different situations
ndash New applications still emerging
12
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372
13
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472
Emotions and CL Psychology
bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution
bull Also a signalling system between people
bull Important in learning behaviours
ndash Results of studies are complex to interpret as an engineer
lsquo rsquondash
ndash Cultural variation exists
ndash Emotions can often be discretely described only as lsquofull-blownrsquo
emotions
bull Cowie et al 2001
bull Pragmaticsndash Emotions of interest are dictated by application and availability of
labelled data usually single-culture often fully blown
14
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572
Emotions and CL Psychology
bull Emotion representations and structuresndash Affective circumplex
arousal
valence
excited
happy
sad
nervous
tense
ndash Arousal alertnessresponsiveness to stimuli
ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional
bull Reducing to two dimensions can lead to different structures
ndash Cowie et al 2001
15
depressed calm
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672
Emotions and CL Psychology
bull One choice FeelTracendash For real time
ndash Cowie et al 2000
16
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772
Emotions and CL Measuring Emotion
bull Measuring emotionsndash Self-reports of subjective experience
bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list
ndash Real time methods
-
bull eg arousal-affect emotion space cursor
ndash Cued review
bull Subjects watch video of themselves performing the task rate emotions
experienced post-hoc
ndash Observer ratings
ndash Behavioural measures (this presentation)
ndash Physiological measures (more later)
17
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash NASA Task Load Index (Hart and Staveland 1988)
18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 672
Cognitive Load Some History
bull Working memory
ndash Miller et al 1960
ndash Wickens 1980s Baddeley 1990s onwards
bull Cognitive load in learning
ndash Sweller (mid 80s)
bull Working memory and languagendash Gathercole 1993
bull Cognitive load in speech (science)
ndash Berthold and Jameson 1999
bull Cognitive load classification from speech
ndash Yin et al 2008
6
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 772
Emotion Recognition Key Components
bull Typical acoustic based system
speech featureextraction
featurenormalisation
voicingdetection
7
emotion
modelling
classification
regression
recognizedemotion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 872
Emotion Recognition Different Approaches
bull Acoustic vs prosodic vs linguistic
bull Detailed spectral vs broad spectral features
bull Direct vs adapted models
bull Static (utterance) vs dynamic (framemulti-frame)
bull Categorical vs ordinal vs regression-basedclassification
bull Recognition vs detection
8
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 972
Emotion Recognition Key Current Challenges
bull Basis for emotion mental states in physiology and
hence speech productionndash Shared with expressive speech synthesis
ndash Difficult to reverse-engineer spoken emotion
ndash Use of wide range of physiological sensors may be helpful
bull EGG glottal cameras vocal tract x-ray movies etcbull EEG GSR HRV respiration
ndash Complex emotion dependency varies between different contexts
(eg some words more emotive than others)
9
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072
Emotion Recognition Key Current Challenges
bull Characterizing emotion mental states using robust
featuresndash Lots of features in literature
ndash Why should a given feature be not be useful
ndash Are some features more sensitive to some emotionsmental states
an w y
ndash How should temporal information best be used
10
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172
Emotion Recognition Key Current Challenges
bull Recognition of emotion in naturally occurring
speechndash Much initial work on emotion recognition done on acted speech
ndash Initial work on cognitive load classification done on carefully
controlled experimental data
ndash ea emot ons meta states ne acte or exper menta ata
ndash Taking emotion recognition from the lab to the field
11
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272
Emotion Recognition Key Current Challenges
bull Understanding application areasndash Lots of vision statements about affective computing
bull Detail still coming
ndash Understanding likely limits of automatic approaches
bull Match approaches with suitable applications
ndash
range of different situations
ndash New applications still emerging
12
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372
13
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472
Emotions and CL Psychology
bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution
bull Also a signalling system between people
bull Important in learning behaviours
ndash Results of studies are complex to interpret as an engineer
lsquo rsquondash
ndash Cultural variation exists
ndash Emotions can often be discretely described only as lsquofull-blownrsquo
emotions
bull Cowie et al 2001
bull Pragmaticsndash Emotions of interest are dictated by application and availability of
labelled data usually single-culture often fully blown
14
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572
Emotions and CL Psychology
bull Emotion representations and structuresndash Affective circumplex
arousal
valence
excited
happy
sad
nervous
tense
ndash Arousal alertnessresponsiveness to stimuli
ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional
bull Reducing to two dimensions can lead to different structures
ndash Cowie et al 2001
15
depressed calm
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672
Emotions and CL Psychology
bull One choice FeelTracendash For real time
ndash Cowie et al 2000
16
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772
Emotions and CL Measuring Emotion
bull Measuring emotionsndash Self-reports of subjective experience
bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list
ndash Real time methods
-
bull eg arousal-affect emotion space cursor
ndash Cued review
bull Subjects watch video of themselves performing the task rate emotions
experienced post-hoc
ndash Observer ratings
ndash Behavioural measures (this presentation)
ndash Physiological measures (more later)
17
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash NASA Task Load Index (Hart and Staveland 1988)
18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 772
Emotion Recognition Key Components
bull Typical acoustic based system
speech featureextraction
featurenormalisation
voicingdetection
7
emotion
modelling
classification
regression
recognizedemotion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 872
Emotion Recognition Different Approaches
bull Acoustic vs prosodic vs linguistic
bull Detailed spectral vs broad spectral features
bull Direct vs adapted models
bull Static (utterance) vs dynamic (framemulti-frame)
bull Categorical vs ordinal vs regression-basedclassification
bull Recognition vs detection
8
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 972
Emotion Recognition Key Current Challenges
bull Basis for emotion mental states in physiology and
hence speech productionndash Shared with expressive speech synthesis
ndash Difficult to reverse-engineer spoken emotion
ndash Use of wide range of physiological sensors may be helpful
bull EGG glottal cameras vocal tract x-ray movies etcbull EEG GSR HRV respiration
ndash Complex emotion dependency varies between different contexts
(eg some words more emotive than others)
9
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072
Emotion Recognition Key Current Challenges
bull Characterizing emotion mental states using robust
featuresndash Lots of features in literature
ndash Why should a given feature be not be useful
ndash Are some features more sensitive to some emotionsmental states
an w y
ndash How should temporal information best be used
10
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172
Emotion Recognition Key Current Challenges
bull Recognition of emotion in naturally occurring
speechndash Much initial work on emotion recognition done on acted speech
ndash Initial work on cognitive load classification done on carefully
controlled experimental data
ndash ea emot ons meta states ne acte or exper menta ata
ndash Taking emotion recognition from the lab to the field
11
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272
Emotion Recognition Key Current Challenges
bull Understanding application areasndash Lots of vision statements about affective computing
bull Detail still coming
ndash Understanding likely limits of automatic approaches
bull Match approaches with suitable applications
ndash
range of different situations
ndash New applications still emerging
12
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372
13
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472
Emotions and CL Psychology
bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution
bull Also a signalling system between people
bull Important in learning behaviours
ndash Results of studies are complex to interpret as an engineer
lsquo rsquondash
ndash Cultural variation exists
ndash Emotions can often be discretely described only as lsquofull-blownrsquo
emotions
bull Cowie et al 2001
bull Pragmaticsndash Emotions of interest are dictated by application and availability of
labelled data usually single-culture often fully blown
14
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572
Emotions and CL Psychology
bull Emotion representations and structuresndash Affective circumplex
arousal
valence
excited
happy
sad
nervous
tense
ndash Arousal alertnessresponsiveness to stimuli
ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional
bull Reducing to two dimensions can lead to different structures
ndash Cowie et al 2001
15
depressed calm
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672
Emotions and CL Psychology
bull One choice FeelTracendash For real time
ndash Cowie et al 2000
16
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772
Emotions and CL Measuring Emotion
bull Measuring emotionsndash Self-reports of subjective experience
bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list
ndash Real time methods
-
bull eg arousal-affect emotion space cursor
ndash Cued review
bull Subjects watch video of themselves performing the task rate emotions
experienced post-hoc
ndash Observer ratings
ndash Behavioural measures (this presentation)
ndash Physiological measures (more later)
17
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash NASA Task Load Index (Hart and Staveland 1988)
18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 872
Emotion Recognition Different Approaches
bull Acoustic vs prosodic vs linguistic
bull Detailed spectral vs broad spectral features
bull Direct vs adapted models
bull Static (utterance) vs dynamic (framemulti-frame)
bull Categorical vs ordinal vs regression-basedclassification
bull Recognition vs detection
8
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 972
Emotion Recognition Key Current Challenges
bull Basis for emotion mental states in physiology and
hence speech productionndash Shared with expressive speech synthesis
ndash Difficult to reverse-engineer spoken emotion
ndash Use of wide range of physiological sensors may be helpful
bull EGG glottal cameras vocal tract x-ray movies etcbull EEG GSR HRV respiration
ndash Complex emotion dependency varies between different contexts
(eg some words more emotive than others)
9
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072
Emotion Recognition Key Current Challenges
bull Characterizing emotion mental states using robust
featuresndash Lots of features in literature
ndash Why should a given feature be not be useful
ndash Are some features more sensitive to some emotionsmental states
an w y
ndash How should temporal information best be used
10
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172
Emotion Recognition Key Current Challenges
bull Recognition of emotion in naturally occurring
speechndash Much initial work on emotion recognition done on acted speech
ndash Initial work on cognitive load classification done on carefully
controlled experimental data
ndash ea emot ons meta states ne acte or exper menta ata
ndash Taking emotion recognition from the lab to the field
11
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272
Emotion Recognition Key Current Challenges
bull Understanding application areasndash Lots of vision statements about affective computing
bull Detail still coming
ndash Understanding likely limits of automatic approaches
bull Match approaches with suitable applications
ndash
range of different situations
ndash New applications still emerging
12
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372
13
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472
Emotions and CL Psychology
bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution
bull Also a signalling system between people
bull Important in learning behaviours
ndash Results of studies are complex to interpret as an engineer
lsquo rsquondash
ndash Cultural variation exists
ndash Emotions can often be discretely described only as lsquofull-blownrsquo
emotions
bull Cowie et al 2001
bull Pragmaticsndash Emotions of interest are dictated by application and availability of
labelled data usually single-culture often fully blown
14
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572
Emotions and CL Psychology
bull Emotion representations and structuresndash Affective circumplex
arousal
valence
excited
happy
sad
nervous
tense
ndash Arousal alertnessresponsiveness to stimuli
ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional
bull Reducing to two dimensions can lead to different structures
ndash Cowie et al 2001
15
depressed calm
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672
Emotions and CL Psychology
bull One choice FeelTracendash For real time
ndash Cowie et al 2000
16
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772
Emotions and CL Measuring Emotion
bull Measuring emotionsndash Self-reports of subjective experience
bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list
ndash Real time methods
-
bull eg arousal-affect emotion space cursor
ndash Cued review
bull Subjects watch video of themselves performing the task rate emotions
experienced post-hoc
ndash Observer ratings
ndash Behavioural measures (this presentation)
ndash Physiological measures (more later)
17
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash NASA Task Load Index (Hart and Staveland 1988)
18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 972
Emotion Recognition Key Current Challenges
bull Basis for emotion mental states in physiology and
hence speech productionndash Shared with expressive speech synthesis
ndash Difficult to reverse-engineer spoken emotion
ndash Use of wide range of physiological sensors may be helpful
bull EGG glottal cameras vocal tract x-ray movies etcbull EEG GSR HRV respiration
ndash Complex emotion dependency varies between different contexts
(eg some words more emotive than others)
9
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072
Emotion Recognition Key Current Challenges
bull Characterizing emotion mental states using robust
featuresndash Lots of features in literature
ndash Why should a given feature be not be useful
ndash Are some features more sensitive to some emotionsmental states
an w y
ndash How should temporal information best be used
10
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172
Emotion Recognition Key Current Challenges
bull Recognition of emotion in naturally occurring
speechndash Much initial work on emotion recognition done on acted speech
ndash Initial work on cognitive load classification done on carefully
controlled experimental data
ndash ea emot ons meta states ne acte or exper menta ata
ndash Taking emotion recognition from the lab to the field
11
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272
Emotion Recognition Key Current Challenges
bull Understanding application areasndash Lots of vision statements about affective computing
bull Detail still coming
ndash Understanding likely limits of automatic approaches
bull Match approaches with suitable applications
ndash
range of different situations
ndash New applications still emerging
12
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372
13
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472
Emotions and CL Psychology
bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution
bull Also a signalling system between people
bull Important in learning behaviours
ndash Results of studies are complex to interpret as an engineer
lsquo rsquondash
ndash Cultural variation exists
ndash Emotions can often be discretely described only as lsquofull-blownrsquo
emotions
bull Cowie et al 2001
bull Pragmaticsndash Emotions of interest are dictated by application and availability of
labelled data usually single-culture often fully blown
14
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572
Emotions and CL Psychology
bull Emotion representations and structuresndash Affective circumplex
arousal
valence
excited
happy
sad
nervous
tense
ndash Arousal alertnessresponsiveness to stimuli
ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional
bull Reducing to two dimensions can lead to different structures
ndash Cowie et al 2001
15
depressed calm
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672
Emotions and CL Psychology
bull One choice FeelTracendash For real time
ndash Cowie et al 2000
16
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772
Emotions and CL Measuring Emotion
bull Measuring emotionsndash Self-reports of subjective experience
bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list
ndash Real time methods
-
bull eg arousal-affect emotion space cursor
ndash Cued review
bull Subjects watch video of themselves performing the task rate emotions
experienced post-hoc
ndash Observer ratings
ndash Behavioural measures (this presentation)
ndash Physiological measures (more later)
17
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash NASA Task Load Index (Hart and Staveland 1988)
18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1072
Emotion Recognition Key Current Challenges
bull Characterizing emotion mental states using robust
featuresndash Lots of features in literature
ndash Why should a given feature be not be useful
ndash Are some features more sensitive to some emotionsmental states
an w y
ndash How should temporal information best be used
10
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172
Emotion Recognition Key Current Challenges
bull Recognition of emotion in naturally occurring
speechndash Much initial work on emotion recognition done on acted speech
ndash Initial work on cognitive load classification done on carefully
controlled experimental data
ndash ea emot ons meta states ne acte or exper menta ata
ndash Taking emotion recognition from the lab to the field
11
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272
Emotion Recognition Key Current Challenges
bull Understanding application areasndash Lots of vision statements about affective computing
bull Detail still coming
ndash Understanding likely limits of automatic approaches
bull Match approaches with suitable applications
ndash
range of different situations
ndash New applications still emerging
12
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372
13
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472
Emotions and CL Psychology
bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution
bull Also a signalling system between people
bull Important in learning behaviours
ndash Results of studies are complex to interpret as an engineer
lsquo rsquondash
ndash Cultural variation exists
ndash Emotions can often be discretely described only as lsquofull-blownrsquo
emotions
bull Cowie et al 2001
bull Pragmaticsndash Emotions of interest are dictated by application and availability of
labelled data usually single-culture often fully blown
14
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572
Emotions and CL Psychology
bull Emotion representations and structuresndash Affective circumplex
arousal
valence
excited
happy
sad
nervous
tense
ndash Arousal alertnessresponsiveness to stimuli
ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional
bull Reducing to two dimensions can lead to different structures
ndash Cowie et al 2001
15
depressed calm
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672
Emotions and CL Psychology
bull One choice FeelTracendash For real time
ndash Cowie et al 2000
16
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772
Emotions and CL Measuring Emotion
bull Measuring emotionsndash Self-reports of subjective experience
bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list
ndash Real time methods
-
bull eg arousal-affect emotion space cursor
ndash Cued review
bull Subjects watch video of themselves performing the task rate emotions
experienced post-hoc
ndash Observer ratings
ndash Behavioural measures (this presentation)
ndash Physiological measures (more later)
17
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash NASA Task Load Index (Hart and Staveland 1988)
18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1172
Emotion Recognition Key Current Challenges
bull Recognition of emotion in naturally occurring
speechndash Much initial work on emotion recognition done on acted speech
ndash Initial work on cognitive load classification done on carefully
controlled experimental data
ndash ea emot ons meta states ne acte or exper menta ata
ndash Taking emotion recognition from the lab to the field
11
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272
Emotion Recognition Key Current Challenges
bull Understanding application areasndash Lots of vision statements about affective computing
bull Detail still coming
ndash Understanding likely limits of automatic approaches
bull Match approaches with suitable applications
ndash
range of different situations
ndash New applications still emerging
12
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372
13
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472
Emotions and CL Psychology
bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution
bull Also a signalling system between people
bull Important in learning behaviours
ndash Results of studies are complex to interpret as an engineer
lsquo rsquondash
ndash Cultural variation exists
ndash Emotions can often be discretely described only as lsquofull-blownrsquo
emotions
bull Cowie et al 2001
bull Pragmaticsndash Emotions of interest are dictated by application and availability of
labelled data usually single-culture often fully blown
14
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572
Emotions and CL Psychology
bull Emotion representations and structuresndash Affective circumplex
arousal
valence
excited
happy
sad
nervous
tense
ndash Arousal alertnessresponsiveness to stimuli
ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional
bull Reducing to two dimensions can lead to different structures
ndash Cowie et al 2001
15
depressed calm
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672
Emotions and CL Psychology
bull One choice FeelTracendash For real time
ndash Cowie et al 2000
16
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772
Emotions and CL Measuring Emotion
bull Measuring emotionsndash Self-reports of subjective experience
bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list
ndash Real time methods
-
bull eg arousal-affect emotion space cursor
ndash Cued review
bull Subjects watch video of themselves performing the task rate emotions
experienced post-hoc
ndash Observer ratings
ndash Behavioural measures (this presentation)
ndash Physiological measures (more later)
17
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash NASA Task Load Index (Hart and Staveland 1988)
18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1272
Emotion Recognition Key Current Challenges
bull Understanding application areasndash Lots of vision statements about affective computing
bull Detail still coming
ndash Understanding likely limits of automatic approaches
bull Match approaches with suitable applications
ndash
range of different situations
ndash New applications still emerging
12
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372
13
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472
Emotions and CL Psychology
bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution
bull Also a signalling system between people
bull Important in learning behaviours
ndash Results of studies are complex to interpret as an engineer
lsquo rsquondash
ndash Cultural variation exists
ndash Emotions can often be discretely described only as lsquofull-blownrsquo
emotions
bull Cowie et al 2001
bull Pragmaticsndash Emotions of interest are dictated by application and availability of
labelled data usually single-culture often fully blown
14
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572
Emotions and CL Psychology
bull Emotion representations and structuresndash Affective circumplex
arousal
valence
excited
happy
sad
nervous
tense
ndash Arousal alertnessresponsiveness to stimuli
ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional
bull Reducing to two dimensions can lead to different structures
ndash Cowie et al 2001
15
depressed calm
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672
Emotions and CL Psychology
bull One choice FeelTracendash For real time
ndash Cowie et al 2000
16
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772
Emotions and CL Measuring Emotion
bull Measuring emotionsndash Self-reports of subjective experience
bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list
ndash Real time methods
-
bull eg arousal-affect emotion space cursor
ndash Cued review
bull Subjects watch video of themselves performing the task rate emotions
experienced post-hoc
ndash Observer ratings
ndash Behavioural measures (this presentation)
ndash Physiological measures (more later)
17
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash NASA Task Load Index (Hart and Staveland 1988)
18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1372
13
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472
Emotions and CL Psychology
bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution
bull Also a signalling system between people
bull Important in learning behaviours
ndash Results of studies are complex to interpret as an engineer
lsquo rsquondash
ndash Cultural variation exists
ndash Emotions can often be discretely described only as lsquofull-blownrsquo
emotions
bull Cowie et al 2001
bull Pragmaticsndash Emotions of interest are dictated by application and availability of
labelled data usually single-culture often fully blown
14
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572
Emotions and CL Psychology
bull Emotion representations and structuresndash Affective circumplex
arousal
valence
excited
happy
sad
nervous
tense
ndash Arousal alertnessresponsiveness to stimuli
ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional
bull Reducing to two dimensions can lead to different structures
ndash Cowie et al 2001
15
depressed calm
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672
Emotions and CL Psychology
bull One choice FeelTracendash For real time
ndash Cowie et al 2000
16
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772
Emotions and CL Measuring Emotion
bull Measuring emotionsndash Self-reports of subjective experience
bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list
ndash Real time methods
-
bull eg arousal-affect emotion space cursor
ndash Cued review
bull Subjects watch video of themselves performing the task rate emotions
experienced post-hoc
ndash Observer ratings
ndash Behavioural measures (this presentation)
ndash Physiological measures (more later)
17
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash NASA Task Load Index (Hart and Staveland 1988)
18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1472
Emotions and CL Psychology
bull Psychology has much to say about emotionsndash Origins of emotion are in survival and evolution
bull Also a signalling system between people
bull Important in learning behaviours
ndash Results of studies are complex to interpret as an engineer
lsquo rsquondash
ndash Cultural variation exists
ndash Emotions can often be discretely described only as lsquofull-blownrsquo
emotions
bull Cowie et al 2001
bull Pragmaticsndash Emotions of interest are dictated by application and availability of
labelled data usually single-culture often fully blown
14
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572
Emotions and CL Psychology
bull Emotion representations and structuresndash Affective circumplex
arousal
valence
excited
happy
sad
nervous
tense
ndash Arousal alertnessresponsiveness to stimuli
ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional
bull Reducing to two dimensions can lead to different structures
ndash Cowie et al 2001
15
depressed calm
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672
Emotions and CL Psychology
bull One choice FeelTracendash For real time
ndash Cowie et al 2000
16
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772
Emotions and CL Measuring Emotion
bull Measuring emotionsndash Self-reports of subjective experience
bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list
ndash Real time methods
-
bull eg arousal-affect emotion space cursor
ndash Cued review
bull Subjects watch video of themselves performing the task rate emotions
experienced post-hoc
ndash Observer ratings
ndash Behavioural measures (this presentation)
ndash Physiological measures (more later)
17
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash NASA Task Load Index (Hart and Staveland 1988)
18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1572
Emotions and CL Psychology
bull Emotion representations and structuresndash Affective circumplex
arousal
valence
excited
happy
sad
nervous
tense
ndash Arousal alertnessresponsiveness to stimuli
ndash Valence positive (eg happy) and negative (anger fear) emotionsbull Emotions are high-dimensional
bull Reducing to two dimensions can lead to different structures
ndash Cowie et al 2001
15
depressed calm
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672
Emotions and CL Psychology
bull One choice FeelTracendash For real time
ndash Cowie et al 2000
16
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772
Emotions and CL Measuring Emotion
bull Measuring emotionsndash Self-reports of subjective experience
bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list
ndash Real time methods
-
bull eg arousal-affect emotion space cursor
ndash Cued review
bull Subjects watch video of themselves performing the task rate emotions
experienced post-hoc
ndash Observer ratings
ndash Behavioural measures (this presentation)
ndash Physiological measures (more later)
17
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash NASA Task Load Index (Hart and Staveland 1988)
18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1672
Emotions and CL Psychology
bull One choice FeelTracendash For real time
ndash Cowie et al 2000
16
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772
Emotions and CL Measuring Emotion
bull Measuring emotionsndash Self-reports of subjective experience
bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list
ndash Real time methods
-
bull eg arousal-affect emotion space cursor
ndash Cued review
bull Subjects watch video of themselves performing the task rate emotions
experienced post-hoc
ndash Observer ratings
ndash Behavioural measures (this presentation)
ndash Physiological measures (more later)
17
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash NASA Task Load Index (Hart and Staveland 1988)
18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1772
Emotions and CL Measuring Emotion
bull Measuring emotionsndash Self-reports of subjective experience
bull Single-item (ldquohow happy did you feelrdquo)bull Check-list eg mood adjective check-list
ndash Real time methods
-
bull eg arousal-affect emotion space cursor
ndash Cued review
bull Subjects watch video of themselves performing the task rate emotions
experienced post-hoc
ndash Observer ratings
ndash Behavioural measures (this presentation)
ndash Physiological measures (more later)
17
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash NASA Task Load Index (Hart and Staveland 1988)
18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1872
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash NASA Task Load Index (Hart and Staveland 1988)
18source httphumansystemsarcnasagovgroupsTLXdownloadsTLXScalepdf
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 1972
Emotions and CL Measuring CL
bull Measuring CLndash Subjective rating scales
ndash Subjective workload assessment technique(Reid and Nygren 1988)
ndash Mental Effort Load
raquo a Very little conscious mental effort or
concentration re uired Activit is almost
19
automatic requiring little or no attention
raquo b Moderate conscious mental effort or
concentration required Complexity of
activity is moderately high due to
uncertainty unpredictability or
unfamiliarity Considerable attention
requiredraquo c Extensive mental effort and
concentration are necessary Very
complex activity requiring total attention
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2072
Emotions and CL Measuring CL
bull Measuring other mental states (speech literature)ndash Stress
ndash Uncertainty
ndash Level of interest
ndash Deceptive speech
ndash enta sor ers
bull Depression anxiety autism
ndash Behaviour (eg drunkenness)
ndash Disfluency (indicative of some mental states)
ndash Note differences in temporal occurrence of mental state
20
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2172
21
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2272
Feature Extraction Origins
bull Effect of emotion on speech productionndash Scherer 1989
22
Sympathetic NS
(lsquofight or flightrsquo)
Autonomic NS
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2372
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Larsen and Fredrickson 2003
ndash Arousal component of emotion better conveyed using vocal cues
than the affective component
ndash steners can cons stent y nom nate voca cues or st ngu s ng
emotions
ndash Specific acoustic cues not generally linked to any particular emotions
bull Scherer 1986
23
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2472
Feature Extraction Origins
bull Effect of emotionmental state on speech
productionndash Typical general acoustic feature set (Larsen and Fredrickson 2003)
bull Fundamental frequency F 0
bull Fluctuation in F 0 due to jitter (∆F 0) and shimmer (∆ AF 0)
bull ner y
bull Speech rate
ndash Stress related to (Streeter et al 1983)
bull Increased energy
bull Increased pitch
24
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2572
Feature Extraction Methods
bull Voicing activity detectionndash Emotion mental state content primarily conveyed during voiced
portion of speechndash Remove unvoicedsilence
bull Energy based methods
bull -
bull VADs borrowed from other speech processing applicationsndash Virtually nil additional attention
bull Effect of cognitive load on speech productionndash Speech rate
ndash Energy contour
ndash F 0
ndash Spectral parameters
bull Scherer et al 200225
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2672
Feature Extraction Methods
bull Energy
02 04 06 08 1 12 14 16 18 2
x 104
-1
0
1
2
x 104
neutral speech
26
02 04 06 08 1 12 14 16 18 2
x 104
-2
0
2
x 104
10 20 30 40 50 60 70 80 90
10
15
10 20 30 40 50 60 70 80 90
10
15
20
angry speech
log energy
log energy
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2772
Feature Extraction Methods
bull Spectral featuresndash Tilt
bull Many systemsbull lsquoBroadrsquo spectral measure
ndash Weighted frequency
lsquo rsquo
ndash Mel Frequency Cepstral Coefficients (MFCC)bull Detailed spectral measure
ndash Log frequency power coefficients (LFPC) (Nwe et al 2003)
bull Detailed spectral measure
ndash Group delay from all pole spectrum
bull Captures bandwidth information (Sethu et al 2007)
ndash Formant frequencies
ndash RecentRecentRecentRecent Spectral Spectral Spectral Spectral CentroidCentroidCentroidCentroid featuresfeaturesfeaturesfeatures
27
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2872
Feature Extraction Methods
bull Glottal featuresndash Iterative Adaptive Inverse Filtering (IAIF) popular method for
estimating glottal flow from speech (Alku 1991)
ndash Glottal formants F g F c from glottal flow spectrum
bull Sethu 2009
ndash Primary open quotient (OQ1) normalised amplitude quotient (NAQ)
pr mary spee quo en
bull Airas and Alku 2006 Yap et al 2010
ndash Voice quality parameters
bull Lugger and Yang 2007
ndash Glottal statistics
bull Iliev and Scordilis 2008
ndash Vocal fold opening
bull Murphy and Laukkanen 2010
ndash Glottal flow spectrum (depressed speech)bull Ozdas et al 2004 others 28
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 2972
Feature Extraction Methods
bull Glottal features (example)ndash Yap et al 2010
ndash High load harr creaky voice quality
29
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3072
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)
Sethu
2009
30
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3172
Feature Extraction Empirical Comparison
bull Top individual features (12 actors 14 emotions)ndash Scherer 1996
ndash Pitch
bull Mean standard deviation 25th and 75th percentile
ndash Energy ndash mean
ndash Speech rate
ndash Long-term voiced average spectrum
bull 125-200 200-300 500-600 1000-1600 5000-8000 Hz
ndash Hammarberg index
bull Difference in energy maxima in 0-2 and 2-5 kHz bands (voiced)ndash Spectral slope above 1 kHz
ndash Voicing energy up to 1 kHz
ndash Long-term unvoiced average spectrum
bull 125-200 5000-8000 Hz31
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3272
Feature Extraction Empirical Comparison
bull Individual features (LDC Emotional Prosody 5-class)ndash Sethu 2009
32
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3372
Feature Extraction Temporal Information
bull Shifted delta coefficientsndash Feature C l (l = feature dimension) at time t
ndash k how many delta features
ndash spac ng etween eatures
ndash d controls span of each delta
ndash Important in cognitive load (Yin et al 2008)
bull MFCC 522
bull MFCC+Prosodic (Concatenated) 593
bull MFCC+Prosodic Acceleration 644
bull MFCC+Prosodic SDC 657
bull Much larger performance difference in other CL research ~20-60 impr
33
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3472
Feature Extraction Temporal Information
bull Parameter Contoursndash Pitch Sethu 2009
bull 5-class LDC Emotional Prosody
34
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3572
Feature Extraction Empirical Comparison
bull Comparison of Cognitive Load Featuresndash 3-class Stroop Test Database
FeatureFeatureFeatureFeature (combination)(combination)(combination)(combination) AccuracyAccuracyAccuracyAccuracy
MFCC 522
35
MFCC + SDC 602
MFCC + Pitch + Energy + SDC 792
MFCC + Pitch + Energy + 3 Glottal features + SDC 844
F1 + F2 + F3 + SDC 677
SCF + SCA + SDC 872
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3672
Feature Extraction Prosodic
bull Emotionndash Duration rate pause pitch energy features
bull Based on ASR alignmentbull Ang et al 2002
ndash Pitch level and range speech tempo loudness
bull Emotional speech synthesisbull Schroumlder 2001
bull Cognitive loadndash Duration
bull Speech is slower under higher cognitive load
bull Pitch formant trajectories less variable (more monotone)
bull Yap et al 2009
36
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3772
Feature Extraction Linguistic
bull Emotionndash Fused linguistic explored
in a few papersndash eg 10-30 impr
Lee et al 2002
ndash g t c er et a
37
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3872
Feature Extraction Linguistic
bull Higher cognitive loadndash Sentence fragments false starts and errors increase
ndash Onset latency pauses increasendash Articulationspeech rate decrease
bull Berthold and Jameson 1999
ndash easures o an ua e comp ex y ndash comp ex y ncreases or mu p e
measuresndash More use of plural pronouns less of singular
bull Khawaja et al 2009
ndash Linguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to dateLinguistic not fused with acoustic to date
38
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 3972
Speech Cues Related to Cognitive Load (CL)
bull Disfluenciesndash Interruption rate
ndash Proportion of the effective speech in the whole speech period
ndash Keywords for correction or repeating bull Inter-sentential pausing
ndash Length and frequency of big pauses
39
ndash Length and frequency of small pausesndash Length of intra-sentence segments
bull Slower speech ratendash Syllable rate
bull Response Latencyndash Delay of generating speech
ndash Particular hybrid prosodic pattern
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4072
Classification
40
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4172
Sources of Variability in Emotion Features
bull Two main sources of variabilityndash Phonetic
ndash Speaker identity
bull Both stronger sources of variability
bull
ndash Model itbull eg UBM-GMM with many mixtures
ndash Remove it
bull eg normalisation
bull Other variabilityndash Phoneticlinguistic dependence of emotion information
ndash Session
ndash Age 41
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4272
Normalisation
bull Typesndash Per utterance
ndash Per speaker Sethu et al 2007
42No warping Warping
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4372
Modelling
bull Usual considerationsndash Structure of problem andor database
ndash Amount of training data available rarr number of parameters to use
bull Rate of classification decisionndash Often one per utterance
bull Type of result neededndash Categorical eg ldquohappinessrdquordquoboredeomrdquohellip
ndash Continuous eg [-11]
ndash Ordinal eg ldquolowrdquordquomediumrdquordquohighrdquo
43
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4472
Modelling ndash Common Approaches
bull Gaussian mixture model (GMM)
ndash w m weight of the mrsquoth mixture
rsquo
sum=
minus
minusminusminus=
M
m
mm
T
m
mK
mw p1
1
2 12
)()(
2
1exp
)2(
1)( microxCmicrox
C
x
π
ndash m
ndash CCCCm covariance matrix of the mrsquoth mixture
44-6 -4 -2 0 2 4 6 8
0
01
02
03
04
05
06
07
08
09
1
overall pdf
individual
mixtures
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4572
Modelling ndash Common Approaches
bull Maximum a priori GMMndash Previously Train each GMM separately for each target class
ndash MAP-GMM begins with a single modelbull Covering wide range of conditions (emotionsCL)
ndash Acoustic space of all speakers
ndash Trained on ver lar e database
bull ldquoUniversal backgroundrdquo model (UBM) ndash from speaker recognitionndash Advantages
bull Mixtures in each emotion class with same index correspond to same
acoustic class
bull Fast scoring
45
UBM-GMM
Adapted modelhappiness
Adapted model
sadness
x1
x2
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4672
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Can capture temporal variation
bull Main issue what to model ndash Schuumlller et al 2003 pitch energy contours
ndash Nwe et al 2003 LFPC features F0 speaking rate
1 32
ndash e u e a p c ra ec or es n u erance
bull Assumes emotions have same trajectory structure in all utterances
46
Nwe et al 2003Larger utterance
variability than Schuumlller
et al 2003
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4772
Modelling ndash Common Approaches
bull Hidden Markov model (HMM)ndash Lee et al 2004
ndash Phoneme class based statesbull Vowelglidenasalstopfricative
bull Clear what a state represents no assumptions about utterance structure
ndash
47
f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4872
SVM Classifier with GMM Super Vectors
NICTA Copyright 201048
991266 983111983117983117983155 983140983141983155983139983154983145983138983141 983139983151983149983149983151983150 983152983137983156983156983141983154983150983155 983137983155983159983141983148983148 983137983155 983139983148983137983155983155983085983155983152983141983139983145983142983145983139 983152983137983156983156983141983154983150983155983086
991266 983123983126983117 983145983140983141983150983156983145983142983161 983156983144983141 983138983151983157983150983140983137983154983161 983138983141983156983159983141983141983150
983139983148983137983155983155983141983155983084 983145983143983150983151983154983141 983156983144983141 983139983151983149983149983151983150
983140983145983155983156983154983145983138983157983156983145983151983150983155
Cl ifi ti E i i l R lt
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 4972
Classification Empirical Results
ndash LDC Emotional Prosody
ndash Static classifiers
ndash Sethu et al 2009
ndash
ClassifierClassifierClassifierClassifier P + E + WFP + E + WFP + E + WFP + E + WF MFCCMFCCMFCCMFCC
GMM 431 371
PNN 441 356
SVM 405 399
Database (EMO-DB)ndash Schuumlller et al 2005
49
Cl ifi ti H P f
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5072
Classification Human Performance
bull Emotionndash Dellaert et al 1996
ndash ~ 17 for 7-classspeaker dependent
(independent 35)
c er et a
~60 acc for 8-class
bull Cognitive loadndash Le 2009
ndash 62 acc overall
50
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5172
and their Design
51
Databases Desirable Attributes
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5272
Databases Desirable Attributes
bull Emotionndash Emotions sufficiently fully blown to be distinguishable
ndash Naturalnessndash Listener ratings
ndash Matched to context of use
bull Cognitive loadndash Multiple discrete load levels
ndash Levels distinguishable by subjective ratings
ndash Possibly additional sensor data (physiological)
ndash No stress involved subjects motivated
bull Allndash Large multiple speaker
ndash Phonetically linguistically diverse 52
Databases Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5372
Databases Design
bull Scopendash 45 lsquobasicrsquo emotions oversimplifies the problem
bull Naturalnessndash Read speech vs uncontrolled phoneticlinguistic content
bull
ndash Allow use of emotive words
bull Descriptorsndash Elicited emotionsrarr well defined descriptors
ndash Natural emotions Use multiple levels of descriptors or Feeltrace +listener ratings
ndash Douglas-Cowie et al 2003
53
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5472
Databases Resources (English)
bull AIBO Corpusndash 9 hours 51 children spontaneous (WOz) speech 11 categories
ndash httpwww5informatikuni-erlangendeour-teamsteidl-stefanfau-aibo-emotion-corpus 50euro
bull IEMOCAP ndash 12 hours 10 speakers acted speech 9 classes +
valenceactivationdominance labelling
ndash httpsailusceduiemocap
bull LDC Emotional Prosody
ndash frac12 hour 7 speakers acted speech 15 classesndash httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC
2002S28 US$2500
54
Databases Resources (English)
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5572
Databases Resources (English)
bull Reading-Leedsndash 4frac12 hours unscripted TV interviews
bull Cognitive Loadndash DRIVAWORK
bull 14 hours in-car speech under varying workload several physiological
reference signals
ndash SUSAS SUSC-1
bull contains some psychological stress conditions
bull httpwwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC99S
78 US$500
ndash Other databases proprietary
bull More complete listing
ndash See Douglas-Cowie et al 2003 Ververidis and Kotropoulos 2006 55
Databases Experimental Design
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5672
Databases Experimental Design
bull Cognitive load task design Span testsndash Word span digit span
bull Interleave memory tasks presentation of target wordsdigits withdemanding secondary task
ndash Conway et al 2005
ndash Reading span
bull Ability to co ordinate processing and storage resources
bull Subjects must read fluently and answer comprehension questions
ndash Daneman and Carpenter 1980
ndash Speaking span
bull Randomly selected words
bull Speaking span = maximum number of words for which a subject can
successfully construct a grammatically correct sentence
ndash Daneman 1991
ndash Others counting operation reading digit
56
Databases CL Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5772
Databases CL Examples
bull Stroop testndash Subjects presented colour words in
different coloursbull Low congruent Medium incongruent
ndash Time pressure added for high load
bull Story reading ndash Story reading followed by QampA
ndash 3 different levels of text difficulty (Lexile Framework for Reading
wwwlexilecom)
ndash 3 stories in each of the 2 sessions (fixed order)bull 1st session 2nd session
ndash ldquoSleeprdquo (900L) ldquoSmoke Detectorsrdquo (950L)
ndash ldquoHistory of Zerordquo (1350L) amp ldquoHurricanesrdquo (1250L) amp
ndash ldquoMilky Way Galaxyrdquo (1400L) ldquoThe Sunrdquo (1300L)
57
Databases Trends
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5872
Databases Trends
bull lsquoNaturallyrsquo elicited emotion
bull Larger databases
bull Telephone speech
bull Realistic data (ie without controlled lab conditions)
bull Later noise
58
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 5972
Applications of Emotion
Load Measurement
59
Applications Principles
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6072
Applications Principles
bull Like other speech processing appsndash Need microphone
ndash Best in clean environmentsbull Trend towards processing power in the loop
ndash VoIP
ndash Mobile phone iPhone SmartPhone
bull Trend towards user-aware interfacesndash Affective computing
bull The work overload problemndash Mental processes play an increasing role in determining the
workload in most jobs (Gaillard et al 1993)
60
Applications Task Performance
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6172
pp
bull Adjusting the human-computer balance for
optimised task performance
ndash Ideas date back to Yerkes and Dodson (1908)bull Arousal induced using mild electric shocks
61
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6272
pp p
bull Care needed cognitive load prop 1performancendash Task performance says nothing about spare mental capacity
ndash Spare or residual capacity turns out to be an important practicalmeasure
bull Parasuraman et al 2008
ndash Correlation between workload and performance is poorer inunderload and overload tasks
bull Vidulich and Wickens 1986 Yeh and Wickens 1988
ndash No studies for physiological or behavioural measures
62
Applications Examples
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6372
pp p
bull From the literature ndash Air traffic control vehicle handling shipping military rail
bull Embrey et al 2006
ndash Learning and vigilance tasks
bull Sweller 1988 Berka et al 2007
ndash
bull Lamble et al 1999ndash Team decision efficiency
bull Johnston et al 2006
ndash Collaboration load
bull Dillenbourg and Betrancourt 2006ndash Augmented cognition
bull Schmorrow 2005
63
Applications Biomedical
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6472
bull Brain training for ADHDndash Improve working memory
ndash wwwcogmedcomndash Supported by substantial research base
bull Diagnosistreatment of mental disorders ndash Objective measure
ndash eg Workman et al 2007
ndash Still to be investigated
bull Management of psychological stressndash Has occupational health and safety implications
ndash Still to be investigated
64
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6572
Thank YouCognitive Load and EmotionCognitive Load and EmotionCognitive Load and EmotionCognitive Load and Emotion
Emotion Recognition ndash Speech Pitch Contour
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6672
PitchEstimation
EmotionalSpeech
VAD
Segmented PitchContour
Segmental Linear
Approximation
f s e t b
NICTA Copyright 201066
16 17 18 19 2
I n i t i a l O f
Parameters ndash b s x
PITCH ndash LINEAR APPROXIMATION
05 1 15 2 25 3 35
HMM BASED
EMOTION MODELS
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6772
bull Ideally want to compare emotion recognition and
CL systems on same data but
ndash No ground truthbull Classification becomes clustering
ndash
NICTA Copyright 201067
bull Could try to build a lsquocommonrsquo system butndash Where and how to compromise
ndash Which data
bull Almost by definition there is no database that contains
both emotion and cognitive load that is also usefully
annotated
Emotion vs CL Empirical comparisonEmotion vs CL Empirical comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6872
bull What we did
ndash Took two near-state-of-the-art systems
bull Relative to LDC Emotional Prosody andbull Stroop Test corpora
ndash Applied each system to both corpora
NICTA Copyright 201068
bull Use closed-set trainingtest
ndash Produced a set of emotion and cognitive load labels for
each corpus
ndash Compared clusterings based on labels
Emotion vs CL Rand IndexEmotion vs CL Rand Index
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 6972
bull Adjusted Rand Index
NICTA Copyright 201069
ndash Adjusts RI by subtracting expected valuendash ARI = 1 means clustering identical
ndash ARI cong 0 means random labeling
Emotion vs CL Clustering comparisonEmotion vs CL Clustering comparison
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7072
bull Emotional Prosody Speech Corpus (LDC)ndash 5 emotions
bull Neutral
bull Angerbull Sadness
bull Happiness
NICTA Copyright 201070
bull Stroop Test Datandash 3 CL levels
bull Resultsndash LDC Emotional Prosody
bull ARI = 00087
ndash Stroop Test Corpus
bull ARI = 00253
Conclusion
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7172
bull Data
ndash Plenty of acted experimental databases as reference
ndash Natural data are current challenge
bull Features
ndash Broad vs detailed spectral features implications for modelling
ndash Temporal information needs to be captured
ndash Centroid features (together) are promising
bull Classification
ndash Speaker variability is a very strong effect
ndash Select correct classification paradigm for problem
bull Applications
ndash Emerging many still to be well defined
71
Acknowledgements
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72
7242019 Emotion Recognition and Cognitive
httpslidepdfcomreaderfullemotion-recognition-and-cognitive 7272
bull 8 Project staff (NICTA)ndash Prof Eliathamby Ambikairajah Dr Fang Chen Dr Julien Epps Dr Bo
Yin Dr Eric Choi Dr Ronnie Taib Dr Natalie Ruiz Dr Yang Wang
bull 11 PhD students (UNSWNICTA)
ndash Tet Fei Yap Phu Ngoc Le Asif Khawaja Vidhyasaharan Sethu Karen
ndash Pega Zarjam Kun Yu Siyuan Chen Guanzhong Li Sazzad Hassanbull Collaborators
ndash UNSW Speech Processing Laboratory Prof Paul Corballis (Georgia
Tech US) Prof John Sweller (UNSW)
72