Source: bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdf
Human-Centered Computing:
Using Speech to Understand Behavior
Emily Mower Provost
Computer Science and Engineering
University of Michigan
Automated agents provide support, entertainment, and interaction.
Image sources: http://the-big-turn-on.co.uk/pics/future.jpg; http://newatlas.com/toyota-kirobo-mini-companion-robot-release/45720/; https://www.nimh.nih.gov/about/organization/gmh/grandchallenges/index.shtml
chai.eecs.umich.edu
Embracing Complexity
(Figure: speech at the center, surrounded by environments, lexical content, individual differences, and emotion.)
chai.eecs.umich.edu
CHAI Lab Research Directions
• Audio-visual emotion modeling:
– Perception modeling
– Expression modeling
– Methods: deep learning, multitask learning, time series modeling, knowledge-driven
• Assistive technology:
– Speech assessment for individuals with aphasia
– Mood state tracking for individuals with bipolar disorder
– [Early stages] Estimating suicidality
– [Early stages] Speech assessment: Huntington's disease
chai.eecs.umich.edu
Focus on Behaviors
• Goal: detect human behavior from speech
• Emotion: valence (positivity), activation (energy), categories
• Mood: depression, suicidality
• Diagnosis: Huntington's disease, aphasia
(Figure: conversation → extract speech signal → extract features + model → activation/valence patterns.)
chai.eecs.umich.edu
ALGORITHMS → IMPACT
Why is this area so important?
chai.eecs.umich.edu
Motivation
• Bipolar Disorder (BP)
– A leading cause of disability worldwide
– Common, chronic, and severe psychiatric illness
– Characterized by swings into mania and depression
– Devastating personal, social, vocational consequences
• Current Treatment
– Pharmaceutical management
– Periodic follow-up visits for monitoring (costly; the majority are unnecessary)
– Reactive care after manic/depressive episodes (devastating consequences)
chai.eecs.umich.edu
Wellness Monitoring
(Figure: an individual's mood trajectory over time, moving between depression, baseline, and mania.)
chai.eecs.umich.edu
Over-arching question:
How can we automatically identify
an individual’s early warning signs?
chai.eecs.umich.edu
Engineering question:
How can we augment algorithm
design with clinical knowledge?
This work:
What if we focus on emotion?
chai.eecs.umich.edu
PRIORI Roadmap

Clinical Mood Assessment: Initial paper (2014) → Speech rhythm (2016) → Personalization (2016) → Emotion (2018)
Natural Speech: Language, Anomaly Detection, Emotion in-the-wild [Interspeech 2019; in prep.]

Important change:
- Inclusion of speech recognition
- Innovations in robust and generalizable emotion recognition
PRedicting Individual Outcomes for Rapid Intervention (PRIORI)

(Figure: the app records calls → privacy-preserving feature extraction → detect mood symptoms → intervention, with targeted feedback: "Warning! In need of care.")
Types of PRIORI Calls
• Personal calls:
– Calls made as someone goes about his/her day
– Natural speech
• Assessment calls:
– Clinical interactions over the phone
– Young Mania Rating Scale (YMRS)
– Hamilton Depression Scale (HamD)
(Figure: timeline of personal calls, grouped by the following assessment call.)
The PRIORI dataset
• PRIORI:
– Longitudinal study of bipolar disorder
– Collect and analyze mood data for individuals with BP
– Develop a mood recognition system
• Participants
– Patients: BP I and II (51)
– Healthy controls (9)
– Dataset size: over 50K calls, over 4K hours of speech
Initial Paper
• Goal: Collect and present a new dataset; determine the efficacy of emotion techniques for recognizing mood
• Insight: Emotion and mood both modulate the speech signal
• Approach: Extract common emotion features; classify using common emotion recognition techniques
• Findings: There do seem to be differences!
Zahi N. Karam, Emily Mower Provost, Satinder Singh, Jennifer Montgomery, Christopher Archer, Gloria Harrington, Melvin McInnis. "Ecologically Valid Long-term Mood Monitoring of Individuals with Bipolar Disorder Using Speech." International Conference on Acoustics, Speech and Signal Processing (ICASSP). Florence, Italy. May 2014.
Speech Rhythm
• Goal: Determine whether a clinician would designate a person as being in a mood episode, using the rhythm of speech in a clinical interaction
• Insight: When manic, speech rate increases; when depressed, it decreases
• Approach: Create a robust pre-processing pipeline; classify mood episode
• Findings: Rhythm can be used to estimate mood; it is critical to control for extraneous factors!
John Gideon, Emily Mower Provost, Melvin McInnis. “Mood State Prediction From Speech Of Varying
Acoustic Quality For Individuals With Bipolar Disorder.” International Conference on Acoustics,
Speech and Signal Processing (ICASSP). Shanghai, China, March 2016.
Methods

(Pipeline: Audio Signal → Preprocessing: Device Compensation (RBAR Declipping) → Segmentation (COMBO-SAD) → Feature Extraction: Estimate Rhythm → Classification: SVM, single-task or multi-task → Mood)
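The slides do not define the rhythm features themselves; purely as an illustration of the idea (not this pipeline's actual feature set), a crude speech-rate proxy can count syllable-like energy peaks per second. `speech_rate_proxy`, the 0.3 threshold, and the toy signals below are all hypothetical:

```python
import numpy as np

def speech_rate_proxy(signal, sr, frame_ms=25, hop_ms=10):
    """Count syllable-like energy peaks per second (illustrative only)."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    # Short-time energy envelope
    energy = np.array([np.sum(signal[i:i + frame] ** 2)
                       for i in range(0, len(signal) - frame, hop)])
    if energy.size == 0:
        return 0.0
    energy = energy / (energy.max() + 1e-12)
    # A "peak": local maximum above a loudness threshold
    peaks = 0
    for t in range(1, len(energy) - 1):
        if energy[t] > 0.3 and energy[t] >= energy[t - 1] and energy[t] > energy[t + 1]:
            peaks += 1
    return peaks / (len(signal) / sr)  # peaks per second

# Toy usage: 1 s of silence vs. 1 s with alternating tone bursts
sr = 8000
quiet = np.zeros(sr)
seg = np.linspace(0, 0.1, sr // 10, endpoint=False)
bursts = np.concatenate([np.sin(2 * np.pi * 200 * seg) if k % 2 == 0
                         else np.zeros(sr // 10) for k in range(10)])
print(speech_rate_proxy(quiet, sr), speech_rate_proxy(bursts, sr))
```

Silence yields zero peaks per second, while the bursty signal yields a positive rate; a real system would of course use the paper's validated rhythm features instead.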
Methods and Results

(Pipeline: Audio Signal → Segmentation → Estimate Rhythm → Single-task SVM Classification → Mood)

| Measure: AUC | Baseline |
|---|---|
| Mania | 0.57 ± 0.25 |
| Depression | 0.64 ± 0.14 |
Methods and Results

(Pipeline: Audio Signal → Device Compensation → Segmentation → Estimate Rhythm → Single-task SVM Classification → Mood)

| Measure: AUC | Baseline | RBAR Declipping |
|---|---|---|
| Mania | 0.57 ± 0.25 | 0.70 ± 0.17* |
| Depression | 0.64 ± 0.14 | 0.65 ± 0.15 |

* Paired t-test over subjects, p < 0.05
Methods and Results

(Pipeline: Audio Signal → Device Compensation → Estimate Rhythm → SVM Classification → Mood, with no segmentation step)

| Measure: AUC | Baseline | RBAR Declipping | Ignoring Segmentation |
|---|---|---|---|
| Mania | 0.57 ± 0.25 | 0.70 ± 0.17* | 0.74 ± 0.24* |
| Depression | 0.64 ± 0.14 | 0.65 ± 0.15 | 0.77 ± 0.15* |

* Paired t-test over subjects, p < 0.05
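The asterisks in these tables come from paired t-tests over per-subject AUCs. A sketch with made-up numbers (the real per-subject values are not shown on the slides):

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical per-subject AUCs for two pipeline variants (same subjects, so paired)
rng = np.random.default_rng(0)
baseline = rng.normal(0.57, 0.05, size=12)
declipped = baseline + rng.normal(0.12, 0.03, size=12)  # consistent improvement

t, p = ttest_rel(declipped, baseline)
print(f"t = {t:.2f}, p = {p:.6f}")
```

A small p with positive t says the per-subject improvement is unlikely to be chance; pairing by subject removes between-subject variability from the comparison.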
Methods and Results

(Pipeline: Audio Signal → Device Compensation → Segmentation → Estimate Rhythm → SVM Classification → Mood)

| Measure: AUC | Baseline | RBAR Declipping | Multitask Learning |
|---|---|---|---|
| Mania | 0.57 ± 0.25 | 0.70 ± 0.17* | 0.72 ± 0.20* |
| Depression | 0.64 ± 0.14 | 0.65 ± 0.15 | 0.71 ± 0.15 |

* Paired t-test over subjects, p < 0.05
Methods and Results

(Pipeline: Audio Signal → Device Compensation → Segmentation → Estimate Rhythm → SVM Classification → Mood)

| Measure: AUC | Baseline | RBAR Declipping | Subject Normalization |
|---|---|---|---|
| Mania | 0.57 ± 0.25 | 0.70 ± 0.17* | 0.67 ± 0.19* |
| Depression | 0.64 ± 0.14 | 0.65 ± 0.15 | 0.75 ± 0.14* |

* Paired t-test over subjects, p < 0.05
Personalization
• Goal: Improve the prediction of depression
• Insight: Individuals are unique, and so is their expression of mood
• Approach: Speaker verification techniques (i-vectors)
• Findings: We can improve depression prediction over speech rhythm features alone
Soheil Khorram, John Gideon, Melvin McInnis, and Emily Mower Provost. "Recognition of Depression
in Bipolar Disorder: Leveraging Cohort and Person-Specific Knowledge." Interspeech. San Francisco,
CA, September 2016.
Personalization

Feature fusion vs. decision fusion: results

| System | AUC |
|---|---|
| Population-general | 0.69 ± 0.15 |
| Subject-specific | 0.70 ± 0.18 |
| Feature Fusion | 0.76 ± 0.13* |
| Constant Decision Fusion | 0.74 ± 0.16 |
| Soft Decision Fusion | 0.78 ± 0.12* |
| Hard Decision Fusion | 0.76 ± 0.13 |
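As a sketch of what these fusion variants might look like (illustrative only; the exact formulations and weights are not given on the slide), soft fusion averages the two models' posteriors before thresholding, while hard fusion votes on their binary decisions:

```python
import numpy as np

# Hypothetical depression posteriors from two models for five calls
p_general  = np.array([0.62, 0.40, 0.81, 0.55, 0.30])  # population-general
p_specific = np.array([0.70, 0.35, 0.77, 0.48, 0.20])  # subject-specific

def soft_fusion(p1, p2, w=0.5):
    """Weighted average of posteriors, then threshold at 0.5."""
    return (w * p1 + (1 - w) * p2) > 0.5

def hard_fusion(p1, p2):
    """Agreeing votes win; ties are broken by the mean posterior."""
    d1, d2 = p1 > 0.5, p2 > 0.5
    tie = d1 != d2
    return (d1 & d2) | (tie & (((p1 + p2) / 2) > 0.5))

print(soft_fusion(p_general, p_specific))
print(hard_fusion(p_general, p_specific))
```

With a fixed weight `w`, soft fusion corresponds to the "constant" variant; learning `w` per subject would be one way to personalize it.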
Emotion
• Goal: Move to personal calls!
• Insight: Mood is slowly varying; can we improve prediction by focusing on factors more directly expressed in speech?
• Approach: Annotate the data for emotion; transcribe the data
• Findings: We can accurately predict emotion from natural speech; in clinical interactions, emotion patterns change with symptom severity
Soheil Khorram, Mimansa Jaiswal, John Gideon, Melvin McInnis, Emily Mower Provost. “The PRIORI
Emotion Dataset: Linking Mood to Emotion Detected In-the-Wild.” Interspeech. Hyderabad, India.
September 2018.
Identifying an intermediary step
• Mood prediction is challenging:
– Not directly observable
– Long time scale
• Emotion can simplify mood prediction:
– Emotion dysregulation is a primary BP symptom, with utility in classification*
– Time course: emotion varies on a time scale between speech and mood
(Figure: phenomena ordered by time scale, from seconds to a lifetime: expressions and speech content, self-reported emotion, moods, emotional disorders, personality traits.)
Goal: Find patterns in expressions that relate to emotion and mood
* Stasak, B., Epps, J., Cummins, N., & Goecke, R. (2016). An Investigation of Emotional Speech in Depression Classification. In INTERSPEECH (pp. 485-489).
Reference: Soheil Khorram, Mimansa Jaiswal, John Gideon, Melvin McInnis, Emily Mower Provost. "The PRIORI Emotion Dataset: Linking Mood to Emotion Detected In-the-Wild." Interspeech. Hyderabad, India. September 2018.
Emotion Annotation Pipeline
(Pipeline: Subject Selection → Segmentation → Segment Selection → Segment Inspection → Segment Annotation)
chai.eecs.umich.edu
• Valence and activation annotation:
– 9-point Likert scale
– 11 annotators (7 female, 4 male), ages 21 to 34, native speakers of English
• Annotators were asked to consider two important points:
– Only the acoustic characteristics, not the content
– Subject-specificity of emotion expression
Emotion Distributions
*Note: categorical labels for
demonstration purposes only.
chai.eecs.umich.edu
Emotion Recognition Experimental Setup
• Normalize ground truth labels:
– Subtract the rating midpoint of 5
– Scale to the range [−1, 1]
• Subject-independent cross-validation:
– Five runs, each with six randomly constructed folds
– Each run: randomly assign two subjects to each fold
– Round-robin cross-validation yields one test measure per fold (six per run)
– Output: a 6-by-5 matrix of test measures
• Parameter selection: max CCC over validation set
chai.eecs.umich.edu
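The label normalization and fold construction above can be sketched directly (the subject IDs below are hypothetical placeholders):

```python
import numpy as np

# Normalize 9-point Likert ratings to [-1, 1]: subtract the midpoint 5, divide by 4
ratings = np.array([1, 3, 5, 7, 9], dtype=float)
normalized = (ratings - 5.0) / 4.0
print(normalized)

# Subject-independent folds: two subjects per fold, six folds per run
rng = np.random.default_rng(0)
subjects = [f"S{i:02d}" for i in range(12)]

def make_folds(subjects, n_folds=6, rng=None):
    """Shuffle subjects, then deal them round-robin into folds."""
    order = list(subjects)
    (rng or np.random.default_rng()).shuffle(order)
    return [order[i::n_folds] for i in range(n_folds)]

folds = make_folds(subjects, rng=rng)
print(folds)
```

Keeping whole subjects inside a single fold is what makes the evaluation subject-independent: no speaker appears in both training and test data.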
Features and Models
• Baseline system
– 88-dimensional eGeMAPS features
– Features globally normalized
– Feed-forward neural network: tanh activations, linear output
• Alternative system
– 40-dimensional MFB features
– Features globally normalized
– Conv-pool network: convolutional layers, global max pooling, dense layers
– ReLU activations for intermediate layers, linear output
(Figure: input features → convolutional layer (256 kernels) → global max pooling (256-dim vector) → dense layers → valence output.)
chai.eecs.umich.edu
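A minimal numpy sketch of the conv-pool forward pass, assuming random untrained weights and hypothetical shapes (the real model is trained end-to-end; this only shows why global max pooling handles variable-length input):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_pool_forward(x, kernels, dense_w, dense_b):
    """x: (time, 40) MFB frames; kernels: (n_k, width, 40); returns a scalar score."""
    n_k, width, _ = kernels.shape
    T = x.shape[0] - width + 1
    conv = np.empty((T, n_k))
    for t in range(T):
        window = x[t:t + width]  # (width, 40) slice of frames
        # Convolution over time + ReLU
        conv[t] = np.maximum(0.0, np.tensordot(kernels, window,
                                               axes=([1, 2], [0, 1])))
    pooled = conv.max(axis=0)                 # global max pool -> (n_k,)
    return float(pooled @ dense_w + dense_b)  # linear output (regression)

x = rng.standard_normal((200, 40))            # ~2 s of 40-dim MFB frames (hypothetical)
kernels = rng.standard_normal((256, 8, 40)) * 0.01
dense_w = rng.standard_normal(256) * 0.01
score = conv_pool_forward(x, kernels, dense_w, 0.0)
longer = conv_pool_forward(rng.standard_normal((350, 40)), kernels, dense_w, 0.0)
print(score, longer)
```

Because the max is taken over the whole time axis, utterances of any length map to the same 256-dim vector before the dense layers.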
Emotion Results
• Conv-Pool > FFNN (PCC, CCC)
• Activation more accurately recognized than valence

| Dimension | Metric | eGeMAPS FFNN | MFBs Conv-Pool |
|---|---|---|---|
| Activation | PCC | 0.642 ± 0.076 | 0.712 ± 0.077 |
| Activation | CCC | 0.593 ± 0.071 | 0.660 ± 0.090 |
| Activation | RMSE | 0.207 ± 0.012 | 0.201 ± 0.028 |
| Valence | PCC | 0.271 ± 0.053 | 0.405 ± 0.062 |
| Valence | CCC | 0.191 ± 0.031 | 0.326 ± 0.052 |
| Valence | RMSE | 0.199 ± 0.015 | 0.194 ± 0.016 |

Bold in the original slide: p < 0.01, paired t-test.
chai.eecs.umich.edu
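CCC, used here for both parameter selection and reporting, differs from Pearson correlation by also penalizing bias and scale mismatch; a minimal implementation:

```python
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient between predictions x and labels y."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()              # population variances
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (vx + vy + (mx - my) ** 2)

y = np.array([0.1, -0.4, 0.8, 0.0, 0.5])
perfect = ccc(y, y)
shifted = ccc(y + 0.5, y)  # same shape, constant offset
print(perfect, shifted)
```

A constant offset leaves Pearson's r at 1.0 but lowers CCC, which is why CCC is the stricter agreement measure for continuous emotion labels.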
Mood Dataset
• Goal: Analyze link between mood and predicted emotion
(Figure: assessment calls plotted by YMRS and HamD scores: Euthymic (22%), Depressed (38%), Manic (9%), Excluded (31%).)
chai.eecs.umich.edu
Experimental Setup
• Goal: Analyze link between mood and predicted emotion
• Considerations:
– Consider how a subject varies around his/her own baseline (euthymic periods)
– Normalize depressed and manic segments by subject, relative to euthymic segments
• Approach:
– Apply conv-pool models to predict emotion
– Use an ensemble over the cross-validation models
– Analyze all 10,563 assessment-call segments
chai.eecs.umich.edu
What is the link between mood and emotion?
• Ways to measure:
– Observe clinical interactions
– Relate emotion to mood symptom severity (classes or continuous)
• Finding: valence/activation significantly higher in manic vs. depressed episodes
chai.eecs.umich.edu
(Figure: valence and activation distributions in mania vs. depression. Valence: positive vs. negative; activation: calm vs. excited.)
What is the link between mood and emotion?
• Ways to measure:
– Observe clinical interactions
– Relate emotion to mood symptom severity (classes or continuous)
• Finding: valence/activation are significantly correlated with mood severity
chai.eecs.umich.edu
(Figure: valence/activation plotted against mood severity. Valence: positive vs. negative; activation: calm vs. excited.)
Comparing Emotion Distributions
• Comparing distributions of valence/activation across subjects
• Comparisons:
– Over all subjects: one-way ANOVA, p < 0.01
– Pairwise comparisons: Tukey-Kramer post-hoc test (66 pairs)
• Findings:
– Activation: overall difference; pairwise significant in 51 of 66 cases
– Valence: overall difference; pairwise significant in 48 of 66 cases
chai.eecs.umich.edu
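The analysis above can be sketched with SciPy, using synthetic scores standing in for the real per-subject data (`tukey_hsd` is SciPy's Tukey-Kramer-style post-hoc test, available in SciPy 1.8+):

```python
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

rng = np.random.default_rng(0)
# Hypothetical per-segment activation scores for three subjects with shifted means
a = rng.normal(0.00, 0.1, 50)
b = rng.normal(0.30, 0.1, 50)
c = rng.normal(0.31, 0.1, 50)

F, p = f_oneway(a, b, c)       # overall difference across subjects
res = tukey_hsd(a, b, c)       # pairwise comparisons, family-wise corrected
print(f"ANOVA p = {p:.2e}")
print(f"a vs b: p = {res.pvalue[0, 1]:.2e}, b vs c: p = {res.pvalue[1, 2]:.2f}")
```

The ANOVA only says some subject differs; the post-hoc test localizes which pairs differ, which is how "51 of 66 pairs" style counts are obtained.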
Embracing Complexity
(Figure: speech at the center, surrounded by environments, lexical content, individual differences, and emotion.)
chai.eecs.umich.edu
Research Question
Emotion is a big data problem!
But what is the best method for transferring paralinguistic information across datasets with different conditions to emotion recognition?
chai.eecs.umich.edu
Reference: John Gideon, Melvin McInnis, Emily Mower Provost. "Barking up the Right Tree: Improving Cross-Corpus Speech Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG)." IEEE Transactions on Affective Computing, to appear, 2019.
Domain Generalization
• Goal: create a middle-ground representation for unseen data
• Removes factors particular to individual datasets
• Variability in datasets: environment noise, subject demographics, recording device quality, elicitation strategy
chai.eecs.umich.edu
Domain Generalization – Autoencoders
Eskimez et al. 2018
Denoising Autoencoder (DAE) Adversarial Autoencoder (AAE)
Variational Autoencoder (VAE) Adversarial Variational Bayes (AVB)
chai.eecs.umich.edu
Domain Generalization – DANNs
• Domain Adversarial Neural Networks
• Encode a middle representation
• Discriminative: classify emotion and domain from the middle layer
• Adversarial: backpropagate the reverse gradient of the domain loss
• "Unlearns" domain
• No clear target: challenges with convergence
Ajakan et al. 2014; Abdelwahab et al. 2018
chai.eecs.umich.edu
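The reverse-gradient idea can be sketched with a toy linear encoder and manual gradients (illustrative only; not the architecture used in the cited work): the emotion head's gradient updates the encoder normally, while the domain head's gradient is negated before it reaches the encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((8, 4)) * 0.1  # shared encoder
w_emo = rng.standard_normal(4) * 0.1       # emotion head (regression)
w_dom = rng.standard_normal(4) * 0.1       # domain head (which corpus?)
lam, lr = 1.0, 0.01

def encoder_grad(x, y_emo, y_dom):
    h = x @ W_enc                               # encode one example
    g_emo_h = (h @ w_emo - y_emo) * w_emo       # dL_emo/dh, squared loss
    g_dom_h = (h @ w_dom - y_dom) * w_dom       # dL_dom/dh, squared loss
    # Gradient reversal: the encoder DESCENDS the emotion loss but
    # ASCENDS the domain loss, "unlearning" domain information.
    return np.outer(x, g_emo_h - lam * g_dom_h)

x = rng.standard_normal(8)
g = encoder_grad(x, y_emo=0.5, y_dom=1.0)
W_enc -= lr * g  # one encoder update step
```

The heads themselves train normally; only the gradient flowing from the domain head into the shared encoder is flipped, which is what the slide's "no clear target" caveat refers to.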
What if we could still be discriminative?
What if we had a clear target?
![Page 52: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/52.jpg)
Datasets
IEMOCAP MSP-IMPROV
Subjects (Male/Female) 10 (5/5) 12 (6/6)
Environment Laboratory Laboratory
Language English English
Sample Rate 16 kHz 44.1 kHz
Total Utterances 10039 8438
![Page 53: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/53.jpg)
Labels
IEMOCAP MSP-IMPROV
Total Utterances 10039 8438
Likert Scale 1-5 1-5
Class Boundaries 1-2, 3, 4-5 1-2, 3, 4-5
Mean (Std.) Activation 3.08 (0.90) 2.57 (1.10)
Utt. Without Ties 4814 7290
Mean (Std.) Valence 2.79 (0.99) 3.02 (1.06)
Utt. Without Ties 6816 7852
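One plausible reading of how these tables are built, sketched below: bin each 1-5 Likert rating using the class boundaries (1-2, 3, 4-5), take a majority vote over evaluators, and exclude tied utterances (the "Utt. Without Ties" rows count what survives). The bin names are illustrative, not from the talk.

```python
from collections import Counter

def majority_class(ratings):
    """Bin 1-5 Likert ratings into low (1-2), mid (3), high (4-5),
    then majority-vote across evaluators; return None on a tie,
    mirroring the exclusion of tied utterances in the tables."""
    bins = ['low' if r <= 2 else 'mid' if r == 3 else 'high' for r in ratings]
    counts = Counter(bins).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: excluded from the experiments
    return counts[0][0]
```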
![Page 56: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/56.jpg)
Method Overview
[Figure: Audio → Generate Representation → Emotion Classification + Domain Critic]
![Page 57: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/57.jpg)
Baseline: CNN
CNN: Convolutional Neural Network trained on all labeled data;
SP: Specialist CNN trained on just target labeled data (if available)
![Page 58: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/58.jpg)
ADDoG: Adversarial Discriminative Domain Gen.
![Page 59: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/59.jpg)
MADDoG: Multiclass ADDoG
![Page 60: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/60.jpg)
Experimental Overview
• Four datasets:– IEMOCAP (16 kHz)
– MSP-Improv (44.1 kHz)
– PRIORI Emotion (8 kHz)
• Features: Mel Filterbanks (40d, length zero-padded to longest in batch)
• Task: cross-domain valence recognition (three-class)
• Setups:– Train on one lab dataset, test on another (IEMOCAP/MSP-Improv)
– Train on one lab dataset, test on PRIORI Emotion
– Train on two lab datasets, test on PRIORI Emotion
chai.eecs.umich.edu
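The zero-padding step in the feature setup can be sketched directly; this is a generic batching helper under the stated assumptions (each utterance is a T_i × 40 mel-filterbank matrix), not code from the experiments.

```python
import numpy as np

def pad_batch(features):
    """Zero-pad variable-length mel-filterbank sequences (each T_i x D)
    to the longest utterance in the batch, as in the setup above."""
    max_len = max(f.shape[0] for f in features)
    dim = features[0].shape[1]
    batch = np.zeros((len(features), max_len, dim))
    for i, f in enumerate(features):
        batch[i, :f.shape[0], :] = f
    return batch
```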
![Page 61: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/61.jpg)
Experiment 1 – Cross Dataset
Setup: train on IEMOCAP or MSP-Improv (all labeled); test on the other lab dataset (none labeled).

        MSP-Improv to IEMOCAP    IEMOCAP to MSP-Improv
CNN     0.439 ± 0.022 UAR        0.432 ± 0.012 UAR
ADDoG   0.474 ± 0.009 UAR*       0.444 ± 0.007 UAR*

*Denotes results significantly better than CNN (paired t-test, p=0.05)
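The UAR metric in the table above is unweighted average recall: the mean of per-class recalls, so it is not inflated by imbalanced classes. A minimal implementation:

```python
import numpy as np

def uar(y_true, y_pred, n_classes=3):
    """Unweighted Average Recall: mean of per-class recalls over the
    classes that appear in y_true (three valence classes here)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [(y_pred[y_true == c] == c).mean()
               for c in range(n_classes) if (y_true == c).any()]
    return float(np.mean(recalls))
```

For a three-class task, chance-level UAR is 1/3 ≈ 0.333, which is the natural reference point for the scores above.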
![Page 63: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/63.jpg)
Experiment 1 – Increasing Target Labels
Setup: train on IEMOCAP or MSP-Improv (all labeled); test on the other lab dataset (some labeled).
[Results figures: MSP-Improv to IEMOCAP; IEMOCAP to MSP-Improv]
Dots denote results significantly different from ADDoG (paired t-test, p=0.05)
![Page 66: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/66.jpg)
Experiment 2 – To In-the-Wild Data
Setup: train on IEMOCAP or MSP-Improv (all labeled); test on PRIORI Emotion (some labeled).
[Results figures: IEMOCAP to PRIORI Emotion; MSP-Improv to PRIORI Emotion]
Dots denote results significantly different from ADDoG (paired t-test, p=0.05)
![Page 69: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/69.jpg)
Experiment 3 – To In-the-Wild Data
IEMOCAP and MSP-Improv to PRIORI Emotion
Setup: train on IEMOCAP and MSP-Improv (all labeled); test on PRIORI Emotion (some labeled).
Dots denote results significantly different from MADDoG (paired t-test, p=0.05)
![Page 70: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/70.jpg)
Experiment 3 – To In-the-Wild Data
IEMOCAP and MSP-Improv to PRIORI Emotion
What we learn: we can’t train a model on outside datasets and expect it to just work.
Dots denote results significantly different from MADDoG (paired t-test, p=0.05)
![Page 72: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/72.jpg)
Experiment 3 – To In-the-Wild Data
IEMOCAP and MSP-Improv to PRIORI Emotion
Where we can go: we can use these models to derive emotion features in other domains. [Interspeech 2019]
Dots denote results significantly different from MADDoG (paired t-test, p=0.05)
![Page 73: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/73.jpg)
Conclusions
• ADDoG and MADDoG consistently converge
– Clear target at each step (other dataset)
– “Meet in the middle” approach
• Effective at detecting emotion in smartphone calls
![Page 74: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/74.jpg)
Remaining challenge:
We still aren’t sure about the representation itself!
![Page 75: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/75.jpg)
Emotion Recognition Representation
Architecture: extract 40-dim MFB features → convolutional layer (128 kernels of width 16) → global max pooling → 128-dim vector → fully connected layers (FC1, FC2) → softmax; trained with a classification loss and fSPL.
What if the representation held emotional meaning?
What if points close in emotion were close to each other?
Biqiao Zhang, Yuqing Kong, Georg Essl, Emily Mower Provost. “f-Similarity Preservation Loss for Soft Labels: A Demonstration on Cross-Corpus Speech Emotion Recognition.” AAAI, Hawaii, January 2019.
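The convolution-plus-global-max-pooling front end described above can be sketched as follows. This is a naive reference implementation, not the trained model; kernel count K, width w, and feature dimension D are generic (the slide uses K=128, w=16, D=40).

```python
import numpy as np

def conv_globalmax(mfb, kernels):
    """Convolve a (T x D) MFB sequence with K time-axis kernels of
    shape (w x D), then global-max-pool over time, producing one
    K-dim vector per utterance."""
    T, _ = mfb.shape
    K, w, _ = kernels.shape
    out = np.empty(K)
    for k in range(K):
        acts = [np.sum(mfb[t:t + w] * kernels[k]) for t in range(T - w + 1)]
        out[k] = max(acts)  # max over all time positions
    return out
```

Global max pooling is what makes the representation length-independent: however long the utterance, each kernel contributes one number.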
![Page 76: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/76.jpg)
Deep Metric Learning (DML)
• Goal: learn an embedding space where pairwise distance corresponds to label similarity
• Contributions:
– New family of loss functions (f-similarity preservation loss) for soft labels
– Pair sampling method for efficient implementation in neural networks
![Page 79: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/79.jpg)
Deep Metric Learning (DML)
• Goal: learn an embedding space where pairwise distance corresponds to label similarity
• Contributions:
– New family of loss functions (f-similarity preservation loss) for soft labels
– Pair sampling method for efficient implementation in neural networks
[Figure: embedding space with three Happy points and one Angry point]
Triplet Loss [Weinberger and Saul 2009; Chechik et al. 2010; Hoffer and Ailon 2015; Schroff et al. 2015]
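The cited triplet loss, which fSPL builds on, is short enough to state exactly: an anchor should sit at least `margin` closer to a same-emotion (positive) example than to a different-emotion (negative) one.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: zero once the positive is at least `margin`
    closer to the anchor than the negative; otherwise the violation."""
    d_pos = np.linalg.norm(np.asarray(anchor) - np.asarray(positive))
    d_neg = np.linalg.norm(np.asarray(anchor) - np.asarray(negative))
    return max(0.0, d_pos - d_neg + margin)
```

Note the hard-label assumption built into this loss: each example is either positive or negative, with no room for graded similarity between soft labels, which is the limitation the next slides address.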
![Page 82: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/82.jpg)
Deep Metric Learning (DML)
Variability is signal, not just noise
![Page 83: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/83.jpg)
Hard labels are too limiting.
• Disagreement in evaluation is extremely common
[Figure: four evaluators rate the same utterance (Eval. 1: Negative, Eval. 2: Negative, Eval. 3: Neutral, Eval. 4: Neutral); which label is correct?]
![Page 84: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/84.jpg)
f-Similarity Preservation Loss
[Figure: features x1, x2, x3 with labels y1, y2, y3 are mapped by an embedding g ∈ G into a learned embedding space: g*(x1), g*(x2), g*(x3)]
Feature space A_x, label space A_y, embedding space A_G; similarity S: A_G × A_G → [0, 2]
Goal: learn an embedding space where feature similarity = label similarity
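A toy member of this family, under assumptions of my own choosing rather than the paper's definitions: take label similarity in [0, 2] as 2 minus the L1 distance between soft labels, embedding similarity as 1 plus cosine similarity (also in [0, 2]), and penalize their squared mismatch in place of the paper's f-divergence-based loss.

```python
import numpy as np

def label_similarity(p, q):
    """Similarity in [0, 2] between two soft labels (probability
    vectors): 2 minus their L1 distance. A hypothetical choice
    for illustration; the paper defines its own S."""
    return 2.0 - np.abs(np.asarray(p) - np.asarray(q)).sum()

def similarity_preservation_loss(z1, z2, p, q):
    """Penalize mismatch between embedding-space similarity and
    label-space similarity for one pair of examples."""
    cos = np.dot(z1, z2) / (np.linalg.norm(z1) * np.linalg.norm(z2))
    emb_sim = 1.0 + cos  # shift cosine similarity into [0, 2]
    return (emb_sim - label_similarity(p, q)) ** 2
```

Identical soft labels with identical embeddings incur zero loss; pairs whose embeddings are closer or farther than their labels warrant are pushed back into agreement.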
![Page 88: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/88.jpg)
New Representations
Same architecture as before: 40-dim MFB features → convolutional layer (128 kernels of width 16) → global max pooling → 128-dim vector → FC1 → FC2 → softmax, now trained with the classification loss plus fSPL.
Enforce emotional meaning!
![Page 89: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/89.jpg)
Performance on held-out data
• f-SPL is less susceptible to overfitting
• Statistically significantly higher performance compared to cross-entropy loss
![Page 90: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/90.jpg)
Embedding with emotional meaning
[Figure: baseline embedding vs. embedding trained with fSPL]
![Page 91: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/91.jpg)
Embracing Complexity
[Figure: Speech at the intersection of Environments, Lexical Content, Individual Differences, and Emotion]
![Page 92: Human-Centered Computing: Using Speech to Understand …bigdatasummerinst.sph.umich.edu/wiki/images/1/1f/EMP_BDSI.pdfHuman-Centered Computing: Using Speech to Understand Behavior Emily](https://reader034.fdocuments.us/reader034/viewer/2022051809/6012d1a23ebb1213e565662c/html5/thumbnails/92.jpg)
Thanks!
Questions?