Transcript of "The Future of the In-Car Experience" (NVIDIA GTC 2018)

Page 1


The Future of the In-Car Experience

Abdelrahman Mahmoud, Product Manager

Ashutosh Sanan, Computer Vision Scientist

Page 2

Affectiva Emotion AI

Use cases (word cloud): Interviewing, Mood Tracking, Social Robots, Drug Efficacy, Banking, Content Management (video/audio), Focus Groups, Customer Analytics, Education, Surveillance, Telehealth, Academic Research, Connected Devices / IoT, Health & Wellness, Social Robotics, MOOCs, Recruiting, Market Research, Legal, Mental Health, Web Conferencing, Healthcare, Real-Time Student Feedback, Video & Photo Organization, Automotive, Fraud Detection, Retail, Virtual Assistants, Online Education, Gaming, Live Streaming, Telemedicine, Security

In-market products since 2011 • 1/3 of the Fortune Global 100, 1,400 brands • OEMs and Tier 1 suppliers

Emotion recognition from face and voice powers several industries

Built using real-world data • 6.5M face videos from 87 countries • 42,000 miles of driving quarterly

Recognized Market / AI Leader • Spun out of MIT Media Lab • Selected for Startup Autobahn and Partnership on AI

Page 3

Affectiva Automotive AI

Driver Safety
The Problem: Transitions in control in semi-autonomous vehicles (e.g. the L3 handoff problem). Current solutions based on steering wheel sensors are irrelevant in autonomous driving.
Affectiva Solution: A next-generation AI-based system to monitor and manage driver capability for safe engagement.

Occupant Experience
The Problem: Delivering a differentiated and monetizable in-cab experience (e.g. the L4 luxury car challenge).
Affectiva Solution: The first in-market solution for understanding occupant state and mood to enhance the overall in-cab experience.

Page 4

People Analytics

External Context: Weather, Traffic, Signs, Pedestrians
Personal Context: Identity, Likes/dislikes & preferences, Occupant state history, Calendar
In-Cab Context: Occupant relationships, Infotainment content, Inanimate objects, Cabin environment

Emotion AI: Facial expressions, Tone of voice, Body posture
People Analytics: Anger, Surprise, Distraction, Drowsiness, Intoxication, Cognitive Load, Enjoyment, Attention, Excitement, Stress, Discomfort, Displeasure

People Analytics is context-aware, with Emotion AI as the foundational technology.

Personalization: Individually customized baseline, Adaptive environment, Personalization across vehicles
Safety: Next-generation driver monitoring, Smart handoff, Proactive intervention
Monetization: Differentiation among brands, Premium content delivery, Purchase recommendations

Page 5

Affectiva approach to addressing Emotion AI complexities

Data
Our robust and scalable data strategy enables us to acquire large and diverse data sets and annotate them using manual and automated approaches.

Algorithms
Using a variety of deep learning, computer vision, and speech processing approaches, we have developed algorithms to model complex and nuanced emotion and cognitive states.

Team
Our team of researchers and technologists has deep expertise in machine learning, deep learning, data science, data annotation, computer vision, and speech processing.

Infrastructure
Our deep learning infrastructure allows for rapid experimentation and tuning of models, as well as large-scale data processing and model evaluation.

Page 6

World’s largest emotion data repository: 87 countries, 6.5M faces analyzed, 3.8B facial frames. Includes people emoting on device and while driving.

Top Countries for Emotion Data:
USA: 1,166K
Mexico: 150K
Brazil: 194K
Germany: 148K
United Kingdom: 265K
China: 562K
Japan: 61K
Vietnam: 148K
Philippines: 159K
Indonesia: 325K
Thailand: 184K
India: 1,363K

Page 7

Data Strategy
To develop a deep understanding of the state of occupants in a car, one needs large amounts of data. With this data we can develop algorithms that sense emotions and gather people analytics in real-world conditions.

Foundational proprietary data will drive value and accelerate the data partner ecosystem.

Spontaneous occupant data: Using Affectiva Driver Kits and Affectiva Moving Labs to collect naturalistic driver and occupant data, to develop metrics that are robust to real-world conditions.

Data partnerships: Acquire 3rd-party natural in-cab data through academic and commercial partners (MIT AVT, fleet operators, ride-share companies).

Simulated data: Collect challenging data in a safe lab simulation environment to augment the spontaneous driver dataset and bootstrap algorithms (e.g. drowsiness, intoxication), using multi-spectral data and transfer learning.

Auto Data Corpus


Page 8

Page 9

Affectiva approach to addressing Emotion AI complexities

Data
Our robust and scalable data strategy enables us to acquire large and diverse data sets and annotate them using manual and automated approaches.

Algorithms
Using a variety of deep learning, computer vision, and speech processing approaches, we have developed algorithms to model complex and nuanced emotion and cognitive states.

Team
Our team of researchers and technologists has deep expertise in machine learning, deep learning, data science, data annotation, computer vision, and speech processing.

Infrastructure
Our deep learning infrastructure allows for rapid experimentation and tuning of models, as well as large-scale data processing and model evaluation.

Page 10

Algorithms

Page 11

Deep learning advancements driving the automotive roadmap

The current SDK consists of deep learning networks that perform:
• Face detection: given an image, detect faces
• Landmark localization: given an image + bounding box, detect and track landmarks
• Facial analysis: detect facial expressions/emotions/attributes

Pipeline diagram: image → Face detection (Region Proposal Network + shared conv. layers → classification, bounding boxes) → face image → Landmark localization (regression: shared conv. → landmark estimate → landmark refinement, with confidence) → Facial analysis (multi-task CNN/RNN on shared conv. features → emotions, temporal expressions, attributes) → per-face analysis.
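As a rough illustration of how such a three-network cascade fits together, here is a minimal sketch in PyTorch. It is not the actual SDK architecture: the module names, layer sizes, landmark count, and the toy single-box detector output are all placeholder assumptions.

```python
# Hypothetical sketch of the detector -> landmark localizer -> analyzer cascade.
import torch
import torch.nn as nn

class TinyFaceDetector(nn.Module):
    """Image -> one face bounding box per image (stand-in for the RPN detector)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.box = nn.Linear(16, 4)                 # (x, y, w, h), toy representation

    def forward(self, image):
        return self.box(self.features(image))

class TinyLandmarkLocalizer(nn.Module):
    """Face crop -> landmark coordinates (regression) plus tracking confidence."""
    def __init__(self, n_landmarks=34):            # landmark count is a placeholder
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.coords = nn.Linear(16, n_landmarks * 2)
        self.conf = nn.Linear(16, 1)

    def forward(self, crop):
        f = self.features(crop)
        return self.coords(f), torch.sigmoid(self.conf(f))

class TinyFacialAnalyzer(nn.Module):
    """Aligned face -> emotions / expressions / attributes from shared conv. features."""
    def __init__(self):
        super().__init__()
        self.shared = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.emotions = nn.Linear(32, 7)
        self.expressions = nn.Linear(32, 20)
        self.attributes = nn.Linear(32, 5)

    def forward(self, face):
        s = self.shared(face)
        return self.emotions(s), self.expressions(s), self.attributes(s)

image = torch.randn(1, 3, 224, 224)
box = TinyFaceDetector()(image)                     # where the face is
crop = torch.randn(1, 3, 96, 96)                    # cropping/alignment step omitted here
landmarks, conf = TinyLandmarkLocalizer()(crop)
emotions, expressions, attributes = TinyFacialAnalyzer()(crop)
```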

Page 12

Task: Facial Action/Emotion Recognition

• Given a face, classify the corresponding visual expression/emotion occurrence.
• Many expressions: facial muscles generate hundreds of facial expressions/emotions.
• Multi-attribute classification: several expressions/attributes can be active in the same face.
• Must be fast enough to run on mobile/embedded devices.

Example classes: Joy, Yawn, Eye Brow Raise.
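Because several facial actions can occur at once, multi-attribute classification is commonly framed as multi-label learning: one sigmoid output per attribute rather than a single softmax over classes. Below is a minimal sketch under that assumption; the backbone, the three-attribute label set, and the sizes are illustrative, not the SDK's.

```python
# Multi-label (multi-attribute) classification sketch: sigmoid per output + BCE loss.
import torch
import torch.nn as nn

NUM_ATTRIBUTES = 3                           # e.g. joy, yawn, brow raise (toy label set)

model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, NUM_ATTRIBUTES),           # one logit per attribute
)

# Sigmoid per output (not softmax): attributes can co-occur on the same face.
criterion = nn.BCEWithLogitsLoss()

faces = torch.randn(8, 1, 64, 64)            # batch of grayscale face crops
labels = torch.randint(0, 2, (8, NUM_ATTRIBUTES)).float()

logits = model(faces)
loss = criterion(logits, labels)
probs = torch.sigmoid(logits)                # per-attribute probabilities at inference
loss.backward()
```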

Page 13

Is a single image always enough?


Page 14

Information in Time

Emotional state is a continuously evolving process over time.

Adding temporal information makes it easier to detect highly subtle changes in facial state.

How to utilize temporal information:
• Apply post-processing over the static classifier's output, using previous predictions and images (a simple smoothing sketch follows below).
• Use recurrent architectures.

Chart: intensity of expression over time.
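As a concrete example of the post-processing option above, here is a small sketch that smooths a static classifier's per-frame scores with an exponential moving average. The scores and the smoothing constant are made-up placeholders, and this is only one of many possible post-processing schemes.

```python
# Temporal smoothing of a static, per-frame classifier's output scores.
import numpy as np

def smooth_scores(frame_scores, alpha=0.3):
    """Exponential moving average over per-frame expression scores.

    frame_scores: array of shape (T,) with the static classifier's
    probability for one expression on each frame.
    """
    smoothed = np.empty_like(frame_scores, dtype=float)
    state = frame_scores[0]
    for t, score in enumerate(frame_scores):
        state = alpha * score + (1.0 - alpha) * state   # blend new score with history
        smoothed[t] = state
    return smoothed

# Noisy per-frame yawn scores from a hypothetical static model:
raw = np.array([0.1, 0.9, 0.2, 0.85, 0.8, 0.9, 0.15, 0.1, 0.2])
print(smooth_scores(raw))
```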

Page 15

Spatio-Temporal Action Recognition

Diagram: a temporal sequence of frames passes through a CNN for spatial feature extraction; the per-frame features feed an LSTM that learns the temporal structure and outputs frame-level classifications (e.g. yawn scores of 0, 0, 0.5, 0.8).

Yawn recognition using CNN + LSTM.
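A minimal CNN + LSTM sketch in the spirit of the diagram above; the layer sizes, input resolution, and single yawn output are assumptions for illustration, not the deployed architecture.

```python
# CNN applied per frame for spatial features, LSTM over the feature sequence.
import torch
import torch.nn as nn

class CnnLstmYawn(nn.Module):
    def __init__(self, feat_dim=32, hidden_dim=64):
        super().__init__()
        # Spatial feature extraction: the same CNN is applied to every frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Temporal modeling: LSTM over the per-frame feature sequence.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)         # frame-level yawn logit

    def forward(self, frames):                       # frames: (B, T, 1, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1))       # (B*T, feat_dim)
        feats = feats.view(b, t, -1)
        out, _ = self.lstm(feats)                    # (B, T, hidden_dim)
        return self.head(out).squeeze(-1)            # (B, T) per-frame logits

clip = torch.randn(2, 9, 1, 64, 64)                  # two clips of nine frames
print(CnnLstmYawn()(clip).shape)                     # torch.Size([2, 9])
```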

Page 16

Training Challenges & Inferences

Page 17

Data challenges

When training, RNNs expect a continuous temporal sequence.

Missing facial frames:
• Bad lighting
• Face out of view
• Face not visible

Missing human annotations: facial frames not labeled by humans.

Possible solutions:
• Use shorter, fixed-length continuous sequences with no missing data
• Copy the last state of the sequence (repeat the last tracked frame)
• Mask the missing frames (a sketch follows below)

Diagram: missing frames in a sequence.
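One way the masking option can be implemented (an assumption, not necessarily the exact scheme used here): zero-fill the gaps, run the recurrent model over the whole padded sequence, and use a validity mask so that missing frames do not contribute to the loss.

```python
# Masking missing frames in a temporal sequence during training.
import torch
import torch.nn as nn

B, T, D = 2, 9, 32                                   # batch, frames, feature dim
features = torch.randn(B, T, D)                      # per-frame features (toy values)
valid = torch.ones(B, T, dtype=torch.bool)
valid[0, 3:5] = False                                # frames 3-4 of clip 0 are missing
features[~valid] = 0.0                               # zero-fill the missing frames

lstm = nn.LSTM(D, 16, batch_first=True)
head = nn.Linear(16, 1)
logits = head(lstm(features)[0]).squeeze(-1)         # (B, T) frame-level logits

labels = torch.randint(0, 2, (B, T)).float()
per_frame = nn.functional.binary_cross_entropy_with_logits(
    logits, labels, reduction="none")
loss = (per_frame * valid.float()).sum() / valid.sum()   # ignore missing frames
loss.backward()
```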

Page 18

Masking vs. copying the last state: results indicate that masking works better than copying the last state.

Chart: ROC-AUC and validation accuracy, "using last state" vs. "masking" (y-axis range 0.94-0.97).

Page 19

How to train a Spatio-Temporal model?

Two approaches to train our model:
• Train both the convolutional and recurrent filters jointly.
• Transfer learning, using previously learned convolutional filters (a sketch follows below).

Diagram: frozen feature extractors trained on expressions (Input A) are transferred to the yawn model (Input B).
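A sketch of the transfer-learning option, under the assumption that the spatial CNN comes from a previously trained expression model: its filters are frozen and only the recurrent layer and the new head are optimized. All module sizes are placeholders.

```python
# Freeze (hypothetically pretrained) conv filters; train only the LSTM and head.
import torch
import torch.nn as nn

# Pretend this CNN's weights were learned on the expression task (Input A).
cnn = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
for p in cnn.parameters():
    p.requires_grad = False                      # freeze the transferred filters

lstm = nn.LSTM(16, 32, batch_first=True)         # trained from scratch on yawns (Input B)
head = nn.Linear(32, 1)
optimizer = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), lr=1e-3)

frames = torch.randn(2, 9, 1, 64, 64)            # (batch, time, channels, H, W)
with torch.no_grad():                            # frozen extractor: no gradients needed
    feats = cnn(frames.flatten(0, 1)).view(2, 9, -1)
logits = head(lstm(feats)[0]).squeeze(-1)        # (2, 9) per-frame yawn logits

loss = nn.functional.binary_cross_entropy_with_logits(logits, torch.zeros(2, 9))
loss.backward()                                  # gradients reach only the LSTM and head
optimizer.step()
```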

Page 20

Transfer learning for runtime performance

Chart: ROC-AUC and validation accuracy, fixed (frozen) weights vs. fully trainable weights (y-axis range 0.961-0.97).

Intelligent Filter Reuse (diagram: a shared conv. backbone feeds the emotions, temporal expressions, and attributes heads)
• Increased runtime performance, enabling the models to run on mobile.
• Minimal benefit from tuning the filters from scratch.
• A large real-world dataset provides the pretrained filters.

Using transfer learning to help with runtime performance.

Page 21

Does temporal info always help?

Charts: ROC-AUC performance, temporal vs. static, for Yawn (y-axis 0.93-0.97), Smile (y-axis 0.93-0.962), and Outer Brow Raiser / AU02 (y-axis 0.86-0.89).
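For reference, a comparison like the ones charted above can be computed with scikit-learn's roc_auc_score. The label and score arrays below are random stand-ins, not the actual evaluation data.

```python
# Toy illustration of comparing static vs. temporal per-frame scores by ROC-AUC.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)                    # ground-truth frame labels
static_scores = labels * 0.6 + rng.random(1000) * 0.4     # fake static model output
temporal_scores = labels * 0.7 + rng.random(1000) * 0.3   # fake temporal model output

print("static   ROC-AUC:", roc_auc_score(labels, static_scores))
print("temporal ROC-AUC:", roc_auc_score(labels, temporal_scores))
```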

Page 22

Models in Action

Page 23

Key Takeaways

• Not all metrics benefit from adding complex temporal information

• Using all the data (complete & partial sequences) definitely helps the model

• For partial sequences, masking works better than copying the last frames

• Intelligent filter reuse makes it possible to deploy these models on mobile with real-time performance

Page 24

What’s next?


Demo screenshots: example per-frame metric readouts (Anger, Joy, Smile, Expressiveness, Fatigue, Eye Closure, Concentration, Fear).

• Analyze the effects of differences in frame rate at deployment vs. training.

• Use facial markers to create a drowsiness intensity metric.

Page 25

Q&A

Learn more at affectiva.com