People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time...

18
1 People Detection and Video Understanding Francois BREMOND INRIA Sophia Antipolis STARS team Institut National Recherche Informatique et Automatisme [email protected] http://www-sop.inria.fr/members/Francois.Bremond/ CoBTeK, Nice University Hospital Sophia Antipolis

Transcript of People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time...

Page 1: People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time recognition of human activities observed by various sensors (especially video cameras).

1

People Detection

and Video Understanding

Francois BREMOND

INRIA Sophia Antipolis – STARS team

Institut National Recherche Informatique et Automatisme

[email protected]

http://www-sop.inria.fr/members/Francois.Bremond/

CoBTeK,

Nice University Hospital

Sophia Antipolis

Page 2: People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time recognition of human activities observed by various sensors (especially video cameras).

2

Objective: Designing systems for Real time recognition of human activities observed by

various sensors (especially video cameras).

Challenge: Bridging the gap between numerical sensors and semantic events.

Approach: Spatio-temporal reasoning and knowledge management.

Examples of human activities:

for individuals (vandalism, bank attack, cooking, washing dishes, falling)

for small groups (fighting)

for crowd (overcrowding)

for interactions of people and vehicles (aircraft refueling)

Video Understanding

Page 3: People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time recognition of human activities observed by various sensors (especially video cameras).

3

3

Generic Platform for activity understanding

Person inside

Kitchen

Detection Classification TrackingActivity

Recognition

Posture

Recognition

Actions

Activity

Models

Page 4: People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time recognition of human activities observed by various sensors (especially video cameras).

4

Results

a background updating scheme was used for some sequences

People detection and tracking

Page 5: People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time recognition of human activities observed by various sensors (especially video cameras).

5

People detection : faster R-CNN on ETHZ

Page 6: People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time recognition of human activities observed by various sensors (especially video cameras).

6

Motivation - Action Recognition

Hollywood dataset

Page 7: People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time recognition of human activities observed by various sensors (especially video cameras).

7

Motivation - Action Recognition

UCF Sports dataset

Page 8: People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time recognition of human activities observed by various sensors (especially video cameras).

8

Motivation - Action Recognition

Daily Living datasets (Rochester Univ.)

Page 9: People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time recognition of human activities observed by various sensors (especially video cameras).

9Action Recognition using Bag of Words

M. Koperski

Videos Feature detector Feature descriptor BOW modelHistograms of

codewords Non-linear SVM

Codeword defined as a

Descriptor cluster

Page 10: People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time recognition of human activities observed by various sensors (especially video cameras).

10

Violence Recognition Framework, P. Bilinski

Feature Description

(TS, HOG, HOF, MBH)

Video Representation

(Improved Fisher Vectors)

Violence

Non-violence

Classification

(SVM)

Input Video Feature Detection

(Improved Dense Trajectories)

Trajectory Shape

HOG

HOF MBH

Pub Street Football Stadium

Steet School Movies Analysis

Football Stadium

Football Stadium

Violence Non-violence

Page 11: People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time recognition of human activities observed by various sensors (especially video cameras).

11

Gender recognition using smile:

A. Dantcheva

Spatio-temporal features based on dense trajectories

represented by a set of descriptors encoded by Fisher Vectors .

Page 12: People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time recognition of human activities observed by various sensors (especially video cameras).

12

Toyota Smart-Home

Large scale daily living dataset

Page 13: People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time recognition of human activities observed by various sensors (especially video cameras).

13

Toyota Smart-Home

Large scale daily living dataset

Page 14: People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time recognition of human activities observed by various sensors (especially video cameras).

14

Issues in Action Recognition using Deep CNNs

Deep Convolutional Neural Networks (CNN)

Images

• Large Annotated data (Imagenet)

• Architecture Suitable for Images with good resolution

Videos: How to capture motion information in CNN ?

• Stacking of frames

• Capture motion independently or not: several stream CNNs

• One ConvNet to capture static (frame based) visual information.

• Another ConvNet to capture motion information (like Optical Flow, but expensive)

• Other Nets to capture motion on longer scales or together (Siamese)

• Other Nets to capture object-ness.

C. Roberto de Souza, A. Gaidon, E. Vig, and A. Lopez. Sympathy for the Details: Dense

Trajectories and Hybrid Classification Architectures for Action Recognition, ECCV 2016

Page 15: People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time recognition of human activities observed by various sensors (especially video cameras).

15

Convolutional Pose Machines

for Action Recognition

Proposed a representation derived from human pose using Realtime Multi-Person 2D

Pose Estimation using Part Affinity Fields

The descriptor aggregates motion and appearance information along tracks of

human body parts using P-CNN : Pose-based CNN Features for Action Recognition

Page 16: People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time recognition of human activities observed by various sensors (especially video cameras).

16

Toyota Smart-Home

Large scale daily living dataset

Prepare tea

Pour water for tea

Page 17: People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time recognition of human activities observed by various sensors (especially video cameras).

17

Visualization of older adult performance while accomplishing the semi-guided tasks.

Activity monitoring in Greece Hospital with AD patients

Page 18: People Detection and Video Understanding€¦ · 2 Objective: Designing systems for Real time recognition of human activities observed by various sensors (especially video cameras).

18

Conclusion - video understandingA global framework for building real-time video understanding systems:

Perspectives:

• Generate totally unsupervised models

• Use finer features as input for the algorithm (head, posture, facial gesture…)

• Generating language description for the activity models

• Generic activity models (cross scenes), Adaptive learning

• More semantics, emotion, mental states.

4 PhD open topics:

• Kontron: People Tracking using Deep Learning algorithms on embedded hardware

• ESI: People Re-Identification using Deep Learning

• Wildmoka: Video based Action Recognition using Deep Learning

• Nice Hospital: Uncertainty Management and Activity Recognition