Bangpeng Yao Li Fei-Fei Computer Science Department, Stanford University, USA.

Page 1:

Bangpeng Yao Li Fei-Fei

Computer Science Department, Stanford University, USA

Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities

Page 2:

Introduction
Modeling mutual context of object and pose
Model learning
Model inference, object detection, and human pose estimation
Experiments
Conclusion

Outline

Page 4:

Human pose estimation & Object detection

Introduction

[Figure: example image labeled with body parts (right arm, left arm, torso, right leg, left leg) and the object (tennis racket)]

Page 5:

Challenging:

Introduction

Page 6:

Mutual context: human pose estimation and object detection facilitate the recognition of each other

Introduction

Page 7:

Mutual context vs. no mutual context

Introduction

Page 9:

HOI activity

Page 10:

A: activity class, e.g., tennis serve, volleyball smash

O: object, e.g., tennis racket, volleyball

H: human pose

P: body parts

f: visual features

Each activity A can have more than one type of human pose H

HOI activity

Page 11:

Each edge of the model has a potential function and a weight

Potentials among A, O, and H: frequencies of co-occurrence between the activity class, the object, and the human pose

Spatial potentials: spatial relationships among the object and the body parts, computed from each element's (position, orientation, scale)

The model

Page 12:

Image-evidence potentials: model the dependence of the object and of each body part on their corresponding image evidence

The model
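
As a rough illustration of how edge potentials and weights could combine, the sketch below assumes the overall model score is a weighted sum of the potentials on connected edges, with disconnected edges contributing 0. The edge names, potential values, and weights are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def model_score(potentials, weights):
    """Weighted sum of edge potentials.

    Edges without a weight are treated as disconnected and contribute 0
    (an assumption made for this sketch).
    """
    return sum(weights[edge] * value
               for edge, value in potentials.items()
               if edge in weights)


# Hypothetical potential values for one image hypothesis.
potentials = {
    ("A", "O"): 0.8,    # co-occurrence of activity class and object
    ("A", "H"): 0.6,    # co-occurrence of activity class and human pose
    ("O", "H"): 0.7,    # co-occurrence of object and human pose
    ("O", "P1"): 0.5,   # spatial relationship between object and a body part
    ("O", "f_O"): 0.9,  # image evidence for the object (left unweighted here)
}

# Hypothetical weights; ("O", "f_O") is omitted, so it counts as disconnected.
weights = {
    ("A", "O"): 1.0,
    ("A", "H"): 0.5,
    ("O", "H"): 0.5,
    ("O", "P1"): 2.0,
}

score = model_score(potentials, weights)
```

Representing the structure as a dictionary of weighted edges makes it easy to add or drop an edge, which is what structure learning manipulates.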

Page 13:

Co-occurrence context for the activity class, object, and human pose

Multiple types of human pose for each activity

Spatial context between object and body parts

Properties of the model

Page 15:

The learning step needs to achieve two goals: structure learning and parameter estimation

Structure learning: discover the hidden human poses and the connectivity among the object, the human pose, and the body parts

Parameter estimation: estimate the potential weights to maximize the discrimination between different activities

Model learning

Page 16:

Objective: learn the connectivity pattern between the object, the human pose, and the body parts

Method: hill-climbing approach with a tabu list

Structure learning

Page 17:

The hill-climbing approach adds or removes edges one at a time until a maximum is reached

Hill-climbing structure learning

Human pose
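
The edge-by-edge search described on this slide can be sketched as follows. This is a generic hill-climbing loop with a tabu list, not the authors' implementation: the node names, the toy scoring function, and the choice to run a fixed number of steps and return the best structure seen are all assumptions made for illustration.

```python
from itertools import combinations


def hill_climb(nodes, score, max_iters=20, tabu_size=5):
    """Toggle one edge per step, skipping recently toggled (tabu) edges.

    Returns the best edge set seen and its score.
    """
    candidates = [frozenset(pair) for pair in combinations(nodes, 2)]
    current = frozenset()              # start with no edges
    best, best_score = current, score(current)
    tabu = []                          # most recently toggled edges
    for _ in range(max_iters):
        moves = [e for e in candidates if e not in tabu]
        if not moves:
            break
        # Add or remove the single edge whose toggle scores highest.
        edge = max(moves, key=lambda e: score(current ^ {e}))
        current = current ^ {edge}
        tabu.append(edge)
        if len(tabu) > tabu_size:
            tabu.pop(0)                # the oldest edge becomes available again
        if score(current) > best_score:
            best, best_score = current, score(current)
    return best, best_score


# Hypothetical target structure used only to define a toy score.
target = frozenset({
    frozenset(("O", "H")),
    frozenset(("H", "P1")),
    frozenset(("O", "P1")),
})


def toy_score(edges):
    # Reward edges in the target structure, penalize all others.
    return len(edges & target) - len(edges - target)


best, best_score = hill_climb(["O", "H", "P1", "P2"], toy_score)
```

The tabu list keeps the search from immediately undoing a recent move, so it can step through lower-scoring structures instead of stalling at the first local maximum.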

Page 18:

Objective: obtain a set of potential weights that maximizes the discrimination between different classes of activities

Each training sample consists of:
- the potential function values (potentials of disconnected edges are set to 0)
- the human pose H
- the class label A

If , then

A separate weight vector is learned for the r-th sub-class

Max-margin parameter estimation

Page 19:

The objective uses the L2 norm and a normalization constant

Multiclass SVM
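
One way to read the max-margin setup is as a multiclass SVM with one weight vector per sub-class. The sketch below assumes a standard regularized hinge objective: half the sum of squared L2 norms of the weight vectors, plus a constant `beta` times the total margin violation against sub-classes of other activities. The exact objective on the slide did not survive, so this form, the function names, the `sub_to_class` mapping, and the toy numbers are all assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def max_margin_objective(W, sub_to_class, samples, beta=1.0):
    """W: one weight vector per sub-class.
    samples: (x, true_sub_class_index) pairs, x being potential values.
    """
    regularizer = 0.5 * sum(dot(w, w) for w in W)
    slack = 0.0
    for x, r_true in samples:
        # Only sub-classes that belong to a different activity are rivals.
        rivals = [dot(W[r], x) for r in range(len(W))
                  if sub_to_class[r] != sub_to_class[r_true]]
        if rivals:
            slack += max(0.0, 1.0 + max(rivals) - dot(W[r_true], x))
    return regularizer + beta * slack


def predict(W, sub_to_class, x):
    """Return the activity of the highest-scoring sub-class."""
    best = max(range(len(W)), key=lambda r: dot(W[r], x))
    return sub_to_class[best]


# Hypothetical example: two pose sub-classes for one activity, one for another.
W = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
sub_to_class = ["tennis serve", "volleyball smash", "tennis serve"]
samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]

objective = max_margin_objective(W, sub_to_class, samples)
label = predict(W, sub_to_class, [1.0, 0.0])
```

Excluding same-activity sub-classes from the rivals reflects the idea that several poses may represent one activity, so they should not be pushed apart from each other.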

Page 20:

Using only one human pose for each HOI class is not enough to characterize all the images in that class well

Analysis of our learning algorithm

Page 22:

Given a new test image, our objectives are to:
- estimate the pose of the human
- detect the object that is interacting with the human

Model inference, object detection, and human pose estimation
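
To illustrate why solving both tasks jointly can help, the sketch below scores every (pose, object) pair by pose score plus object score plus a compatibility term, and keeps the best pair. It is an exhaustive search over small hypothetical candidate lists, not the authors' inference procedure; the candidate dictionaries, the scores, and the distance-based compatibility function are assumptions made for illustration.

```python
from itertools import product


def hand_object_compatibility(pose, obj):
    # Hypothetical spatial term: penalize distance between hand and object.
    dx = abs(pose["hand"][0] - obj["pos"][0])
    dy = abs(pose["hand"][1] - obj["pos"][1])
    return -0.1 * (dx + dy)


def joint_inference(pose_candidates, object_candidates, compatibility):
    """Return the (pose, object) pair with the highest joint score."""
    return max(
        product(pose_candidates, object_candidates),
        key=lambda pair: pair[0]["score"] + pair[1]["score"]
        + compatibility(pair[0], pair[1]),
    )


poses = [
    {"name": "serve", "hand": (5, 1), "score": 0.6},
    {"name": "forehand", "hand": (2, 4), "score": 0.5},
]
objects = [
    {"name": "racket", "pos": (5, 2), "score": 0.4},
    {"name": "false alarm", "pos": (0, 9), "score": 0.7},
]

# Without context, the detector alone prefers the false alarm.
detector_only = max(objects, key=lambda o: o["score"])

# With context, the racket next to the hand wins.
best_pose, best_object = joint_inference(poses, objects, hand_object_compatibility)
```

In this toy setup the weaker racket detection is selected because it agrees with the estimated hand position, which is the kind of effect mutual context is meant to capture.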

Page 24:

Cricket - defensive shot (player and cricket bat)
Cricket - bowling (player and cricket ball)
Croquet - shot (player and croquet mallet)
Tennis - forehand (player and tennis racket)
Tennis - serve (player and tennis racket)
Volleyball - smash (player and volleyball)

30 images for training and 20 for testing in each class

The sports dataset

Page 25:

Better object detection

Page 26:

Methods compared: sliding-window detector, pedestrian as context, our method

Better object detection

Page 27:

Pose estimation is still difficult

Multiple poses per activity work better than a single pose

Better pose estimation

Page 28:

Upper: our method
Lower left: object detection by a scanning window
Lower right: pose estimation by the state-of-the-art pictorial structure method

Page 29:

Note: Gupta et al. predominantly use background scene context

Combining object and pose for HOI activity classification

Page 31:

Treat object and human pose as the context of each other in different HOI activity classes

Structure learning method: discovers important connectivity patterns between objects and human poses

Future improvements:
- incorporate useful background scene context to facilitate the recognition of foreground objects and activities
- deal with more than one object

Conclusion

Page 32:

Thanks!!!