Understanding Human-Object Interaction in RGB-D videos for Human Robot Interaction

Post on 25-May-2020

Zhiwen Fang, Research Fellow, BeingTogether Centre, IMI

Understanding Human-Object Interaction in RGB-D videos for Human Robot Interaction

Motivation

Human-robot interaction (HRI) [1,2,3]

Diagram labels: Human, Social robot, Verbal language, Non-verbal language, Facial expression, Body gesture, Object

[1] Yang Xiao, Zhijun Zhang, Aryel Beck, Junsong Yuan, and Daniel Thalmann. 2014. Human-robot interaction by understanding upper body gestures. Presence: Teleoperators and Virtual Environments 23, 2 (2014), 133–154.
[2] Isibor Kennedy Ihianle, Usman Naeem, and Abdel-Rahman Tawil. 2016. Recognition of activities of daily living from topic model. Procedia Computer Science 98 (2016), 24–31.
[3] Marina Pérez-Jiménez, Borja Bordel Sánchez, and Ramón Alcarria. 2016. T4AI: A system for monitoring people based on improved wearable devices. Research Briefs on Information & Communication Technology Evolution (ReBICTE) 2 (2016), 1–16.

Motivation

Understand the intention of the human based on object information:

A cell phone held in the hand and close to the ear may indicate that the person is making a call.

A cup held in the hand and close to the mouth may indicate that the person is drinking.
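The two examples above can be read as simple rules that combine an object label with the distance between the object and a body part. The following is a minimal sketch of that idea only; the rule table, the joint names ("ear", "mouth"), and the 0.15 m threshold are hypothetical and do not come from the slides.

```python
import math

# Hypothetical rule table: (object label, body part) -> likely activity.
RULES = {
    ("cell phone", "ear"): "having a call",
    ("cup", "mouth"): "drinking",
}

def infer_intention(obj_label, obj_pos, joints, max_dist=0.15):
    """Return the activity of the first matching rule, or None.

    obj_pos and the values of `joints` are 3D positions in meters.
    A rule matches when the labels agree and the object lies within
    max_dist of the named body part.
    """
    for (label, joint), activity in RULES.items():
        if label != obj_label or joint not in joints:
            continue
        if math.dist(obj_pos, joints[joint]) <= max_dist:
            return activity
    return None
```

A real system would need the hand-held object detector described in the rest of the talk to supply `obj_label` and `obj_pos`.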

How to detect hand-held objects?

Outline

1 Introduction
2 Method
3 System overview
4 Results
5 Conclusions

Introduction

Wearable sensors and radio-frequency identification (RFID) tags [1]

Thermal band images [2]

Computer vision methods based on an RGB camera [3][4]

[1] K. P. Fishkin, M. Philipose, and A. Rea. 2005. Hands-on RFID: Wireless wearables for detecting use of objects. In IEEE International Symposium on Wearable Computers, 2005. Proceedings. 38–43.
[2] Cigdem Beyan and Alptekin Temizel. 2015. A multimodal approach for individual tracking of people and their belongings. The Imaging Science Journal 63, 4 (2015), 192–202.
[3] Chaitanya Desai, Deva Ramanan, and Charless Fowlkes. 2010. Discriminative models for static human-object interactions. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on. IEEE, 9–16.
[4] Zhaozhuo Xu, Yuan Tian, Xinjue Hu, and Fangling Pu. 2015. Dangerous human event understanding using human-object interaction model. In Signal Processing, Communications and Computing (ICSPCC), 2015 IEEE International Conference on. IEEE, 1–5.

Research problems in hand-held object detection:

(1) The relationship between objects and a person

(2) Hand-held objects are often very small

(3) Targets are lost because of appearance changes and/or partial occlusion in the sequence

Example images: (1) chair, bottle, cell phone, keyboard…; (2) bottle at about 5 meters; (3) cell phone under partial occlusion

Method

Human contextual information

1. Skeleton data (25 body joint positions)

2. Local patch around the hand joint
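As an illustration of item 2, the local patch can be obtained by cropping a square window centered on the hand joint. This is a minimal sketch that assumes the hand joint has already been projected into pixel coordinates of the RGB image; the function name and the window half-size of 64 pixels are hypothetical.

```python
import numpy as np

def crop_hand_patch(image, hand_xy, half_size=64):
    """Crop a square patch centered on the hand joint, clipped to the image.

    image:   H x W (x C) array.
    hand_xy: (x, y) pixel position of the hand joint.
    Returns the patch and the (x0, y0) offset of its top-left corner,
    which is needed to map detections back to full-image coordinates.
    """
    h, w = image.shape[:2]
    cx, cy = int(round(hand_xy[0])), int(round(hand_xy[1]))
    x0, y0 = max(cx - half_size, 0), max(cy - half_size, 0)
    x1, y1 = min(cx + half_size, w), min(cy + half_size, h)
    return image[y0:y1, x0:x1], (x0, y0)
```

Returning the offset alongside the patch keeps later steps simple: a box found at (bx, by) inside the patch sits at (bx + x0, by + y0) in the full frame.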

Figure panels: RGB image, Person Index

Estimate the probability of belonging to a person

1. Object detection in the local patch

2. Estimate the probability using the person index map
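The slides do not give the formula used in step 2. One simple stand-in, shown here only as a sketch, is the fraction of pixels inside a detected box that the person index map assigns to the person of interest. The function name and the convention that 255 marks background are assumptions.

```python
import numpy as np

def belonging_probability(person_index_map, box, person_id):
    """Fraction of pixels inside `box` labeled with `person_id`.

    person_index_map: H x W integer array, one person label per pixel
                      (assumed: 255 means background).
    box:              (x0, y0, x1, y1) in pixel coordinates, x1/y1 exclusive.
    This ratio is a stand-in score, not the formula from the talk.
    """
    x0, y0, x1, y1 = box
    region = person_index_map[y0:y1, x0:x1]
    if region.size == 0:
        return 0.0
    return float(np.mean(region == person_id))
```

A detection whose box overlaps the person's silhouette scores close to 1, while an object lying elsewhere in the scene scores close to 0.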

Object detection in a local patch by YOLO [1, 2]

(1) Resize the image to 544 × 544.

(2) Run a convolutional network on the resized image.

(3) Output the detection results according to the confidence of the network model.
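The three steps above can be sketched without committing to a particular YOLO implementation. In this sketch `model` is a hypothetical callable standing in for the network, the nearest-neighbor resize stands in for a library resize, and the 0.5 confidence threshold is an assumption; only the 544 × 544 input size comes from the slide.

```python
import numpy as np

INPUT_SIZE = 544  # network input resolution stated on the slide

def resize_nearest(image, size=INPUT_SIZE):
    """Nearest-neighbor resize to size x size."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows[:, None], cols]

def detect(image, model, conf_threshold=0.5):
    """Resize, run the detector, and keep confident boxes.

    `model` is a hypothetical callable that returns tuples
    (label, confidence, x0, y0, x1, y1) in resized-input coordinates.
    Kept boxes are scaled back to the coordinates of `image`.
    """
    h, w = image.shape[:2]
    kept = []
    for label, conf, x0, y0, x1, y1 in model(resize_nearest(image)):
        if conf < conf_threshold:
            continue
        kept.append((label, conf,
                     x0 * w / INPUT_SIZE, y0 * h / INPUT_SIZE,
                     x1 * w / INPUT_SIZE, y1 * h / INPUT_SIZE))
    return kept
```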

[1] J. Redmon, S. Divvala, R. Girshick, et al. 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 779–788.
[2] J. Redmon and A. Farhadi. 2017. YOLO9000: Better, faster, stronger. arXiv preprint.

Object tracking based on correlation filter [1]

(1) Dense sampling by modeling all possible translations of the base sample in a search window as circulant shifts

(2) Learning the correlation filter by solving a ridge regression problem in the Fourier domain

[1] J. F. Henriques, R. Caseiro, P. Martins, et al. 2015. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence 37, 3 (2015), 583–596.
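Both steps can be illustrated with a small NumPy sketch. It covers only the linear special case: because every training sample is a circular shift of the base patch, the ridge regression decouples per frequency and has a closed-form solution. The kernel trick, feature extraction, windowing, and model update of the cited tracker are left out, and the function names and the values of `sigma` and `lam` are assumptions.

```python
import numpy as np

def gaussian_labels(shape, sigma=2.0):
    """Regression target: a Gaussian peaked at (0, 0), wrapped circularly."""
    h, w = shape
    ys = np.minimum(np.arange(h), h - np.arange(h))
    xs = np.minimum(np.arange(w), w - np.arange(w))
    d2 = ys[:, None] ** 2 + xs[None, :] ** 2
    return np.exp(-0.5 * d2 / sigma ** 2)

def train_filter(x, y, lam=1e-4):
    """Ridge regression over all circular shifts of x, solved per frequency.

    Returns the filter in the Fourier domain:
    conj(X) * Y / (|X|^2 + lam).
    """
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return np.conj(X) * Y / (X * np.conj(X) + lam)

def locate(filter_hat, z):
    """Correlate the filter with a new patch z; return the peak (dy, dx)."""
    response = np.real(np.fft.ifft2(np.fft.fft2(z) * filter_hat))
    peak = np.unravel_index(np.argmax(response), response.shape)
    return tuple(int(i) for i in peak)
```

With labels peaked at the origin, the peak of the response gives the circular displacement of the target between the training patch and the new patch.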

System overview

Diagram labels: Speech recognition, Natural Language Processing, Object detection, Hand-held object detection, Human and robot interaction, Language interaction, Object exchange

Results

Detection rate of different methods in three categories (bottle, cup, and cell phone).

* "w/o" denotes the method without human contextual information

Conclusions

To provide intelligent human-robot interaction, it is critical to understand the interaction between the human and daily objects, so that we can analyze the intention of the human.

Using an RGB-D sensor, we provide a method to detect hand-held objects.

Human contextual information is introduced to improve the performance of hand-held object detection.

THANK YOU!

Q & A