Page 1:

Tagging Video Contents with Positive/Negative Interest

Based on User’s Facial Expression

Masanori Miyahara, Masaki Aoki, Tetsuya Takiguchi, Yasuo Ariki

Graduate School of Engineering, Kobe University, Japan

The 14th International MultiMedia Modeling Conference (MMM2008)

9-11 January 2008, Kyoto University, Japan

Page 2:

Table of Contents

1. Motivation

2. Proposed System

3. Experiments

4. Conclusion and Future Work

Page 3:

Introduction

Multichannel digital broadcasting on TV

Video-sharing sites on the Internet

There are too many videos for viewers to select from

Video content recommendation system
  ⇒ Tagging video contents

Page 4:

Video contents recommendation system

(User analysis)
Remote control operation histories [1]
Favorite keywords [2]
Facial expression [3]

(Content analysis)
Motion vector
Color distribution
Object recognition

Tagged contents database

(Contents recommendation)
Collaborative Filtering [4]

[1] Taka, 2001  [2] Masumitsu, 2001  [3] Yamamoto, 2006  [4] Resnick, 1994

Page 5:

Main contribution

Conventional tagging system based on a viewer's facial expression [2006, Yamamoto]:

Estimates Interest or Neutral
  ⇒ Proposed: estimate Neutral, Positive or Negative

Simple facial feature points
  ⇒ Proposed: allocate more feature points and extract them by EBGM

Noisy frames (tilting the face, occlusion)
  ⇒ Proposed: reject noisy frames automatically

Page 6:

Experimental environment

(Figure: top view of the experimental environment — user, webcam, display, PC)

The viewer watches the display alone

The viewer's face is recorded on video by the webcam

A PC plays back the video content and also analyzes the viewer's face

Page 7:

Overview of proposed system

(System diagram)

AdaBoost: face region extraction
  ↓
EBGM: facial feature point extraction, person recognition
  ↓ (the personal model — neutral image and personal facial expression classifier — is retrieved for the recognized person)
SVM: facial expression recognition
  ↓
Tag: Neutral / Positive / Negative / Rejective

Page 8:

Face region extraction using AdaBoost

Extract face regions using AdaBoost based on Haar-like features [2001, Viola]

Merits:
- The face size is normalized
- Reduced computation time in the next processing step
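As a concrete illustration of this step (not from the slides), a minimal sketch using OpenCV's Haar-cascade (AdaBoost) face detector; the cascade file and the 128×128 output size are assumptions:

```python
# Minimal sketch of face-region extraction with OpenCV's Haar-cascade
# (AdaBoost) detector. The cascade file and the 128x128 output size are
# illustrative assumptions, not taken from the paper.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_face_region(frame, size=(128, 128)):
    """Return a size-normalized face crop, or None if no face is found."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                                  # later tagged "Rejective"
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # keep the largest face
    return cv2.resize(gray[y:y + h, x:x + w], size)      # normalize face size
```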

Page 9:

Facial feature point extraction and person recognition using EBGM

A jet is a set of convolution coefficients obtained by applying Gabor kernels with different frequencies and orientations to a point in an image

A set of jets extracted at all facial feature points is called a face graph. A set of face graphs extracted from many people is called a bunch graph

The similarity between the bunch graph and a face graph is computed to search for facial feature points and to recognize the person

(Figure: Gabor wavelet, jet, bunch graph) [1997, Wiskott]
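A minimal sketch of computing a jet, under the common 5-frequency × 8-orientation EBGM setup [1997, Wiskott]; the kernel parameters and helper names are illustrative assumptions:

```python
# Minimal sketch of computing a "jet" at one facial feature point:
# magnitudes of Gabor-kernel responses at several frequencies and
# orientations. The 5x8 frequency/orientation grid follows the usual
# EBGM setup; the kernel parameters here are illustrative.
import cv2
import numpy as np

def compute_jet(gray, point, n_freq=5, n_orient=8, ksize=31):
    """Return the jet (vector of Gabor magnitudes) at pixel (x, y)."""
    x, y = point
    img = gray.astype(np.float32)
    jet = []
    for f in range(n_freq):
        lambd = 4.0 * (2.0 ** (f / 2.0))      # wavelength: coarse to fine
        for o in range(n_orient):
            theta = o * np.pi / n_orient      # kernel orientation
            kern = cv2.getGaborKernel((ksize, ksize), lambd, theta,
                                      lambd, 1.0)
            resp = cv2.filter2D(img, cv2.CV_32F, kern)
            jet.append(abs(float(resp[y, x])))
    return np.array(jet)

def jet_similarity(j1, j2):
    """Cosine similarity between two jets; summed over all feature
    points, this compares a face graph against a bunch graph."""
    return float(np.dot(j1, j2) / (np.linalg.norm(j1) * np.linalg.norm(j2)))
```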

Page 10:

Facial expression recognition using SVM

A viewer registers a personal model in advance: a neutral image and a personal facial expression SVM classifier

After EBGM recognizes a viewer, the system retrieves the personal model

Feature vector for SVM

(Figure: Neutral (A) vs. Positive (B) face images)
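A minimal sketch of such a personal classifier with scikit-learn's SVC; using feature-point displacements from the registered neutral image as the feature vector is an assumption made for illustration:

```python
# Minimal sketch of the personal facial-expression classifier with
# scikit-learn's SVM. The feature vector -- displacements of facial
# feature points relative to the registered neutral image -- is an
# illustrative assumption.
import numpy as np
from sklearn.svm import SVC

def make_feature(points, neutral_points):
    """Displacements of facial feature points from the neutral face."""
    return (np.asarray(points, float) - np.asarray(neutral_points, float)).ravel()

class PersonalExpressionClassifier:
    def __init__(self, neutral_points):
        self.neutral = neutral_points
        self.svm = SVC(kernel="rbf")   # multi-class handled one-vs-one

    def train(self, frames_points, labels):
        # labels: manual tags "Neutral" / "Positive" / "Negative"
        X = [make_feature(p, self.neutral) for p in frames_points]
        self.svm.fit(X, labels)

    def predict(self, points):
        return self.svm.predict([make_feature(points, self.neutral)])[0]
```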

Page 11:

Definition of Facial Expression Classes

Classes          Meanings
Neutral (Neu)    Expressionless
Positive (Pos)   Happiness, Laughter, Pleasure, etc.
Negative (Neg)   Anger, Disgust, Displeasure, etc.
Rejective (Rej)  Not watching the display in the front direction, occluding part of the face, tilting the face, etc.

Page 12:

Example of “Neutral”

Page 13:

Example of “Positive”

Page 14:

Example of “Negative”

Page 15:

Example of “Rejective”

Page 16:

Automatic rejection

If no face region is extracted from a frame, the frame is tagged as "Rejective"

If a face region is extracted, the frame is used as training and testing data for the SVM

(Flowchart: Face region extracted? → Yes → SVM → Neutral / Positive / Negative; No → Rejective)
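Putting the stages together, a minimal sketch of this rejection flow, reusing the hypothetical helpers sketched on the previous pages; extract_feature_points stands in for the EBGM step and is assumed:

```python
# Minimal sketch of the automatic-rejection flow. extract_face_region
# and PersonalExpressionClassifier are the hypothetical helpers sketched
# earlier; extract_feature_points is an assumed stand-in for the EBGM
# feature-point search.
def tag_frame(frame, classifier):
    face = extract_face_region(frame)
    if face is None:
        return "Rejective"                 # no face region: noisy frame
    points = extract_feature_points(face)  # EBGM feature-point search
    return classifier.predict(points)      # Neutral / Positive / Negative
```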

Page 17:

Experimental conditions

Two subjects (A and B) watched four videos; their average length was 17 minutes

The category was "variety shows"

The system recorded their facial video in synchronization with the video content at 15 fps

Afterwards, the subjects tagged the videos with the four labels

Page 18:

Tagged results (manual)

           Neu    Pos   Neg   Rej   Total (frames)
Subject A  49865  7665  3719  1466  62715
Subject B  56531  2347  3105  775   62758

We used these experimental videos and the tagging data as training and testing data in the SVM experiments that follow

Page 19:

Preliminary experiment 1

(System overview diagram, repeated from Page 7)

Experiment on the performance of face region extraction using AdaBoost

Page 20:

Preliminary experiment 1: Face region extraction using AdaBoost

Subject A         Neu    Pos    Neg
False extraction  20     3      1
Total frames      49865  7665   3719
Rate (%)          0.040  0.039  0.027

Subject B         Neu    Pos    Neg
False extraction  132    106    9
Total frames      56531  2347   3105
Rate (%)          0.234  4.516  0.290

Page 21:

Preliminary experiment 2

(System overview diagram, repeated from Page 7)

Experiment on the performance of person recognition using EBGM

Page 22:

Preliminary experiment 2: Person recognition using EBGM

Subject A          Neu    Pos    Neg
False recognition  2      0      0
Total frames       49845  7662   3718
Rate (%)           0.004  0.000  0.000

Subject B          Neu    Pos    Neg
False recognition  2      20     0
Total frames       56399  2241   3096
Rate (%)           0.004  0.893  0.000

Page 23:

Experiment

(System overview diagram, repeated from Page 7)

Experiment on the performance of facial expression recognition using SVM

Page 24:

Experiment: Facial expression recognition using SVM

Three of the four experimental videos (11,000 sec × 15 fps × 3 videos ≒ 500,000 frames) were used as training data, and the remaining video as testing data (cross-validation)

Precision and recall rates were computed as follows
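The formulas themselves were not captured in this transcript; the standard per-class definitions, consistent with the confusion matrices on Pages 30 and 31, are:

$$\mathrm{precision}(c) = \frac{\#\,\{\text{frames the system tagged } c \text{ that were manually labeled } c\}}{\#\,\{\text{frames the system tagged } c\}}$$

$$\mathrm{recall}(c) = \frac{\#\,\{\text{frames the system tagged } c \text{ that were manually labeled } c\}}{\#\,\{\text{frames manually labeled } c\}}$$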

Page 25:

Experimental results: Facial expression recognition using SVM

(Bar chart, reconstructed as a table; the values match the per-class averages over subjects A and B from the confusion matrices on Pages 30 and 31)

           Neu   Pos   Neg   Rej
Precision  0.98  0.93  0.88  0.74
Recall     0.98  0.89  0.82  0.81

Page 26:

Discussion

The averaged recall rate was 87% and the averaged precision rate was 88%.

When the subjects expressed their emotions only modestly, the system often mistook the facial expression for Neutral.

When the subjects showed an intermediate facial expression, the system often made mistakes, because only one expression class was assumed per frame.

Page 27:

Demo

Page 28:

Conclusion and future work

We proposed a system that tags video contents with Neutral, Positive, Negative or Rejective labels based on the viewer's facial expression.

Future work:
- Evaluation with various categories and various subjects
- Combination of user analysis and content analysis
- Construction of an automatic video content recommendation system

Page 29:

Page 30:

Experimental result: Confusion matrix - Subject A

Subject A      Neu    Pos    Neg    Rej    Sum    Recall (%)
Neu            48275  443    525    622    49865  96.81
Pos            743    6907   1      14     7665   90.11
Neg            356    107    3250   6      3719   87.39
Rej            135    0      5      1326   1466   90.45
Sum            49509  7457   3781   1968   62715  91.19
Precision (%)  97.51  92.62  85.96  67.38  85.87

Page 31:

Experimental result: Confusion matrix - Subject B

Subject B      Neu    Pos    Neg    Rej    Sum    Recall (%)
Neu            56068  138    264    61     56531  99.18
Pos            231    2076   8      32     2347   88.45
Neg            641    24     2402   38     3105   77.36
Rej            203    0      21     551    775    71.10
Sum            57143  2238   2695   682    62758  84.02
Precision (%)  98.12  92.76  89.13  80.79  90.20

Page 32:

Computation time

Recording facial video: 15 fps

Processing facial video: Pentium 4, 3 GHz → 1 fps; Core 2 Quad, 2.4 GHz → 4 fps

If a user watches video content for no more than about 6 hours per day, a Core 2 Quad system can complete the processing
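As a rough check of this claim (not on the slide): watching $T$ hours at 15 fps produces $15 \times 3600 \times T$ frames, while processing during the remaining $24 - T$ hours at 4 fps handles $4 \times 3600 \times (24 - T)$ frames. The break-even point is

$$15T = 4(24 - T) \;\Rightarrow\; T = \tfrac{96}{19} \approx 5.1\ \text{hours},$$

which is broadly consistent with the slide's estimate of about 6 hours.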

Page 33:

Conventional system vs. proposed system

The video data used to evaluate the conventional system has not been released, so a direct comparison is difficult

Page 34:

Learning in advance

The current proposed system requires registering in advance:
- The viewer's neutral image
- His personal facial expression classifier (the viewer watches video content for 50 minutes and then tags it)

Future work (saving the user's trouble): semi-supervised or unsupervised learning

Page 35:

The video categories

Experiment in this study: only "variety shows"

Future work: "dramas", "news", etc.

However, we expect our system not to work well on such categories, because a viewer's facial expression changes little when watching such videos

Page 36:

Conventional facial expression recognition

Many studies on facial expression recognition have been carried out, so we also have to consider other methods

Page 37:

Evaluation

The values reported in the experimental results:
(averaged recall for subject A + averaged recall for subject B) / 2
(averaged precision for subject A + averaged precision for subject B) / 2
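Plugging in the confusion-matrix averages from Pages 30 and 31: recall $(91.19 + 84.02)/2 \approx 87.6\%$ and precision $(85.87 + 90.20)/2 \approx 88.0\%$, matching the 87% and 88% figures quoted in the Discussion.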

Page 38:

Q and A (audience questions, translated from Japanese)

- About real-time capability
- What is the point of adding more feature points?
- About the learning in advance
- Is "variety shows" alone enough?
- How will content recommendation actually be done?
- Why EBGM? What about AAM?
- What about horror movies?
- What about Ekman's six basic expressions?
- When is the tagging done?
- TVs do not come with cameras
- Wouldn't simply pressing a button suffice?
- Processing time could be shortened by skipping frames in which the face does not move
- Which is better, per-frame or per-interval tagging?
- Aren't keywords or EPG data alone sufficient?
- Having one's face filmed meets psychological resistance
- Where does "being moved by a drama" fit in?
- Manual tagging was done by the subjects themselves; what if someone else did it?
- Is there an effect from watching a video twice?
- How does the Rejective mechanism work?
- How does it compare with other methods in the facial expression field?
- Automatic program recommendation would require a large number of subjects; is that feasible?
- Are two subjects enough?
- How do you guarantee that subjects neither exaggerated nor suppressed their expressions?
- When tagging, do subjects watch the content or their own face?
- When the facial expression cannot be recognized, how will programs be recommended? Is there a plan?
- What is the technical novelty? Isn't it just a combination of existing techniques?
- Wouldn't registering a neutral image and facial expressions in advance be a barrier for a large number of users?
- What about sports, movies, etc.?
- In a real environment, how do you cope with lighting changes and face movements?
- What about TRECVID?
- The main contribution is not stated in the conclusion
- What is collaborative filtering?