Content based video summarization into object maps

by Manuel Martos Asensio directed by Horst Eidenberger and Xavier Giro-i-Nieto

description

Master thesis defence by Manuel Martos-Asensio. Advisors: Horst Eidenberger (Technische Universität Wien) and Xavier Giró-i-Nieto (Universitat Politècnica de Catalunya).

Transcript of Content based video summarization into object maps

Page 1: Content based video summarization into object maps

by Manuel Martos Asensio

directed by Horst Eidenberger

and Xavier Giro-i-Nieto

Page 2: Content based video summarization into object maps

Introduction (I)

Page 3: Content based video summarization into object maps

Introduction (II)

Page 4: Content based video summarization into object maps

Introduction (III)

Page 5: Content based video summarization into object maps

Contents

System overview

Requirements analysis

Solution: Preparation, Content selection, Compositing

Conclusions: Experimental results, Further work

Page 7: Content based video summarization into object maps

System overview

Page 9: Content based video summarization into object maps

Requirements analysis

Priority requirements P.1. People and main characters

P.2. Fast understanding

P.3. Visual variability

Uniqueness requirements U.1. Non-repetition

U.2. Visual uniqueness

U.3. Characters uniqueness

Page 10: Content based video summarization into object maps

Requirements analysis

Structural requirements S.1. Main characters highlight

S.2. Style

Navigability requirements N.1. Region boundaries

N.2. Metadata supplement

Page 11: Content based video summarization into object maps

Requirements analysis

Page 12: Content based video summarization into object maps

Contents

System overview

Requirements analysis

Solution approach: Preparation (Uniform sampling, Shot boundary detection), Content selection, Compositing

Conclusions: Experimental results, Further work

Page 13: Content based video summarization into object maps

Preparation (I)

Uniform sampling

fps_i = acquisition frame rate

N_0 = number of samples

L_i = video length (in frames)
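
The sampling formula itself did not survive in this transcript, so the following is only a minimal sketch of what uniform sampling could look like with the quantities listed above. It assumes the samples are taken at the centre of equal-length bins; the function names are hypothetical, `video_length` stands in for the video length in frames, `num_samples` for the number of samples, and `fps` for the acquisition frame rate.

```python
def uniform_sample_indices(video_length, num_samples):
    """Return num_samples frame indices spread evenly over the video.

    The bin-centred rule used here is an assumption, not the thesis formula.
    """
    if video_length <= 0 or num_samples <= 0:
        return []
    # Never ask for more samples than there are frames.
    num_samples = min(num_samples, video_length)
    step = video_length / num_samples
    # Take the frame at the centre of each of the num_samples equal bins.
    return [int(step * k + step / 2) for k in range(num_samples)]


def frame_timestamp(frame_index, fps):
    """Convert a frame index to seconds using the acquisition frame rate."""
    return frame_index / fps
```

For a 100-frame clip and 4 samples this picks one frame near the middle of each quarter of the video.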

Page 14: Content based video summarization into object maps

Preparation (II)

Shot boundary detection

Customizable method for boundary detection

Default: Cumulative Pixel-to-Pixel
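
The slide names the default method but does not define it. As a rough illustration only, the sketch below assumes that "Cumulative Pixel-to-Pixel" means accumulating the absolute differences between co-located pixels of consecutive frames and declaring a boundary when the mean difference exceeds a threshold. Frames are represented as flat lists of grey levels, and both the function names and the threshold value are hypothetical.

```python
def mean_abs_difference(frame_a, frame_b):
    """Mean absolute pixel-to-pixel difference between two equal-size frames."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of pixels")
    if not frame_a:
        return 0.0
    # Accumulate the differences over every pixel position, then normalise.
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def detect_shot_boundaries(frames, threshold=30.0):
    """Return every index i where frames[i] is assumed to start a new shot.

    The threshold is an illustrative value, not one taken from the thesis.
    """
    boundaries = []
    for i in range(1, len(frames)):
        if mean_abs_difference(frames[i - 1], frames[i]) > threshold:
            boundaries.append(i)
    return boundaries
```

Because the method is described as customizable, the difference function is kept separate so that another measure could be swapped in.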

Page 15: Content based video summarization into object maps

Contents

System overview

Requirements analysis

Solution approach: Preparation, Content selection (Face detection, Face clustering, Object detection), Compositing

Conclusions: Experimental results, Further work

Page 16: Content based video summarization into object maps

Content selection (I)

Face detection

Problems:

Extreme size detections

Overlapping detections

Page 17: Content based video summarization into object maps

Content selection (II)

Face detection

Page 18: Content based video summarization into object maps

Content selection (III)

Face detection

Size filtering with fixed threshold
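
A fixed-threshold size filter of this kind could be sketched as follows. Boxes are assumed to be `(x, y, width, height)` tuples in pixels; the function name and the two limits are hypothetical, since the slide does not state the threshold actually used.

```python
def filter_by_size(boxes, min_side=24, max_side=400):
    """Keep boxes (x, y, w, h) whose width and height lie within the limits.

    min_side and max_side are illustrative values, not the thesis settings.
    """
    return [
        (x, y, w, h)
        for (x, y, w, h) in boxes
        if min_side <= w <= max_side and min_side <= h <= max_side
    ]
```

This discards both the very small and the very large detections listed among the problems of face detection.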

Page 19: Content based video summarization into object maps

Content selection (IV)

Face detection

Overlap filtering

Frontal detections are more reliable.
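
One way to act on this preference is to visit frontal detections first and drop any later detection that overlaps one already kept. The sketch below is an assumption about how such a filter might work, not the method from the thesis: it measures overlap with intersection over union, and the `"frontal"` and `"profile"` labels, the function names, and the `max_iou` value are all hypothetical.

```python
def intersection_over_union(a, b):
    """Overlap ratio of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def filter_overlaps(detections, max_iou=0.3):
    """Drop detections that overlap an already kept one.

    detections is a list of (box, kind) pairs, where kind is "frontal"
    or "profile". Frontal detections are visited first, so they win
    against overlapping profile detections.
    """
    ordered = sorted(detections, key=lambda d: d[1] != "frontal")
    kept = []
    for box, kind in ordered:
        if all(intersection_over_union(box, k[0]) <= max_iou for k in kept):
            kept.append((box, kind))
    return kept
```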

Page 20: Content based video summarization into object maps

Content selection (V)

Face detection

Page 21: Content based video summarization into object maps

Content selection (VI)

Face clustering

Which faces belong to the same person?

Which faces appear more often in the video?

Unsupervised Face Clustering problem:

1. Unknown number of characters

2. Unknown ground truth

Solution:

Iterative cluster estimation using LBPH
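
The slides do not spell out the iteration, so the following is only a simplified sketch of the idea under stated assumptions. It computes a single 256-bin local binary pattern histogram per face (a full LBPH descriptor concatenates histograms over a grid of cells), compares histograms with a chi-square distance, and labels faces greedily: a face joins the nearest existing cluster if it is close enough, otherwise it opens a new cluster, so the number of characters does not need to be known in advance. All function names and the threshold are hypothetical.

```python
from collections import Counter


def lbp_histogram(image):
    """Normalised 256-bin histogram of 8-neighbour LBP codes (2D grey image)."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            code = 0
            # Set one bit per neighbour that is at least as bright as the centre.
            for bit, (dr, dc) in enumerate(offsets):
                if image[r + dr][c + dc] >= image[r][c]:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist) or 1
    return [count / total for count in hist]


def chi_square(h1, h2):
    """Chi-square distance between two histograms of equal length."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def cluster_faces(histograms, threshold=0.5):
    """Greedy labelling: join the nearest cluster or open a new one."""
    representatives = []  # first histogram seen for each cluster
    labels = []
    for hist in histograms:
        distances = [chi_square(hist, rep) for rep in representatives]
        if distances and min(distances) < threshold:
            labels.append(distances.index(min(distances)))
        else:
            representatives.append(hist)
            labels.append(len(representatives) - 1)
    return labels


def rank_clusters(labels):
    """Cluster labels ordered from most to least frequent."""
    return [label for label, _ in Counter(labels).most_common()]
```

Ranking the clusters by size is one simple way to answer the question of which faces appear most often in the video.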

Page 22: Content based video summarization into object maps

Content selection (VII)

Face clustering

Pre-processing of face detection boxes

Page 23: Content based video summarization into object maps

Content selection (VIII)

Face clustering

Iterative face labeling

Page 24: Content based video summarization into object maps

Content selection (IX)

Face clustering

Page 25: Content based video summarization into object maps

Content selection (X)

Face clustering

Page 26: Content based video summarization into object maps

Content selection (XI)

Object detection

Relevant content is related to source video

Custom object map with:

1. Haar cascades

2. SURF descriptors matching

3. Deformable parts models

Page 27: Content based video summarization into object maps

Content selection (XII)

Object detection

Haar cascade classifiers

Advantages:

- Quick object detection

- Training and detection stages included in OpenCV

Disadvantages:

- Gives poor results when the object appears from different views

- Slow training process

Page 28: Content based video summarization into object maps

Content selection (XIII)

Object detection

SURF descriptors matching

Advantages:

- No additional training stage needed

- Scale and rotation invariant method

- Real-time object detection

- Descriptors extraction and matching strategy included in OpenCV

Disadvantages:

- Very specific training image

- Object may not be located in the image

Page 29: Content based video summarization into object maps

Content selection (XIV)

Object detection

Deformable parts models

Advantages:

- Multiple object views detection

- Scored results

Disadvantages:

- Third party executable wrapped in Java

- Slow object detection process

Page 31: Content based video summarization into object maps

Compositing (I)

Object segmentation

Page 32: Content based video summarization into object maps

Compositing (II)

Tile-based map

Adaptive map

Navigation functionalities

Page 34: Content based video summarization into object maps

Conclusions (I)

Experimental results

Web-based survey:

13 trailers

53 participants

Control methods:

Baseline: Uniform sampling

Upper bound: Manual frame selection

Page 35: Content based video summarization into object maps

Conclusions (III)

Experimental results

Overall rating

Recognition Rate

Attractiveness and effectiveness

Scores: 1 (Unacceptable), 2 (Fair), 3 (Good), 4 (Very good), 5 (Excellent)
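
For reference, the two numeric measures could be computed along the following lines. This is a generic sketch: it assumes the overall rating is reported as a mean opinion score (MOS) on the 1 to 5 scale above and the recognition rate as the percentage of participants who identified the movie. The function names are hypothetical and no survey data is reproduced here.

```python
def mean_opinion_score(ratings):
    """Average of integer ratings on the 1 (Unacceptable) to 5 (Excellent) scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return sum(ratings) / len(ratings)


def recognition_rate(answers):
    """Percentage of participants whose answer was correct (True)."""
    if not answers:
        raise ValueError("at least one answer is required")
    return 100.0 * sum(1 for a in answers if a) / len(answers)
```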

Page 36: Content based video summarization into object maps

Conclusions (II)

Experimental results

Overall rating

[Figure: MOS for video. Score (0 to 5) per trailer id (1 to 13). Series: Uniform sampling, Object map, Manual selection.]

[Figure: MOS. Score (0 to 5).]

Page 37: Content based video summarization into object maps

Conclusions (III)

Experimental results

Trailer 1: The Intouchables

Uniform sampled Object map

Page 38: Content based video summarization into object maps

Conclusions (III)

Experimental results

Trailer 7: The Fast and the Furious

Object map

Page 39: Content based video summarization into object maps

Conclusions (IV)

Experimental results

Movie recognition

a) Uniform sampling

b) Uniform sampling + Object map

c) Uniform sampling + Object map + Manual selection

[Figure: Recognition Rate for video. Recognition rate (%, 0 to 100) per trailer id (1 to 13). Series: a, b, c.]

[Figure: Recognition rate. Recognition rate (%, 0 to 100).]

Page 40: Content based video summarization into object maps

Conclusions (III)

Experimental results

Trailer 4: Dark Shadows

Trailer 9: Resident Evil 5 – Retribution

Uniform sampled Uniform sampled

Page 41: Content based video summarization into object maps

Conclusions (V)

Experimental results

Attractiveness and Effectiveness

[Figure: Acceptance rate. Score (0 to 5) per trailer id (1 to 13). Series: Attractiveness, Effectiveness.]

[Figure: Average acceptance rate. Score (0 to 5).]

Page 42: Content based video summarization into object maps

Conclusions (III)

Content-based video summarization application

Customizable

Allows users to rapidly grasp video content

Generates a summary description file to include related metadata

ACM 2013 Open Source Software Competition

Code publicly available at Sourceforge

http://sourceforge.net/p/objectmaps

Page 43: Content based video summarization into object maps

Conclusions (VI)

Further work

Face clustering improvement

Audio content analysis and understanding

Video sequence analysis

Content presentation analysis

Social Media
