Content based video summarization into object maps

by Manuel Martos Asensio

directed by Horst Eidenberger

and Xavier Giro-i-Nieto

Introduction (I)

Introduction (II)

Introduction (III)

Contents

- System overview
- Requirements analysis
- Solution
  - Preparation
  - Content selection
  - Compositing
- Conclusions
  - Experimental results
  - Further work

System overview

Requirements analysis

Priority requirements
- P.1. People and main characters
- P.2. Fast understanding
- P.3. Visual variability

Uniqueness requirements
- U.1. Non-repetition
- U.2. Visual uniqueness
- U.3. Character uniqueness

Structural requirements
- S.1. Highlighting of main characters
- S.2. Style

Navigability requirements
- N.1. Region boundaries
- N.2. Metadata supplement

Contents

- System overview
- Solution approach
  - Preparation
    - Uniform sampling
    - Shot boundary detection
  - Content selection
  - Compositing

Preparation (I)

Uniform sampling

fps_i = acquisition frame rate

N_0 = number of samples

L_i = video length (in frames)
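The sampling formula itself is not preserved in this transcript, so the following is only a minimal sketch of one common way to pick N_0 evenly spaced frames from a video of L_i frames. The function name, its parameters, and the choice of sampling at the centre of each interval are assumptions made for illustration, not the method shown on the slide.

```python
def uniform_sample_indices(num_frames: int, num_samples: int) -> list[int]:
    """Return evenly spaced frame indices (hypothetical helper).

    num_frames plays the role of L_i and num_samples the role of N_0.
    Each index is taken at the centre of one of num_samples equal intervals.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    # Never ask for more samples than there are frames.
    num_samples = min(num_samples, num_frames)
    step = num_frames / num_samples
    return [int((k + 0.5) * step) for k in range(num_samples)]


if __name__ == "__main__":
    print(uniform_sample_indices(100, 4))  # [12, 37, 62, 87]
```

Sampling interval centres rather than interval starts avoids always picking frame 0, which in trailers is often a black or logo frame; that is a design choice of this sketch only.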

Preparation (II)

Shot boundary detection

Customizable method for boundary detection

Default: Cumulative Pixel-to-Pixel
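The slides do not spell out the default detector, so the sketch below shows only the general idea of a pixel-to-pixel comparison: absolute differences between corresponding pixels of consecutive samples are summed over the frame, and a boundary is declared when the normalised sum exceeds a threshold. The frame representation (nested lists of grayscale values), the function names, and the threshold value are assumptions.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute difference between two equally sized grayscale frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    if count == 0:
        raise ValueError("frames must not be empty")
    return total / count


def detect_shot_boundaries(frames, threshold=30.0):
    """Return indices i where frames[i] starts a new shot.

    threshold is a hypothetical value on the 0-255 grayscale range.
    """
    boundaries = []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            boundaries.append(i)
    return boundaries


if __name__ == "__main__":
    dark = [[10, 10], [10, 10]]
    bright = [[200, 200], [200, 200]]
    print(detect_shot_boundaries([dark, dark, bright, bright, dark]))  # [2, 4]
```

Because the method is described as customizable, a function with this shape could be swapped for a histogram-based or edge-based detector without touching the rest of the pipeline.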

Contents

- System overview
- Solution approach
  - Preparation
  - Content selection
    - Face detection
    - Face clustering
    - Object detection
  - Compositing

Content selection (I)

Face detection

Problems:

- Extreme size detections

- Overlapping detections

Content selection (II)

Face detection

Content selection (III)

Face detection

Size filtering with fixed threshold
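A minimal sketch of size filtering with fixed thresholds follows. It assumes detections arrive as (x, y, width, height) boxes and that size is judged relative to the frame; the threshold fractions and the function name are hypothetical, since the slide does not give the actual values.

```python
def filter_by_size(boxes, frame_width, frame_height,
                   min_frac=0.05, max_frac=0.6):
    """Drop detections that are implausibly small or large.

    boxes: iterable of (x, y, w, h) tuples.
    min_frac / max_frac: hypothetical fixed thresholds, expressed as a
    fraction of the frame size along the box's dominant dimension.
    """
    kept = []
    for (x, y, w, h) in boxes:
        frac = max(w / frame_width, h / frame_height)
        if min_frac <= frac <= max_frac:
            kept.append((x, y, w, h))
    return kept


if __name__ == "__main__":
    detections = [(0, 0, 10, 10), (50, 50, 100, 100), (0, 0, 600, 400)]
    print(filter_by_size(detections, 640, 480))  # [(50, 50, 100, 100)]
```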

Content selection (IV)

Face detection

Overlap filtering

Frontal detections are more reliable.
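One way to act on the observation that frontal detections are more reliable is to resolve each overlap in favour of the frontal box. The sketch below does this with an intersection-over-union test; the IoU measure, the 0.3 limit, and the "frontal"/"profile" labels are assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def filter_overlaps(detections, max_iou=0.3):
    """Keep non-overlapping detections, preferring frontal ones.

    detections: list of (box, kind) with kind "frontal" or "profile".
    """
    # Stable sort: frontal detections are visited first, so they win overlaps.
    ordered = sorted(detections, key=lambda d: d[1] != "frontal")
    kept = []
    for box, kind in ordered:
        if all(iou(box, kept_box) <= max_iou for kept_box, _ in kept):
            kept.append((box, kind))
    return kept


if __name__ == "__main__":
    dets = [((12, 10, 100, 100), "profile"),
            ((10, 10, 100, 100), "frontal"),
            ((300, 300, 50, 50), "profile")]
    print(filter_overlaps(dets))
```

In the usage example the profile box at (12, 10) overlaps the frontal box almost entirely and is discarded, while the isolated profile box is kept.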

Content selection (V)

Face detection

Content selection (VI)

Face clustering

Which faces belong to the same person?

Which faces appear more often in the video?

Unsupervised Face Clustering problem:

1. Unknown number of characters

2. Unknown ground truth

Solution:

Iterative cluster estimation using LBPH
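LBPH stands for local binary patterns histograms. As a rough illustration of the descriptor only, the sketch below computes a basic 8-neighbour LBP code for every interior pixel of a grayscale image and builds one normalised 256-bin histogram. Full LBPH implementations usually split the face into a grid of cells and concatenate the per-cell histograms; this simplified single-histogram version, and the function name, are assumptions.

```python
def lbp_histogram(image):
    """Normalised 256-bin LBP histogram of a grayscale image.

    image: list of rows, each a list of integer intensities.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    hist = [0] * 256
    # Neighbour offsets (dy, dx), clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            center = image[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright.
                if image[y + dy][x + dx] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [count / total for count in hist] if total else hist


if __name__ == "__main__":
    flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
    print(lbp_histogram(flat)[255])  # 1.0
```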

Content selection (VII)

Face clustering

Pre-processing of face detection boxes

Content selection (VIII)

Face clustering

Iterative face labeling
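The labeling procedure is shown only as a diagram in the original slides, so the following is a hedged sketch of one simple iterative scheme that copes with an unknown number of characters: each face histogram joins the closest existing cluster if it is near enough, and otherwise starts a new one. The chi-square distance, the 0.5 threshold, and all function names are assumptions, not the author's algorithm.

```python
from collections import Counter


def chi_square(hist_a, hist_b):
    """Chi-square distance between two histograms of equal length."""
    return sum((a - b) ** 2 / (a + b)
               for a, b in zip(hist_a, hist_b) if a + b > 0)


def label_faces(histograms, max_distance=0.5):
    """Assign a cluster label to each face histogram, in input order."""
    clusters = []  # clusters[label] is the list of member histograms
    labels = []
    for hist in histograms:
        best_label, best_dist = None, max_distance
        for label, members in enumerate(clusters):
            dist = min(chi_square(hist, member) for member in members)
            if dist < best_dist:
                best_label, best_dist = label, dist
        if best_label is None:
            # No cluster is close enough: treat this face as a new character.
            clusters.append([hist])
            labels.append(len(clusters) - 1)
        else:
            clusters[best_label].append(hist)
            labels.append(best_label)
    return labels


def main_characters(labels, top_n=1):
    """Most frequent cluster labels as (label, count) pairs."""
    return Counter(labels).most_common(top_n)


if __name__ == "__main__":
    faces = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.95, 0.05]]
    labels = label_faces(faces)
    print(labels)                   # [0, 0, 1, 0]
    print(main_characters(labels))  # [(0, 3)]
```

Counting labels afterwards answers the second question on the earlier slide: the clusters with the most members correspond to the faces that appear most often.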

Content selection (IX)

Face clustering

Content selection (X)

Face clustering

Content selection (XI)

Object detection

Relevant content is related to the source video

Custom object map with:

1. Haar cascades

2. SURF descriptors matching

3. Deformable parts models

Content selection (XII)

Object detection

Haar cascade classifiers

Advantages:

- Quick object detection

- Training and detection stages included in OpenCV

Disadvantages:

- Gives poor results when the object appears from different views

- Slow training process

Content selection (XIII)

Object detection

SURF descriptors matching

Advantages:

- No additional training stage needed

- Scale and rotation invariant method

- Real-time object detection

- Descriptors extraction and matching strategy included in OpenCV

Disadvantages:

- Very specific training image

- Object may not be located in the image
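To illustrate descriptor matching in general terms, the sketch below pairs each query descriptor with its nearest training descriptor and keeps the pair only if it passes a nearest-to-second-nearest ratio test, a widely used filter for ambiguous matches. It works on plain lists of numbers rather than real SURF output; the ratio value, the minimum match count, and the function names are assumptions, and the slides do not state which matching strategy the system actually uses.

```python
import math


def match_descriptors(query, train, ratio=0.75):
    """Return (query_index, train_index, distance) for unambiguous matches."""
    matches = []
    if len(train) < 2:
        return matches  # the ratio test needs two candidates
    for q_idx, q_desc in enumerate(query):
        ranked = sorted((math.dist(q_desc, t_desc), t_idx)
                        for t_idx, t_desc in enumerate(train))
        (best_dist, best_idx), (second_dist, _) = ranked[0], ranked[1]
        # Keep the match only if it is clearly better than the runner-up.
        if best_dist < ratio * second_dist:
            matches.append((q_idx, best_idx, best_dist))
    return matches


def object_present(query, train, min_matches=2):
    """Hypothetical decision rule: enough good matches means 'found'."""
    return len(match_descriptors(query, train)) >= min_matches


if __name__ == "__main__":
    object_descriptors = [[0, 0], [10, 10]]
    frame_descriptors = [[0, 1], [10, 10], [50, 50]]
    print(match_descriptors(object_descriptors, frame_descriptors))
    print(object_present(object_descriptors, frame_descriptors))  # True
```

A minimum match count gives a crude answer to the case where the object is not in the image at all: too few surviving matches means no detection is reported.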

Content selection (XIV)

Object detection

Deformable parts models

Advantages:

- Multiple object views detection

- Scored results

Disadvantages:

- Third party executable wrapped in Java

- Slow object detection process

Compositing (I)

Object segmentation

Compositing (II)

Tile-based map

Adaptive map

Navigation functionalities
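As a rough illustration of how a tile-based map and its navigation could fit together, the sketch below lays items out on a near-square grid and then looks up which tile contains a clicked point. The grid rule, the function names, and the (x, y, w, h) region format are assumptions; the actual layout and navigation logic are not described in the slides.

```python
import math


def tile_layout(num_items, map_width, map_height):
    """Return one (x, y, w, h) region per item on a near-square grid."""
    if num_items <= 0:
        return []
    cols = math.ceil(math.sqrt(num_items))
    rows = math.ceil(num_items / cols)
    tile_w = map_width // cols
    tile_h = map_height // rows
    return [((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
            for i in range(num_items)]


def region_at(regions, x, y):
    """Index of the region containing point (x, y), or None."""
    for index, (rx, ry, rw, rh) in enumerate(regions):
        if rx <= x < rx + rw and ry <= y < ry + rh:
            return index
    return None


if __name__ == "__main__":
    regions = tile_layout(5, 300, 200)
    print(regions[4])                    # (100, 100, 100, 100)
    print(region_at(regions, 150, 150))  # 4
    print(region_at(regions, 250, 150))  # None
```

The region index returned by the lookup could then be used to fetch per-object metadata, in the spirit of requirements N.1 and N.2.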

Conclusions (I)

Experimental results

Web-based survey:

13 trailers

53 participants

Control methods:

Baseline: Uniform sampling

Upper bound: Manual frame selection

Conclusions (III)

Experimental results

Overall rating

Recognition rate

Attractiveness and effectiveness

Scores: 1 (Unacceptable), 2 (Fair), 3 (Good), 4 (Very good), 5 (Excellent)
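For reference, a mean opinion score (MOS) is the arithmetic mean of the individual ratings on the scale above. The sketch below computes it, along with a recognition rate taken here to be the percentage of correct answers; that definition, the function names, and the sample numbers are assumptions and are not survey data.

```python
def mean_opinion_score(ratings):
    """Average a list of ratings on the 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    return sum(ratings) / len(ratings)


def recognition_rate(num_correct, num_answers):
    """Percentage of answers that identified the movie correctly."""
    if num_answers <= 0:
        raise ValueError("num_answers must be positive")
    return 100.0 * num_correct / num_answers


if __name__ == "__main__":
    hypothetical_ratings = [3, 4, 5, 4]  # made-up values for illustration
    print(mean_opinion_score(hypothetical_ratings))  # 4.0
    print(recognition_rate(3, 4))                    # 75.0
```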

Conclusions (II)

Experimental results

Overall rating

[Figure: "MOS for video", score (0 to 5) by trailer id (1 to 13), for Uniform sampling, Object map, and Manual selection; a second panel shows "MOS" on the same score axis]

Conclusions (III)


Trailer 1: The Intouchables

[Images: Uniform sampled; Object map]

Conclusions (III)


Trailer 7: The Fast and the Furious

[Image: Object map]

Conclusions (IV)

Experimental results

Movie recognition

a) Uniform sampling

b) Uniform sampling + Object map

c) Uniform sampling + Object map + Manual selection

[Figure: "Recognition Rate for video", recognition rate (%) by trailer id (1 to 13), for series a, b, and c; a second panel shows "Recognition rate" on the same 0 to 100 % axis]

Conclusions (III)


Trailer 4: Dark Shadows

Trailer 9: Resident Evil 5 – Retribution

[Images: Uniform sampled (one per trailer)]

Conclusions (V)

Experimental results

Attractiveness and effectiveness

[Figure: "Acceptance rate", score (0 to 5) by trailer id (1 to 13), for Attractiveness and Effectiveness; a second panel shows "Average acceptance rate" on the same score axis]

Conclusions (III)

Content-based video summarization application

Customizable

Lets users grasp video content rapidly

Generates a summary description file to include related metadata

ACM 2013 Open Source Software Competition

Code publicly available on SourceForge

http://sourceforge.net/p/objectmaps

Conclusions (VI)

Further work

Face clustering improvement

Audio content analysis and understanding

Video sequence analysis

Content presentation analysis

Social Media
