Content based video summarization into object maps
by Manuel Martos Asensio
directed by Horst Eidenberger
and Xavier Giro-i-Nieto
Introduction (I)
Introduction (II)
Introduction (III)
Contents
System overview
Requirements analysis
Solution approach: Preparation, Content selection, Compositing
Conclusions: Experimental results, Further work
System overview
Requirements analysis
Priority requirements
P.1. People and main characters
P.2. Fast understanding
P.3. Visual variability
Uniqueness requirements
U.1. Non-repetition
U.2. Visual uniqueness
U.3. Characters uniqueness
Requirements analysis
Structural requirements
S.1. Main characters highlight
S.2. Style
Navigability requirements
N.1. Region boundaries
N.2. Metadata supplement
Requirements analysis
Preparation (I)
Uniform sampling
fps_i = acquisition frame rate
N_0 = number of samples
L_i = video length (in frames)
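The sampling formula itself did not survive in this transcript, so the following is only a minimal sketch of uniform sampling under one assumption: the requested number of samples is spread evenly over the video length in frames, taking the centre frame of each equal segment. The function names and the centre-of-segment choice are illustrative and not taken from the slides.

```python
def uniform_sample_indices(video_length, num_samples):
    """Return num_samples frame indices spread evenly over video_length frames."""
    if video_length <= 0 or num_samples <= 0:
        return []
    # Never ask for more samples than there are frames.
    num_samples = min(num_samples, video_length)
    step = video_length / num_samples
    # Assumption: take the frame at the centre of each equal-length segment.
    return [int(step * (k + 0.5)) for k in range(num_samples)]


def sample_timestamps(indices, fps):
    """Convert frame indices to timestamps in seconds, given the acquisition frame rate."""
    return [i / fps for i in indices]
```

With a 100-frame video and 4 samples, the segments are 25 frames long and the sketch picks one frame near the middle of each.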
Preparation (II)
Shot boundary detection
Customizable method for boundary detection
Default: Cumulative Pixel-to-Pixel
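The slides name the default method but do not define it. As a rough illustration only, the sketch below assumes that "cumulative pixel-to-pixel" means accumulating the mean absolute pixel difference between consecutive frames and declaring a boundary when the accumulated value exceeds a threshold. The frame representation (lists of grayscale rows), the threshold value and the reset rule are all assumptions.

```python
def frame_difference(frame_a, frame_b):
    """Mean absolute pixel-to-pixel difference between two equally sized grayscale frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def detect_shot_boundaries(frames, threshold=30.0):
    """Return the indices of frames that start a new shot.

    Assumption: differences are accumulated since the last boundary, so both
    hard cuts and slower transitions eventually cross the threshold.
    """
    boundaries = []
    accumulated = 0.0
    for i in range(1, len(frames)):
        accumulated += frame_difference(frames[i - 1], frames[i])
        if accumulated > threshold:
            boundaries.append(i)
            accumulated = 0.0  # start accumulating again for the new shot
    return boundaries
```

Because the method is described as customizable, a real implementation would presumably let this difference measure be swapped for another one.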
Content selection (I)
Face detection
Problems:
Extreme size detections
Overlapping detections
Content selection (II)
Face detection
Content selection (III)
Face detection
Size filtering with fixed threshold
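Size filtering with a fixed threshold can be sketched as follows. The box format `(x, y, width, height)` and the pixel limits are hypothetical; the slides do not give the actual threshold values.

```python
def filter_by_size(boxes, min_side=40, max_side=400):
    """Keep only detection boxes whose sides lie within fixed pixel limits.

    boxes: list of (x, y, width, height) tuples.
    min_side / max_side: hypothetical thresholds, not values from the slides.
    """
    kept = []
    for box in boxes:
        _, _, width, height = box
        # Discard extremely small and extremely large detections.
        if min(width, height) >= min_side and max(width, height) <= max_side:
            kept.append(box)
    return kept
```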
Content selection (IV)
Face detection
Overlap filtering
Frontal detections are more reliable.
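One way to act on the observation that frontal detections are more reliable is to let them win whenever two boxes overlap. The sketch below assumes each detection is tagged as "frontal" or "profile" and measures overlap with intersection over union; the tag names, the overlap measure and the 0.3 limit are assumptions, not details from the slides.

```python
def iou(a, b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def filter_overlaps(detections, max_iou=0.3):
    """Drop detections that overlap an already accepted one.

    detections: list of (box, kind) pairs, kind being "frontal" or "profile".
    """
    # Visit frontal detections first so they take precedence; sorted() is stable.
    ordered = sorted(detections, key=lambda d: d[1] != "frontal")
    kept = []
    for box, kind in ordered:
        if all(iou(box, other) <= max_iou for other, _ in kept):
            kept.append((box, kind))
    return kept
```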
Content selection (V)
Face detection
Content selection (VI)
Face clustering
Which faces belong to the same person?
Which faces appear more often in the video?
Unsupervised Face Clustering problem:
1. Unknown number of characters
2. Unknown ground truth
Solution:
Iterative cluster estimation using LBPH
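LBPH stands for Local Binary Patterns Histograms. As a simplified illustration of the descriptor, the sketch below computes a single histogram of basic 3x3 LBP codes for a grayscale face crop and compares two histograms with a chi-square distance. LBPH implementations usually split the face into a grid of cells and concatenate per-cell histograms; that step, and the choice of chi-square as the distance, are simplifications and assumptions made here.

```python
def lbp_histogram(image):
    """Normalised 256-bin histogram of basic 3x3 LBP codes.

    image: grayscale image as a list of rows of integer intensities.
    """
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Each neighbour at least as bright as the centre sets one bit.
                if image[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def chi_square(h1, h2):
    """Chi-square distance between two histograms (0 means identical)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)
```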
Content selection (VII)
Face clustering
Pre-processing of face detection boxes
Content selection (VIII)
Face clustering
Iterative face labeling
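The slides do not spell out the labeling procedure. A minimal sketch of one possible iterative scheme, which copes with an unknown number of characters, is shown below: each face descriptor joins the closest existing cluster if it is near enough, otherwise it opens a new cluster. The distance threshold, the nearest-member rule and the function names are assumptions.

```python
from collections import Counter


def chi_square(h1, h2):
    """Chi-square distance between two histograms."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def label_faces(histograms, max_distance=0.5):
    """Assign a cluster label to each face histogram, creating clusters on demand."""
    clusters = []  # clusters[label] is the list of member histograms
    labels = []
    for hist in histograms:
        best_label, best_dist = None, max_distance
        for label, members in enumerate(clusters):
            # Distance to a cluster = distance to its closest member.
            dist = min(chi_square(hist, m) for m in members)
            if dist < best_dist:
                best_label, best_dist = label, dist
        if best_label is None:
            # No cluster is close enough: assume a new character.
            clusters.append([hist])
            best_label = len(clusters) - 1
        else:
            clusters[best_label].append(hist)
        labels.append(best_label)
    return labels


def rank_characters(labels):
    """Cluster labels ordered by how often they appear, most frequent first."""
    return [label for label, _ in Counter(labels).most_common()]
```

Ranking clusters by size gives a simple answer to the question of which faces appear most often in the video.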
Content selection (IX)
Face clustering
Content selection (X)
Face clustering
Content selection (XI)
Object detection
Relevant content is related to the source video
Custom object map with:
1. Haar cascades
2. SURF descriptors matching
3. Deformable parts models
Content selection (XII)
Object detection
Haar cascade classifiers
Advantages:
- Quick object detection
- Training and detection stages included in OpenCV
Disadvantages:
- Gives poor results when the object appears from different views
- Slow training process
Content selection (XIII)
Object detection
SURF descriptors matching
Advantages:
- No additional training stage needed
- Scale and rotation invariant method
- Real-time object detection
- Descriptors extraction and matching strategy included in OpenCV
Disadvantages:
- Very specific training image
- Object may not be located in the image
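The slides do not say which matching strategy is used. As a generic illustration of descriptor matching, the sketch below pairs each descriptor of the training image with its nearest neighbour in the video frame and keeps the pair only if it passes a nearest-neighbour ratio test. Descriptors are plain tuples of floats here; the ratio value and the function name are assumptions, and no OpenCV call is involved.

```python
import math


def match_descriptors(query, train, ratio=0.75):
    """Return (query_index, train_index) pairs that pass the ratio test.

    query: descriptors from the training image of the object.
    train: descriptors from the frame being searched.
    """
    matches = []
    for qi, q in enumerate(query):
        # Distances to every candidate, nearest first.
        dists = sorted((math.dist(q, t), ti) for ti, t in enumerate(train))
        # Accept only if the best match is clearly better than the second best.
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((qi, dists[0][1]))
    return matches
```

A frame with few or no accepted matches would then be treated as not containing the object.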
Content selection (XIV)
Object detection
Deformable parts models
Advantages:
- Multiple object views detection
- Scored results
Disadvantages:
- Third party executable wrapped in Java
- Slow object detection process
Compositing (I)
Object segmentation
Compositing (II)
Tile-based map
Adaptive map
Navigation functionalities
Conclusions (I)
Experimental results
Web-based survey:
13 trailers
53 participants
Control methods:
Baseline: Uniform sampling
Upper bound: Manual frame selection
Conclusions (III)
Experimental results
Overall rating
Recognition rate
Attractiveness and effectiveness
Scores: 1 (Unacceptable), 2 (Fair), 3 (Good), 4 (Very good), 5 (Excellent)
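Ratings on this five-point scale are typically summarised as a mean opinion score (MOS). The sketch below shows the computation for one summary method; the ratings in the usage note are made-up placeholders, not survey data.

```python
from statistics import mean

SCALE = {1: "Unacceptable", 2: "Fair", 3: "Good", 4: "Very good", 5: "Excellent"}


def mean_opinion_score(ratings):
    """Average of participant ratings on the 1 to 5 scale."""
    for rating in ratings:
        if rating not in SCALE:
            raise ValueError(f"rating outside the 1-5 scale: {rating}")
    return mean(ratings)


def mos_per_method(ratings_by_method):
    """Map each summary method to its MOS."""
    return {method: mean_opinion_score(r) for method, r in ratings_by_method.items()}
```

For example, `mos_per_method({"uniform sampling": [2, 3, 3], "object map": [3, 4, 5]})` averages each hypothetical list of ratings separately.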
Conclusions (II)
Experimental results
Overall rating
[Figure: mean opinion score (MOS, 0 to 5) per trailer id (1 to 13) and overall, for uniform sampling, object map and manual selection]
Conclusions (III)
Experimental results
Trailer 1: The Intouchables
[Figure panels: uniform sampled, object map]
Conclusions (III)
Experimental results
Trailer 7: The Fast and the Furious
Object map
Conclusions (IV)
Experimental results
Movie recognition
a) Uniform sampling
b) Uniform sampling + Object map
c) Uniform sampling + Object map + Manual selection
[Figure: recognition rate (%) per trailer id (1 to 13) and overall, for conditions a, b and c]
Conclusions (III)
Experimental results
Trailer 4: Dark Shadows (uniform sampled)
Trailer 9: Resident Evil 5 – Retribution (uniform sampled)
Conclusions (V)
Experimental results
Attractiveness and effectiveness
[Figure: attractiveness and effectiveness scores (0 to 5) per trailer id (1 to 13), titled "Acceptance rate", and the average acceptance rate]
Conclusions (III)
Content-based video summarization application
Customizable
Allows users to rapidly grasp video content
Generates a summary description file to include related metadata
ACM 2013 Open Source Software Competition
Code publicly available at SourceForge
http://sourceforge.net/p/objectmaps
Conclusions (VI)
Further work
Face clustering improvement
Audio content analysis and understanding
Video sequence analysis
Content presentation analysis
Social Media