Beauty is Here! Evaluating Aesthetics in Videos Using Multimodal Features and Free Training Data
Yanran Wang, Qi Dai, Rui Feng, Yu-Gang Jiang
School of Computer Science, Fudan University, Shanghai, China
ACM MM, Barcelona, Catalunya, Spain, 2013
Overview
Task:
• Design a system to automatically identify aesthetically more appealing videos
Contribution:
• Propose to use free training data
  – Construct two annotation-free training datasets by assuming that images/videos on certain websites are mostly beautiful
• Use and evaluate various kinds of features
Result:
• Attain a Spearman's rank correlation coefficient of 0.41 on the NHK dataset
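Spearman's rank correlation compares two orderings instead of raw values: both the system's scores and the reference aesthetic scores are turned into ranks, and the Pearson correlation of those ranks is reported (1.0 = identical order, -1.0 = reversed order). A minimal sketch, assuming tied values share their average rank; the function names are illustrative only.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / (vx * vy) ** 0.5


system_scores = [0.9, 0.1, 0.5, 0.7]  # e.g. classifier outputs per video
human_scores = [4, 1, 2, 3]           # e.g. reference aesthetic ratings
print(spearman_rho(system_scores, human_scores))  # → 1.0
```

Only the order matters here, so a system's scores need not be on the same scale as the human ratings.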
Free Training Data
[Figure: DPChallenge images (+), Flickr videos (+), Dutch documentary videos (-)]
• The first training set
  – Using images from DPChallenge as positive samples,
  – and frames of the Dutch documentary videos as negative samples
• The second training set
  – Using videos from Flickr as positive samples,
  – and the Dutch documentary videos as negative samples
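The idea behind "free" training data is that labels come from the source, not from annotators: everything from a site assumed to host mostly beautiful content is labeled +1, and everything from the documentary collection is labeled -1. A minimal sketch of assembling the two sets; the helper `build_free_training_set` and all file paths are hypothetical placeholders, not the authors' code.

```python
from typing import List, Tuple

# (media path, label): +1 = assumed beautiful, -1 = assumed not beautiful.
Sample = Tuple[str, int]


def build_free_training_set(positive_paths: List[str],
                            negative_paths: List[str]) -> List[Sample]:
    """Label items purely by their source; no human annotation is involved."""
    return ([(p, +1) for p in positive_paths] +
            [(n, -1) for n in negative_paths])


# Hypothetical file lists standing in for the downloaded collections.
dpchallenge_images = ["dpc/0001.jpg", "dpc/0002.jpg"]
dutch_frames = ["dutch/ep01_f0100.jpg", "dutch/ep01_f0200.jpg"]
flickr_videos = ["flickr/a.mp4", "flickr/b.mp4", "flickr/c.mp4"]
dutch_videos = ["dutch/ep01.mp4", "dutch/ep02.mp4"]

# First set: DPChallenge images (+) vs. frames of the Dutch documentaries (-).
image_set = build_free_training_set(dpchallenge_images, dutch_frames)
# Second set: Flickr videos (+) vs. the Dutch documentary videos (-).
video_set = build_free_training_set(flickr_videos, dutch_videos)
```

Because the labels are only assumptions about each source, some noise is expected in both classes; the sketch makes no attempt to filter it.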
Multimodal Features
• Traditional Visual Features: Color, LBP, SIFT, HOG
• Mid-level Semantic Attributes: Classemes [ECCV'10]
• Style Descriptor
• Video Motion Feature: Dense Trajectory [CVPR'11]
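As an illustration of one of the traditional visual features, the sketch below computes a basic Local Binary Pattern (LBP) histogram: each interior pixel gets an 8-bit code recording which of its 8 neighbours are at least as bright as it, and the image is described by the normalised histogram of codes. This is the plain 3x3 variant under assumed settings; the slides do not state which LBP configuration (radius, uniform patterns, spatial grid) was actually used.

```python
from typing import List

# Clockwise offsets of the 8 neighbours, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: List[List[int]], r: int, c: int) -> int:
    """8-bit LBP code: bit k is set when neighbour k >= the centre pixel."""
    centre = img[r][c]
    code = 0
    for k, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << k
    return code


def lbp_histogram(img: List[List[int]]) -> List[float]:
    """Normalised 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    rows, cols = len(img), len(img[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * 256


flat = [[5] * 4 for _ in range(4)]  # uniform 4x4 grayscale patch
print(lbp_histogram(flat)[255])     # → 1.0 (every neighbour ties the centre)
```

The resulting 256-dimensional vector is the kind of fixed-length input an SVM can consume directly.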
Framework
[Figure: Input Videos → Feature Extraction → Classifiers → Ranking List.
Feature Extraction: Image Low-Level Features (Color, LBP, SIFT, HOG); Mid-Level Semantic Attributes (Classemes); Style Descriptor; Video Motion Feature (Dense Trajectory); …
Classifiers: SVM Models (Image Training Data); SVM Models (Video Training Data)]
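The framework ends by turning classifier outputs into a single ranking list. The slides do not say how the per-feature SVM outputs are combined; a common choice, assumed here, is late fusion: normalise each classifier's scores (z-score) so features are comparable, average them per video, and sort. `zscore` and `rank_videos` are hypothetical names.

```python
from typing import Dict, List


def zscore(scores: Dict[str, float]) -> Dict[str, float]:
    """Standardise one classifier's scores so different features are comparable."""
    n = len(scores)
    mean = sum(scores.values()) / n
    std = (sum((s - mean) ** 2 for s in scores.values()) / n) ** 0.5
    if std == 0:
        return {v: 0.0 for v in scores}
    return {v: (s - mean) / std for v, s in scores.items()}


def rank_videos(per_feature_scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Average normalised scores over all feature classifiers (late fusion)
    and return video IDs from most to least aesthetically appealing."""
    if not per_feature_scores:
        raise ValueError("need scores from at least one classifier")
    normalised = [zscore(s) for s in per_feature_scores.values()]
    videos = normalised[0].keys()  # assumes every classifier scored the same videos
    fused = {v: sum(n[v] for n in normalised) / len(normalised) for v in videos}
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical SVM decision values for three input videos.
scores = {
    "color":     {"v1": 0.8, "v2": -0.2, "v3": 0.1},
    "classemes": {"v1": 0.3, "v2": -0.5, "v3": 0.6},
}
print(rank_videos(scores))  # → ['v1', 'v3', 'v2']
```

Normalising before averaging keeps a classifier with a wider score range from dominating the fused ranking.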
• Using training data from Flickr & Dutch documentary videos
• Evaluated on a subset labeled by ourselves
Result
[Figure: Spearman's rank correlation of each feature; annotation: "The best single feature"]
• Dense Trajectory, which is very powerful in human action recognition, performs poorly, indicating that motion is less related to beauty
• Using training data from DPChallenge & Dutch documentary images/frames
• Evaluated on a subset labeled by ourselves
Result
[Figure: Spearman's rank correlation of each feature; annotations mark "The best single feature" and "The best result" (values shown: 0.41, 0.43)]
Image-based training is more suitable for the NHK dataset because most NHK videos focus on scenes.
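Applying a classifier trained on still images to videos requires turning frame-level outputs into one score per video. The slides do not describe this step; the sketch assumes frames are sampled, each frame's feature vector is scored, and the scores are averaged (mean pooling). The linear model below is a hypothetical stand-in for an image-trained SVM.

```python
from statistics import mean
from typing import Callable, List, Sequence

FrameFeature = Sequence[float]


def video_score(frames: List[FrameFeature],
                frame_scorer: Callable[[FrameFeature], float]) -> float:
    """Score a video by averaging an image classifier's output over its frames."""
    if not frames:
        raise ValueError("video has no sampled frames")
    return mean(frame_scorer(f) for f in frames)


# Hypothetical linear model (w . x + b) standing in for an SVM trained on images.
weights, bias = [0.5, -0.25], 0.1


def linear_svm_score(x: FrameFeature) -> float:
    return sum(w * xi for w, xi in zip(weights, x)) + bias


sampled_frames = [[1.0, 0.0], [0.0, 2.0]]  # toy 2-D features of two frames
print(video_score(sampled_frames, linear_svm_score))
```

Mean pooling treats every sampled frame equally; max pooling or other aggregations would be equally plausible alternatives under the same assumption.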
• Official evaluation results from NHK, on the entire test set
• We submitted 5 runs
• Evaluated on NHK's official labels, which are not publicly available
• Observations
  – Image training data is more effective, similar to the observations on the small subset
  – Color and Classemes are complementary; SIFT is not
  – NOTE: These submitted runs were selected before annotating the subset, which was done later to provide more insights in the paper!
Result
Demo
A collection of clips from the top 10 videos identified by our system
Thank you!