Beauty is Here! Evaluating Aesthetics in Videos Using Multimodal Features and Free Training Data


Yanran Wang, Qi Dai, Rui Feng, Yu-Gang Jiang

School of Computer Science, Fudan University, Shanghai, China

ACM MM, Barcelona, Catalunya, Spain, 2013

Overview

Task:
• Design a system to automatically identify aesthetically more appealing videos

Contribution:
• Propose to use free training data
• Use and evaluate various kinds of features

Result:
• Attain a Spearman's rank correlation coefficient of 0.41 on the NHK Dataset

• Construct two annotation-free training datasets by assuming images/videos on certain websites are mostly beautiful
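The metric quoted in the overview, Spearman's rank correlation, compares the ordering produced by a system with a reference ordering. A minimal sketch in plain Python follows; the function names are illustrative and not taken from the authors' code, and ties are handled by average ranks, which is an assumption about the evaluation protocol.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5
```

A value of 1 means the system ranks the videos exactly as the reference does, -1 means the reverse order, and values near 0 mean the two orderings are unrelated.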

Free Training Data

[Diagram: DPChallenge images (+), Flickr videos (+), Dutch documentary videos (-)]

• The first training set
  – Using images from DPChallenge as positive samples,
  – and frames from the Dutch documentary videos as negative samples

• The second training set
  – Using videos from Flickr as positive samples,
  – and the Dutch documentary videos as negative samples
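The two training sets above label each sample purely by where it came from, so no manual annotation is needed. A minimal sketch of that labeling rule follows; the source keys, item identifiers, and the `build_training_set` helper are hypothetical names introduced for illustration only.

```python
# Label assigned to every sample from a given source (+1 = beautiful, -1 = not).
# The keys are illustrative names, not identifiers from the authors' code.
SOURCE_LABEL = {
    "dpchallenge": 1,         # images, assumed mostly beautiful
    "flickr": 1,              # videos, assumed mostly beautiful
    "dutch_documentary": -1,  # videos/frames, used as negatives
}


def build_training_set(samples, sources):
    """Keep samples whose source is in `sources` and label them by source.

    `samples` is a list of (source, item_id) pairs.
    """
    return [(item, SOURCE_LABEL[src]) for src, item in samples if src in sources]


# Hypothetical item identifiers.
samples = [
    ("dpchallenge", "img_001"),
    ("flickr", "vid_001"),
    ("dutch_documentary", "doc_001"),
]
image_set = build_training_set(samples, {"dpchallenge", "dutch_documentary"})
video_set = build_training_set(samples, {"flickr", "dutch_documentary"})
```

In the first set the documentary item would stand for extracted frames, and in the second for whole videos.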


Multimodal Features

• Traditional Visual Features: Color, LBP, SIFT, HOG
• Mid-level Semantic Attributes: Classemes [ECCV'10]
• Style Descriptor
• Video Motion Feature: Dense Trajectory [CVPR'11]

Framework

[Diagram: Input Videos → Feature Extraction → Classifiers → Ranking List]

• Feature Extraction
  – Image Low-Level Features (Color, LBP, SIFT, HOG)
  – Mid-Level Semantic Attributes (Classemes)
  – Style Descriptor
  – Video Motion Feature (Dense Trajectory)
• Classifiers
  – SVM Models (Image Training Data)
  – …
  – SVM Models (Video Training Data)
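The last step of the framework, turning classifier outputs into a ranking list, can be sketched as follows. The slides do not say how scores from different features are combined, so plain averaging (late fusion) is an assumption here, and the function names, feature keys, and scores are hypothetical.

```python
def fuse_scores(per_feature_scores):
    """Average each video's classifier score across features.

    `per_feature_scores` maps feature name -> {video_id: score}.
    Averaging is an assumed fusion rule, not one stated in the slides.
    Only videos scored by every feature are kept.
    """
    if not per_feature_scores:
        return {}
    score_maps = list(per_feature_scores.values())
    common = set(score_maps[0]).intersection(*score_maps[1:])
    return {v: sum(m[v] for m in score_maps) / len(score_maps) for v in common}


def ranking_list(fused):
    """Sort video ids from most to least appealing; ties break by id."""
    return sorted(fused, key=lambda v: (-fused[v], v))


# Hypothetical SVM decision scores for three videos and two features.
scores = {
    "color": {"a": 0.9, "b": 0.1, "c": 0.5},
    "classemes": {"a": 0.7, "b": 0.3, "c": 0.9},
}
ranked = ranking_list(fuse_scores(scores))
```

The resulting list is what would be compared against reference labels using Spearman's rank correlation.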

Result

• Using training data from Flickr & Dutch documentary videos
• Evaluated on a subset labeled by ourselves

[Figure: Spearman's rank correlation of the evaluated features; callouts mark "The best single feature" and "The best result"]

Dense Trajectory, which is very powerful in human action recognition, performs poorly, indicating that motion is less related to beauty.

Result

• Using training data from DPChallenge & Dutch documentary images/frames
• Evaluated on a subset labeled by ourselves

[Figure: Spearman's rank correlation of the evaluated features; a callout marks "The best single feature"; values shown on the chart: 0.41 and 0.43]

Image-based training is more suitable on the NHK dataset, because most NHK videos focus on scenes.

Result

• Official evaluation results from NHK, on the entire test set
• We submitted 5 runs
• Evaluated on NHK's official labels, which are not publicly available

• Observations
  – Image training data is more effective, similar to observations on the small subset
  – Color and Classemes are complementary, SIFT is not
  – NOTE: These submitted runs were selected before annotating the subset, which was done later to provide more insights in the paper!

Demo

A collection of clips from the top 10 videos identified by our system

Thank you!
