MediaEval 2016 - UNIFESP Predicting Media Interestingness Task
Predicting Media Interestingness Task Overview
Claire-Hélène Demarty – Technicolor
Mats Sjöberg – University of Helsinki
Bogdan Ionescu – University Politehnica of Bucharest
Thanh-Toan Do – Singapore University of Technology and Design
Hanli Wang – Tongji University
Ngoc Q.K. Duong – Technicolor
Frédéric Lefebvre – Technicolor
MediaEval 2016 Workshop, October 20-21, 2016
Interestingness?
Are these interesting images?
Definition? Subjective; semantic; perceptual.
Task definition
- Derives from a use case at Technicolor: helping professionals illustrate a Video on Demand (VOD) web site by selecting interesting frames and/or video excerpts for the posted movies.
- The frames and excerpts should help a user decide whether he/she is interested in watching the underlying movie.
- Two subtasks: Image and Video
  - Image subtask: given a set of key-frames extracted from a movie, …
  - Video subtask: given the video shots of a movie, …
  … automatically identify those images/shots that viewers report to be the most interesting in the given movie.
- Binary classification task on a per-movie basis, but confidence values are also required.
12/7/16
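The required output per movie (a binary decision plus a confidence value for each shot or key-frame) could be produced by thresholding classifier scores. A minimal illustrative sketch, assuming the raw scores serve as confidences and the top fraction of shots is flagged as interesting (`rank_and_decide` and `top_frac` are hypothetical names; the fraction here mirrors the roughly 10% interesting rate reported for the dataset):

```python
def rank_and_decide(scores, top_frac=0.1):
    """Illustrative per-movie post-processing: keep the raw classifier
    scores as the required confidence values, and flag the top fraction
    of shots as interesting (top_frac is a hypothetical choice)."""
    # indices sorted by decreasing score
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    k = max(1, round(top_frac * len(scores)))
    decisions = [0] * len(scores)
    for i in order[:k]:
        decisions[i] = 1
    return decisions, scores  # binary labels + confidences
```
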
Dataset & additional features
- From Hollywood-like movie trailers
- Manual segmentation of shots
- Extraction of the middle key-frame of each shot

              Development Set           Test Set
              Total   % interesting     Total   % interesting
Trailer #     52                        26
Shot #        5054    8.3               2342    9.6
Key-frame #   5054    9.4               2342    10.3

- Precomputed content descriptors:
  - Low-level: dense SIFT, HoG, LBP, GIST, HSV color histograms, MFCC, fc7 and prob layers from AlexNet
  - Mid-level: face detection and tracking-by-detection
Manual annotations
- Annotation pipeline: pair comparison protocol → pairs → aggregation into rankings → binary decision (manual thresholding)
- Annotators: >310 persons for video, >100 persons for image, from 29 countries
Thank you Mats! Thanks to all of you!
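The aggregation of pairwise comparisons into rankings could be done, for example, with a Bradley-Terry model; the slides do not name the aggregation method, so this is purely an illustrative choice. A minimal sketch using the classic minorization-maximization updates:

```python
from collections import defaultdict

def bradley_terry(pair_wins, n_items, n_iters=100):
    """Fit Bradley-Terry scores from pairwise comparison counts.
    pair_wins maps an ordered pair (i, j) to the number of times
    annotators judged item i more interesting than item j."""
    wins = defaultdict(float)   # total wins per item
    games = defaultdict(float)  # comparisons per unordered pair
    for (i, j), w in pair_wins.items():
        wins[i] += w
        games[frozenset((i, j))] += w
    p = [1.0] * n_items
    for _ in range(n_iters):
        new_p = []
        for i in range(n_items):
            denom = 0.0
            for j in range(n_items):
                if i == j:
                    continue
                n_ij = games[frozenset((i, j))]
                if n_ij:
                    denom += n_ij / (p[i] + p[j])
            new_p.append(wins[i] / denom if denom else p[i])
        s = sum(new_p)          # normalize away the scale invariance
        p = [x / s for x in new_p]
    return p  # higher score = ranked as more interesting
```

Sorting items by the fitted scores yields the ranking, which can then be thresholded into the binary labels, as in the pipeline above.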
Required runs
- Image subtask: visual information only, no external data
- Video subtask: audio and visual information, no external data
- External data IS:
  - Additional datasets and annotations dedicated to interestingness prediction
  - Pre-trained models, features, detectors obtained from such dedicated datasets
  - Additional metadata that could be found on the internet about the provided content
- External data IS NOT:
  - CNN features generated on generic datasets not dedicated to interestingness prediction
Evaluation metrics
- Official measure: Mean Average Precision (MAP), computed over all trailers
- Additional metrics are also computed: false alarm rate, miss detection rate, precision, recall, F-measure, etc.
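Mean Average Precision over trailers can be sketched as follows. This is an illustrative implementation, not the official evaluation script; it assumes each trailer's shots are ordered by decreasing predicted confidence, with `1` marking a ground-truth interesting shot:

```python
def average_precision(ranked_labels):
    """AP for one trailer: ranked_labels lists the ground-truth labels
    (1 = interesting) ordered by decreasing predicted confidence."""
    hits, precisions = 0, []
    for k, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precisions.append(hits / k)  # precision at each relevant rank
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(per_trailer_ranked_labels):
    """Official measure: the mean of the per-trailer APs."""
    aps = [average_precision(r) for r in per_trailer_ranked_labels]
    return sum(aps) / len(aps)
```
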
Task participation
[Bar chart "Task Participation": number of teams (0-35) at each stage – Registrations, Returned agreements, Submitting teams, Workshop]
- Registrations: 31 teams
  - 16 countries
  - 3 'experienced' teams
- Submissions: 12 teams
  - 9 teams submitted to both subtasks
  - 2 teams on the image subtask only
  - 1 team on the video subtask only
Official results – Image subtask – 27 runs

Run                                                  MAP     Team
me16in_tudmmc2_image_histface                        0.2336  TUDMMC
me16in_technicolor_image_run1_SVM_rbf*               0.2336  Technicolor
me16in_technicolor_image_run2_DNNresampling06_100*   0.2315  Technicolor
me16in_MLPBOON_image_run5                            0.2296  MLPBOON
me16in_BigVid_image_run5FusionCNN                    0.2294  BigVid
me16in_MLPBOON_image_run1                            0.2205  MLPBOON
me16in_tudmmc2_image_hist                            0.2202  TUDMMC
me16in_MLPBOON_image_run4                            0.217   MLPBOON
me16in_HUCVL_image_run1                              0.2125  HUCVL
me16in_HUCVL_image_run2                              0.2121  HUCVL
me16in_UITNII_image_FA                               0.2115  UITNII
me16in_RUC_image_run2                                0.2035  RUC
me16in_MLPBOON_image_run2                            0.2023  MLPBOON
me16in_HUCVL_image_run3                              0.2001  HUCVL
me16in_RUC_image_run3                                0.1991  RUC
me16in_RUC_image_run1                                0.1987  RUC
me16in_ethcvl1_image_run2                            0.1952  ETHCVL
me16in_MLPBOON_image_run3                            0.1941  MLPBOON
me16in_HKBU_image_baseline                           0.1868  HKBU
me16in_ethcvl1_image_run1                            0.1866  ETHCVL
me16in_ethcvl1_image_run3                            0.1858  ETHCVL
me16in_HKBU_image_drbaseline                         0.1839  HKBU
me16in_BigVid_image_run4SVM                          0.1789  BigVid
me16in_UITNII_image_V1                               0.1773  UITNII
me16in_lapi_image_runf1*                             0.1714  LAPI
me16in_UNIGECISA_image_ReglineLoF                    0.1704  UNIGECISA
BASELINE (on test set)                               0.1655
me16in_lapi_image_runf2*                             0.1398  LAPI

* organizers
Official results – Video subtask – 28 runs

Run                                                       MAP     Team
me16in_recod_video_run1                                   0.1815  RECOD
me16in_recod_video_run1_old                               0.1753  RECOD
me16in_HKBU_video_drbaseline                              0.1735  HKBU
me16in_UNIGECISA_video_RegsrrLoF                          0.171   UNIGECISA
me16in_RUC_video_run2                                     0.1704  RUC
me16in_UITNII_video_A1                                    0.169   UITNII
me16in_recod_video_run4                                   0.1656  RECOD
me16in_RUC_video_run1                                     0.1647  RUC
me16in_UITNII_video_F1                                    0.1641  UITNII
me16in_lapi_video_runf5                                   0.1629  LAPI
me16in_technicolor_video_run5_CSP_multimodal_80_epoch7    0.1618  Technicolor
me16in_recod_video_run2                                   0.1617  RECOD
me16in_recod_video_run3                                   0.1617  RECOD
me16in_ethcvl1_video_run2                                 0.1574  ETHCVL
me16in_lapi_video_runf3                                   0.1574  LAPI
me16in_lapi_video_runf4                                   0.1572  LAPI
me16in_tudmmc2_video_histface                             0.1558  TUDMMC
me16in_tudmmc2_video_hist                                 0.1557  TUDMMC
me16in_BigVid_video_run3RankSVM                           0.154   BigVid
me16in_HKBU_video_baseline                                0.1521  HKBU
me16in_BigVid_video_run2FusionCNN                         0.1511  BigVid
me16in_UNIGECISA_video_RegsrrGiFe                         0.1497  UNIGECISA
BASELINE (on test set)                                    0.1496
me16in_BigVid_video_run1SVM                               0.1482  BigVid
me16in_technicolor_video_run3_LSTM_U19_100_epoch5         0.1465  Technicolor
me16in_recod_video_run5                                   0.1435  RECOD
me16in_UNIGECISA_video_SVRloAudio                         0.1367  UNIGECISA
me16in_technicolor_video_run4_CSP_video_80_epoch9         0.1365  Technicolor
me16in_ethcvl1_video_run1                                 0.1362  ETHCVL

* organizers
What we have learned
- On the task itself:
  - Image interestingness is NOT video interestingness
  - Issue with the video dataset (needs more iterations? more data samples?)
  - Overall low MAP values: room for improvement!
- On the participants' systems:
  - This year's trend? No trend!
  - Classic machine learning and deep learning systems… but also rule-based systems
  - Some multimodal (audio, video, text), some temporal… and some not
  - (Mostly) no use of external data
  - Simple systems did as well as (or better than) sophisticated systems
  - Dataset imbalance: an issue?
  - Dataset size: penalizing deep learning systems?
Thank you!