Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation
![Page 1: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/1.jpg)
Multimedia Information Retrieval:
Bytes and pixels meet the challenges of human media interpretation
Martha Larson
Delft University of Technology and Radboud University Nijmegen
29 June 2016, Communication Science, Radboud University Nijmegen
![Page 2: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/2.jpg)
About me
● Where do I work?
  ○ TU Delft: Multimedia Computing Group
  ○ Radboud University: Multimedia Information Technology
● What do I do?
  ○ Background: Speech and language,
  ○ Research: Multimedia retrieval and recommender systems,
  ○ Emphasis: How people interpret and use multimedia.
● What am I doing today?
  ○ Sharing potential and open issues with you.
![Page 3: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/3.jpg)
Today’s topics
● Introducing intelligent information systems
  ○ Multimedia information retrieval (user is active)
  ○ Recommender systems (user is passive)
● Computer Science and Multimedia
  ○ The “love” relationship: lots of data
  ○ The “hate” relationship: people’s interpretation of media is not “neat”!
● How to move forward?
  ○ Benchmarking challenges
![Page 4: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/4.jpg)
Intelligent Information Systems
● Connect users with information,
● Information: digital content, facts, products, services,
● Include search engines and recommender systems,
● Success is judged by satisfaction of user needs.
![Page 5: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/5.jpg)
![Page 6: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/6.jpg)
Information retrieval
Definition: Information retrieval (IR) is finding material of an unstructured nature that satisfies an information need from within large collections. http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html
![Page 7: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/7.jpg)
![Page 8: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/8.jpg)
Recommender Systems
Definition: A recommender system tries to identify sets of items that are likely to be of interest to a certain user given some information from that user’s profile.
![Page 9: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/9.jpg)
“Multimedia Clues” for the computer scientist
● Text: Things people write about images and videos.
● User interactions: What people click on, how long they watch.
● Pixel statistics: Colors, lines, textures, shot change patterns.
● Concept detection: Entities that can be detected in images and videos (faces can be detected well).
● Speech recognition: What is said in a video.
● Sound detection: Sounds that can be detected (laughter and gunshots can be detected well).
![Page 10: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/10.jpg)
![Page 11: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/11.jpg)
![Page 12: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/12.jpg)
![Page 13: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/13.jpg)
Visual Geo-location prediction
● Combine evidence from multiple images (e) taken in an area (Eg).
● Upweight elements that are distinctive for that particular area (WGeo).
Xinchao Li, Alan Hanjalic, Martha Larson. Geo-distinctive Visual Element Matching for Location Estimation of Images, Under review. http://arxiv.org/pdf/1601.07884v1.pdf
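The two bullets above (pool evidence from the images taken in an area, then upweight what is distinctive for that area) can be illustrated with a small sketch. It is not the formulation from the cited paper: the element ids, the area names, and the IDF-style weight `log(1 + N / area_frequency)` are all assumptions chosen only to make the idea concrete.

```python
import math
from collections import Counter


def element_weights(areas):
    """Assumed IDF-style weight: elements seen in fewer areas count as more distinctive.

    `areas` maps an area name to a list of images; each image is a set of
    visual-element ids (hypothetical stand-ins for matched local features).
    """
    area_freq = Counter()
    for images in areas.values():
        # Pool the elements of all images in the area before counting.
        for element in set().union(*images):
            area_freq[element] += 1
    n_areas = len(areas)
    return {e: math.log(1 + n_areas / f) for e, f in area_freq.items()}


def predict_area(query, areas):
    """Return the area whose pooled elements best match the query image."""
    weights = element_weights(areas)
    scores = {}
    for name, images in areas.items():
        pooled = set().union(*images)
        # Sum the weights of the query elements that also occur in this area.
        scores[name] = sum(weights[e] for e in query & pooled)
    return max(scores, key=scores.get)


# Toy data: "bike" occurs in both areas, so it carries little geographic evidence.
areas = {
    "delft": [{"canal", "bike", "church_tower"}, {"canal", "bike"}],
    "nijmegen": [{"bridge", "bike"}, {"bridge", "river"}],
}
```

With this toy data, a query image containing a bike and a bridge is assigned to the area in which the bridge occurs, because the bike matches everywhere and so contributes the same small weight to both candidates.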
![Page 14: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/14.jpg)
![Page 15: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/15.jpg)
Good match: Lots of what’s unique
![Page 16: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/16.jpg)
Visual Geo-location prediction
Xinchao Li, Alan Hanjalic, Martha Larson. Geo-distinctive Visual Element Matching for Location Estimation of Images, Under review. http://arxiv.org/pdf/1601.07884v1.pdf
![Page 17: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/17.jpg)
![Page 18: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/18.jpg)
Conventional search engine finds “what”
Alan Hanjalic, Christoph Kofler, and Martha Larson. 2012. Intent and its discontents: the user at the wheel of the online video search engine. In Proceedings of the 20th ACM international conference on Multimedia (MM '12). ACM, New York, NY, USA, 1239-1248.
I want a song called “koi pond”.
I’m interested in garden koi ponds.
![Page 19: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/19.jpg)
Intent-aware search responds to “why”
Alan Hanjalic, Christoph Kofler, and Martha Larson. 2012. Intent and its discontents: the user at the wheel of the online video search engine. In Proceedings of the 20th ACM international conference on Multimedia (MM '12). ACM, New York, NY, USA, 1239-1248.
I am interested in the significance of koi ponds.
I want to build a koi pond.
![Page 20: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/20.jpg)
User intent in video search
Our study identified five major reasons why people search for videos online:
● Information (declarative knowledge)
● Experience for Learning (performative knowledge)
● Experience for Exposure (“being there”)
● Affect (change of mood)
● Object (video as video)
Alan Hanjalic, Christoph Kofler, and Martha Larson. 2012. Intent and its discontents: the user at the wheel of the online video search engine. In Proceedings of the 20th ACM international conference on Multimedia (MM '12). ACM, New York, NY, USA, 1239-1248.
![Page 21: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/21.jpg)
Why are video moments important?
R. Vliegendhart, M. Larson, B. Loni and A. Hanjalic, "Exploiting the Deep-Link Commentsphere to Support Non-Linear Video Access," in IEEE Transactions on Multimedia, vol. 17, no. 8, pp. 1372-1384, Aug. 2015.
![Page 22: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/22.jpg)
Viewer Expressive Reactions
R. Vliegendhart, M. Larson, B. Loni and A. Hanjalic, "Exploiting the Deep-Link Commentsphere to Support Non-Linear Video Access," in IEEE Transactions on Multimedia, vol. 17, no. 8, pp. 1372-1384, Aug. 2015.
Expressive reactions are not emotional in the classic sense.
They are also not completely personal...but..
![Page 23: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/23.jpg)
The way people take a picture reflects what they are taking a picture of.
Pixel statistics reveal very simple information on how people take pictures.
We need people to judge if the computer guesses right.
Michael Riegler, Martha Larson, Mathias Lux, and Christoph Kofler. 2014. How 'How' Reflects What's What: Content-based Exploitation of How Users Frame Social Images. In Proceedings of the 22nd ACM international conference on Multimedia (MM '14).
Fashion and framing
![Page 24: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/24.jpg)
Characterize the trend...
![Page 25: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/25.jpg)
Jacket types are already very difficult for computers!
![Page 26: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/26.jpg)
Crowdsourcing
People interpret images in exchange for micropayments.
Example: Amazon Mechanical Turk
![Page 27: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/27.jpg)
MediaEval 2016
Multimedia Benchmark Initiative
moving forward with benchmarking
![Page 28: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/28.jpg)
MediaEval Multimedia Evaluation Benchmark
● offers tasks on multimedia access and retrieval,
● exploits features derived from multiple modalities: speech, audio, visual content, tags, users, context,
● solutions may or may not involve machine learning.
multimediaeval.org
This year: the MediaEval workshop takes place right after ACM Multimedia 2016 in Amsterdam.
![Page 29: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/29.jpg)
Example MediaEval Tasks
● Predicting Media Interestingness: Infer interesting frames and segments of movies (using audio, visual features, text).
● Retrieving Diverse Social Images: Diversify image results lists (text, visual features).
● Context of Multimedia Experience: Predict multimedia content suitable for watching in stressful situations.
● Person Discovery: finding people in broadcast content.
● Placing: geo-location estimation for social multimedia.
![Page 30: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/30.jpg)
Publications arising from MediaEval
http://www.citeulike.org/group/16499
![Page 31: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/31.jpg)
2015 Workshop Participants
80 participants from 25 countries
![Page 32: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/32.jpg)
MediaEval Proceedings Papers
![Page 33: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/33.jpg)
What sets MediaEval apart?
• … emphasizes the "multi" in multimedia: speech, audio, visual content, tags, users, context.
• … innovates new tasks and techniques focusing on the human and social aspects of multimedia content.
• … community driven.
![Page 34: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/34.jpg)
Predicting Media Interestingness Task
Automatically select frames or portions of movies which are the most interesting for a common viewer.
● Goal: Make use of the visual, audio and text content (features provided).
● Data: ca. 100 movie trailers, together with human annotations.
● Metric: System performance is to be evaluated using standard Mean Average Precision.
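For readers unfamiliar with the metric named above, here is a minimal sketch of Mean Average Precision. The function names, the frame ids, and the input layout (one ranked list plus one set of relevant items per trailer) are assumptions for illustration; the task's official evaluation tool may differ in details such as cut-offs or tie handling.

```python
def average_precision(ranked, relevant):
    """Average of the precision values at each rank where a relevant item appears.

    Assumes `ranked` contains no duplicate items.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant items that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked list, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical frame ids for two trailers, with annotated "interesting" frames.
runs = [
    (["f3", "f1", "f7", "f2"], {"f1", "f2"}),
    (["g1", "g2"], {"g1"}),
]
```

In the first toy trailer the relevant frames sit at ranks 2 and 4, so the precision values 1/2 and 2/4 average to 0.5; the second trailer ranks its only relevant frame first and scores 1.0.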
![Page 35: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/35.jpg)
Predicting Media Interestingness Task
http://multimediaeval.org
![Page 36: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/36.jpg)
Retrieving Diverse Social Images Task
This task addresses the problem of image search result diversification in the context of social media:
● Goal: refine a ranked list of Flickr photos retrieved with general purpose multi-topic queries using provided visual, textual and user tagging credibility information.
● Metrics: results are evaluated with respect to their relevance to the query and the diverse representation of it.
● Data: ~40k images, social metadata, text models, CNN descriptors, user tagging credibility dataset, etc
Three data sets have been published at the MMSys dataset track.
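The two evaluation criteria named above, relevance and diversity, can be made concrete with a small sketch. It assumes precision at a cut-off k for relevance and cluster recall at k for diversity, where each relevant image carries a hypothetical cluster label; the slide does not state which measures are used, so treat these as illustrative choices.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant to the query."""
    if k <= 0:
        return 0.0
    return sum(1 for image in ranked[:k] if image in relevant) / k


def cluster_recall_at_k(ranked, cluster_of, k):
    """Fraction of all annotated clusters that appear among the top-k results.

    `cluster_of` maps each relevant image id to its cluster label
    (an assumed annotation format).
    """
    all_clusters = set(cluster_of.values())
    if not all_clusters:
        return 0.0
    seen = {cluster_of[image] for image in ranked[:k] if image in cluster_of}
    return len(seen) / len(all_clusters)


# Toy annotation: four relevant photos showing three different aspects of a landmark.
cluster_of = {"a": "front", "b": "front", "c": "night", "d": "interior"}
initial = ["a", "b", "x", "c"]       # "x" is an irrelevant photo
diversified = ["a", "c", "d", "b"]
```

At k = 3 the initial list covers only the "front" cluster, while the diversified list covers all three clusters without losing relevance.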
![Page 37: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/37.jpg)
Retrieving Diverse Social Images Task (cont.)
[Figure: initial retrieval results compared with diversified results]
![Page 38: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/38.jpg)
Context of Multimedia Experience Task
Develops multimodal techniques for automatic prediction of multimedia in a particular consumption context.
● Goal: Predict movies that are suitable to watch on airplanes.
● Data: Input to the prediction methods is movie trailers, and metadata from IMDb, Rotten Tomatoes and Metacritic.
● Metric: Output is evaluated using the Weighted F1 score, with expert labels as ground truth.
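As a reminder of how a weighted F1 score is computed, the sketch below takes the per-class F1 and averages it with weights proportional to each class's share of the ground-truth labels. The label names and the example data are assumptions; the task's exact scoring script is not given in the slides.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support in y_true."""
    total = len(y_true)
    if total == 0:
        return 0.0
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1
    return score


# Hypothetical expert labels: is the movie suitable to watch on an airplane?
expert = ["good", "good", "good", "bad"]
predicted = ["good", "good", "bad", "bad"]
```

Here the "good" class has F1 = 0.8 with weight 3/4 and the "bad" class has F1 = 2/3 with weight 1/4, which gives a weighted score of about 0.767.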
This year: Task is offered at the MediaEval workshop and at a joint-challenge workshop at http://www.icpr2016.org
![Page 39: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/39.jpg)
Context of Multimedia Experience Task
Different context can lead to different preferences...
...people like to watch different movies than they would at home or in the cinema.
![Page 40: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/40.jpg)
Multimodal Person Discovery in Broadcast TV Task
● Goal: Given raw TV broadcasts, each shot must be automatically tagged with the name(s) of people who can be both seen and heard in the shot.
● The list of people is not known a priori and their names must be discovered in an unsupervised way from provided text overlay or speech transcripts.
● Data: Multilingual corpus from INA (French), DW (German & English) and UPC (Catalan)
● Metric: standard information retrieval metrics based on a posteriori collaborative annotation of the corpus by the participants themselves.
![Page 41: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/41.jpg)
Person Discovery Task
Person names must be discovered in speech track and/or sub-titles. Models cannot be trained on external data.
Slide credit: Johann Poignant, Hervé Bredin, Claude Barras, Person Discovery Task Organizers MediaEval 2015
![Page 42: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/42.jpg)
Tackling the Person Discovery Task
Slide credit: Johann Poignant, Hervé Bredin, Claude Barras, Person Discovery Task Organizers MediaEval 2015
![Page 43: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/43.jpg)
Wrap Up
● We want to connect users with information, in order to satisfy information needs.
● CS Love: Lots of data!
● CS Hate: How do people really see multimedia, what do they want?
● Way forward: Continue to define new challenges and build algorithms to address them.
![Page 44: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/44.jpg)
Beyond the user-item matrix
CrowdRec project
● Exploiting multiple sources of information,
● Leveraging the Crowd (crowdworkers, users, curators),
● Evaluating at large scale.
Context-driven Recommender systems:
“People have more in common with other people in the same situation than they do with past versions of themselves”
Roberto Pagano, Paolo Cremonesi, Martha Larson, Balazs Hidasi, Domonkos Tikk, Alexandros Karatzoglou, and Massimo Quadrana. The Contextual Turn: from Context-aware to Context-driven recommender systems. ACM RecSys 2016, to appear.
![Page 45: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/45.jpg)
Turn from personalization
• Context has been taken into account by coupling it with personalization, in context-aware recommender systems
• However, being aware of the context is not enough for some domains: recommendations should be driven by the context
In traditional recsys, Immutable Preference paradigm (ImP):
• User tastes do not evolve
• Goals and needs are static
• Item catalog is static
• Trendiness, seasonality, capacity and life-cycle are addressed by tweaks to existing models
Slide credit: Roberto Pagano
![Page 46: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/46.jpg)
Music
I usually like heavy metal music, but now I have to work and I want to listen to some soft music
Recommended for you:
Slide credit: Roberto Pagano
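The music example can be sketched as a contrast between two toy recommenders: one ranks items by the user's own history, the other by what anyone listened to in the same context. This is only an illustration of the context-driven idea under assumed data; the event format `(user, context, item)`, the function names, and the popularity-count scoring are hypothetical and are not the method of the cited paper.

```python
from collections import Counter


def recommend_by_history(events, user, k=3):
    """Personalized baseline: the items this user consumed most often."""
    counts = Counter(item for u, _, item in events if u == user)
    return [item for item, _ in counts.most_common(k)]


def recommend_by_context(events, context, k=3):
    """Context-driven sketch: the items consumed most often in this context, by any user."""
    counts = Counter(item for _, ctx, item in events if ctx == context)
    return [item for item, _ in counts.most_common(k)]


# Hypothetical listening log: (user, context, item).
events = [
    ("ann", "gym", "metal"),
    ("ann", "commuting", "metal"),
    ("bob", "working", "ambient"),
    ("cat", "working", "ambient"),
    ("cat", "working", "piano"),
    ("bob", "gym", "rock"),
]
```

For the user "ann", the history-based recommender returns heavy metal, while the context-driven one, asked about "working", returns the soft music that other people chose in that situation.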
![Page 47: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/47.jpg)
Jaeyoung Choi, Eungchan Kim, Martha Larson, Gerald Friedland, and Alan Hanjalic. 2015. Evento 360: Social Event Discovery from Web-scale Multimedia Collection. ACM Multimedia 2015, pp. 193-196.
![Page 48: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/48.jpg)
Thank you
Mohammad Soleymani, Guillaume Gravier, Bogdan Ionescu, Gareth Jones, Claire-Helene Demarty, Ngoc Duong, Frédéric Lefebvre, Yu-Gang Jiang, Bogdan Ionescu, Mats Sjöberg, Hanli Wang, Toan Do, Richard Sutcliffe, Chris Fox, Richard Lewis, Tom Collins, Eduard Hovy, Deane L. Root, Igor Szoke, Xavier Anguera, Claude Barras, Hervé Bredin, Camille Guinaudeau, Jean Carrive, Yannick Estève, Javier Hernando, Juliette Kahn, Nam Le, Sylvain Meignier, Ramon Morros, Johann Poignant, Satoshi Tamura, Bart Thomee, Olivier Van Laere, Claudia Hauff, Jaeyoung Choi, Emmanuel Dellandréa, Liming Chen, Yoann Baveye, Mats Sjöberg, Christina Boididou, Symeon Papadopoulos, Stuart E. Middleton, Michael Riegler, Duc Tien, Dang Nguyen, Giulia Boato, Andreas Petlund, Michael Riegler, Concetto Spampinato, Bogdan Ionescu, Alexandru Lucian Gînscă, Maia Zaharieva, Mihai Lupu, Henning Müller, Adrian Popescu, Bogdan Boteanu, Alan Woodley, Shlomo Geva, Timothy Chappell, Richi Nayak, Gabi Constantin, Roberto Pagano, Paolo Cremonesi, Martha Larson, Balazs Hidasi, Domonkos Tikk, Alexandros Karatzoglou, Massimo Quadrana, Xinchao Li, Alan Hanjalic, Andreas Lommatzsch, Benjamin Kille, Fabian Abel, Daniel Kohlsdorf, Jonas Seiler, Róbert Pálovics, Andras Benczur...
![Page 49: Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation](https://reader031.fdocuments.us/reader031/viewer/2022030121/58a29bb81a28ab36508b78cd/html5/thumbnails/49.jpg)
Links
● Challenges (Benchmarks)
  ○ MediaEval Multimedia Evaluation (http://multimediaeval.org),
  ○ CLEF NewsREEL News Recommendation challenge (http://www.clef-newsreel.org),
  ○ ACM RecSys 2016 Job Recommendation challenge (http://2016.recsyschallenge.com).
● Acknowledgements
  ○ Multimedia Commons (http://www.multimediacommons.org),
  ○ EC-funded CrowdRec project (http://crowdrec.eu).