New Media, New Content Conference Keynote at the Hellenic Audiovisual Institute
Audiovisual content exploitation JTS2010
![Page 1: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/1.jpg)
Audiovisual content exploitation in the networked information society
Roeland Ordelman
Research & Development
Netherlands Institute for Sound and Vision
Crowdsourcing Rock ’n’ Roll Multimedia Retrieval
![Page 2: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/2.jpg)
contents
1. AV content exploitation, annotation technology and user needs
– NISV context: digitization in Images for the Future
– Annotation technology for enabling access
– Annotation technology and user needs
2. Example: Crowdsourcing Rock ’n’ Roll Multimedia Retrieval
![Page 3: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/3.jpg)
NISV context
• 700,000+ hours of radio, television, documentaries, films and music, over 2 million photographs, 20,000 objects like cameras, televisions, radios, costumes and pieces of scenery
• and growing:
  • born-digital television and radio programs made by the Dutch public broadcasting companies (video: ~15K hours/year)
  • PROARCHIVE: archiving service
  • selection of (Dutch) AV content from the web
![Page 4: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/4.jpg)
IMAGES FOR THE FUTURE
LARGE DIGITIZATION PROGRAM
![Page 5: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/5.jpg)
Images for the Future
• Selection, restoration, digitization, encoding and storage of 137,000 hours of video, 20,000 hours of film, 124,000 hours of audio and more than three million photographs.
• Three goals:
  • Safeguarding heritage for future generations
  • Creating socio-economic value (“unlock the social and economic potential of the collections”)
  • Innovation: new infrastructure for strengthening the knowledge economy
![Page 6: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/6.jpg)
INVESTMENTS
BUSINESS MODELS
The cultural heritage sector is challenged
to re-evaluate its business models
![Page 7: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/7.jpg)
Business model
• The total investment in this initiative amounts to 173 million euros
• A strong business model is necessary to support this kind of investment and to show that it will result in long-term socio-economic returns
• The outcome of a cost-benefit analysis was positive: “The total balance of costs and returns of restoring, preserving and digitising audio-visual material (excluding costs of tax payments) will be between: 20+ and 60+ million.”
• Economic benefits:
  • Direct effects of the investment are revenues from sales, access for specific user groups, the repartition of copyright for the use of the material and so on.
  • The indirect effects concern the product markets and labour market.
• Social benefits:
  • conservation of culture, reinforcement of cultural awareness, reinforcement of democracy through the accessibility of information, increase in multimedia literacy and contribution to the Lisbon goals set by the EU
http://www.prestoprime.org/project/public.en.html
![Page 8: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/8.jpg)
Content exploitation: from content is king ...
![Page 9: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/9.jpg)
... to metadata rules
![Page 10: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/10.jpg)
MANUAL ANNOTATION
costly & limited
![Page 11: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/11.jpg)
DECADE+ RESEARCH EFFORTS
(SEMI) AUTOMATIC ANNOTATION
![Page 12: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/12.jpg)
Research on automatic annotation
• automatic information extraction based on:
  • visual features
  • information from audio
• crowdsourcing
• deploying collateral data sources:
  • subtitles, production scripts, meeting minutes, slides
![Page 13: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/13.jpg)
PROGRESS? YES!
Various (laboratory) showcases
Commercial systems (e.g., blinkx, Google)
![Page 14: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/14.jpg)
work in progress
• institutional: reorganisation of traditional archival workflows
• national: development of common services
  • OAI, Persistent Identifiers, ASR service, Vocabulary Repositories
• commercial: uptake by MNCs (Google and Microsoft) and SMEs
• individual: bring about a shift in the defensive attitude of content owners towards
  • opening up their funded and protected archives
  • use of possibly noisy content descriptions (trust/reliability)
![Page 15: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/15.jpg)
Automatic annotation: NISV as a user
• Participation in international research projects
  • Video Active, MultiMATCH, VIDI-video, LiWA, P2P-Fusion, Sterna, EUscreen, PrestoPrime
• Collaboration agreement with Dutch research institutes
  • Researchers stationed at Sound and Vision
  • Provide data (TRECVID, VideoCLEF)
• Research environment: exact copy of iMMix production environment for testing new technology
  • speech recognition
  • video analysis
  • fingerprinting
  • linking of context data (web, program guide, production data)
![Page 16: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/16.jpg)
DISPARITY BETWEEN TECHNOLOGY AND USER NEEDS
media professionals
journalists
researchers
educators
general public
![Page 17: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/17.jpg)
User perspective
• Rapidly evolving networked information society
• Opening up
• Focus on community specific requirements
  • search needs
  • presentation/interaction needs
• Draw communities into libraries
![Page 18: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/18.jpg)
COMMUNITY SPECIFIC REQUIREMENTS
From document level search to fragment level search
![Page 19: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/19.jpg)
Broadcast professionals
In: Huurnink, Hollink, van Den Heuvel 2009 (submitted)
![Page 20: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/20.jpg)
User survey (broadcast professionals)
![Page 21: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/21.jpg)
Researchers
• Verteld Verleden aims at establishing a shared information space on distributed Dutch Oral History collections:
  • distributed collections (harvested via OAI)
  • search & interlink collections via centralized search
• project goals:
1. provide demonstrator portal to show how technology could help researchers
2. acquire information on specific user requirements
  • search
  • collaboration
  • linking
  • privacy
  • dedicated work space
http://www.verteldverleden.org
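Harvesting distributed collections "via OAI" can be illustrated with a small sketch. It assumes that OAI here means the OAI-PMH protocol with Dublin Core (`oai_dc`) records; the endpoint `https://example.org/oai`, the sample response and the function names are hypothetical and are not taken from the Verteld Verleden project.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, hand-written response used instead of a live network call.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:interview-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Interview 1</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for one collection."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def extract_titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


if __name__ == "__main__":
    print(list_records_url("https://example.org/oai"))
    print(extract_titles(SAMPLE))
```

A centralized search index could be filled by running such a harvest against each participating collection and merging the resulting records.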
![Page 22: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/22.jpg)
DRAW COMMUNITIES INTO LIBRARIES
![Page 23: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/23.jpg)
Goals
• exploiting community tagging (tagging games, etc)
• exploring the wisdom of crowds by hooking up with user communities (e.g., everyone-as-commentator, unexpected experts)
• capturing relevant information from the internet and aligning this with archived items.
• finding new ways for communities to interact with the data.
![Page 24: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/24.jpg)
Technology perspective
Technology:
• provide anchor points for linking up with the ‘cloud’ (entity detection, segmentation, cross-collection SID, etc.): people, places, events, topics, quotes, etc.
• synchronization of web-content/UGC with AV documents
• users in the loop: UGC for adapting/training analysis tools
• technology aided annotation: Documentalist Support System
  • provide documentalist/archivist with relevant context during manual annotation
![Page 25: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/25.jpg)
WEB-ARCHIVING
COLLECT CONTEXT DATA FROM THE WEB
![Page 26: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/26.jpg)
Web-archiving
• extend Sound and Vision archive with audiovisual content from the internet
• archive internet web content
  • preserve broadcast related websites
  • to use as context information for audiovisual data in the Sound and Vision archive
![Page 27: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/27.jpg)
AUDIOVISUAL INTERNET CONTENT
BROADCAST RELATED INTERNET CONTENT
iMMix AV ARCHIVE
WEB-ARCHIVE
CONTEXT
CONTEXT
![Page 28: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/28.jpg)
Special Use Case: documentalist support
• in the process of generating metadata for an archived AV item, a documentalist searches for relevant information on this item, for example on the internet
• internet search might fail as such information is typically available only for a limited amount of time
• the “internet archive” works as a “context database” for relevant internet context
![Page 29: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/29.jpg)
TAGGING GAME
www.waisda.nl
![Page 30: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/30.jpg)
CROWDSOURCING ROCK ’N’ ROLL MULTIMEDIA RETRIEVAL
Netherlands Institute for Sound and Vision
University of Amsterdam – Visual Search (Cees Snoek)
University of Twente – Speech Recognition (Franciska de Jong)
VideoDock – User Interface (Bauke Freiburg)
![Page 31: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/31.jpg)
Background
• 40th anniversary of the popular annual Dutch rock festival Pinkpop
• archive coverage ranges from summaries only to almost unabridged recordings, including raw, unpublished footage as well as interviews
• collection digitized in Images for the Future
• goal: build an application for showcasing the history of the festival in an attractive way using state-of-the-art technology
![Page 32: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/32.jpg)
Rationale
• Use state-of-the-art visual analysis to allow browsing the collection on the basis of visual concert concepts
• Use speech recognition for browsing interviews
• Exploit the popularity of the festival to get the community of rock ’n’ roll enthusiasts into the loop:
  • general feedback on technology
  • improve and extend automatic labeling
  • share video fragments
![Page 33: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/33.jpg)
IPR
• Various Dutch broadcasters hold the copyrights of the content.
• Granted dispensation to use content to enable a large scale study of community-aided annotation and verification via an open internet platform
  • for a limited time period of three months
  • video displayed in a secured player
  • (access to experimental results)
![Page 34: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/34.jpg)
Visual search
• visual concept detection: for each concept a ‘detector’ is trained on the basis of manually labeled training data.
• number of concepts in concerts more or less fixed (in contrast to the broadcast news (BN) domain); 12 were chosen based on:
  • frequency
  • visual detection feasibility
  • previous mentioning in literature
  • expected utility for users
• for each concept several hundred examples were labeled
![Page 35: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/35.jpg)
![Page 36: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/36.jpg)
![Page 37: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/37.jpg)
Fragment level concept detection
• video fragments instead of more technically defined shots or keyframes
• fragment algorithm finds the longest fragments with the highest average scores for a specific concert concept
• Only the top-n fragments per concert concept are loaded in the video player
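The fragment selection described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual algorithm: it assumes one detector score per shot, a fixed threshold to decide where a fragment starts and ends, and a ranking by length times average score. The threshold, the ranking rule and all names are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    start: int         # index of the first shot in the fragment
    end: int           # index of the last shot (inclusive)
    mean_score: float  # average detector score over the fragment

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def top_fragments(scores: List[float], threshold: float, n: int) -> List[Fragment]:
    """Group consecutive above-threshold scores into fragments, keep the best n."""
    fragments = []
    start = None
    # The -inf sentinel closes a run that reaches the end of the list.
    for i, score in enumerate(scores + [float("-inf")]):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            run = scores[start:i]
            fragments.append(Fragment(start, i - 1, sum(run) / len(run)))
            start = None
    # Assumed ranking: favour fragments that are both long and high-scoring.
    fragments.sort(key=lambda f: f.length * f.mean_score, reverse=True)
    return fragments[:n]


if __name__ == "__main__":
    # Hypothetical per-shot scores for one concert concept.
    shot_scores = [0.1, 0.8, 0.9, 0.7, 0.2, 0.95, 0.1, 0.6, 0.6]
    for fragment in top_fragments(shot_scores, threshold=0.5, n=2):
        print(fragment)
```

With these example scores the three-shot run (shots 1 to 3) ranks first and the two-shot run (shots 7 to 8) second, while the single very confident shot 5 is dropped because it is short.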
![Page 38: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/38.jpg)
Speech Recognition
• Speech transcripts generated by open-source speech recognition toolkit SHoUT developed in MultimediaN and CATCH projects
• Words in transcripts have time-labels
• Transcripts converted to filtered term frequency list on the basis of tf.idf statistics for generating a time-synchronized term cloud:
  • jump to relevant interview parts via terms
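A time-synchronized term cloud of this kind could be derived along the following lines. The sketch assumes each transcript is a list of (word, start time) pairs and uses the textbook tf.idf weight tf × log(N / df); the actual filtering and weighting used with SHoUT output are not given in the slides, so the data layout and function name are assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Assumed layout: a transcript is a list of (word, start_time_in_seconds) pairs.
Transcript = List[Tuple[str, float]]


def term_cloud(target: Transcript, collection: List[Transcript], top_k: int) -> Dict[str, List[float]]:
    """Return the top_k tf.idf terms of `target`, each mapped to its time labels."""
    n_docs = len(collection)
    # Document frequency: in how many transcripts does each word occur?
    df = Counter()
    for doc in collection:
        df.update({word.lower() for word, _ in doc})

    # Term frequency and time labels within the target transcript.
    tf = Counter(word.lower() for word, _ in target)
    times = defaultdict(list)
    for word, start in target:
        times[word.lower()].append(start)

    # Words that occur in every transcript get weight 0 and sink to the bottom.
    weights = {w: tf[w] * math.log(n_docs / df[w]) for w in tf if df[w] > 0}
    best = sorted(weights, key=lambda w: (-weights[w], w))[:top_k]
    return {w: times[w] for w in best}


if __name__ == "__main__":
    # Hypothetical toy transcripts; `interview` must be part of the collection.
    interview = [("the", 0.0), ("band", 0.5), ("played", 1.0),
                 ("the", 1.5), ("guitar", 2.0), ("guitar", 7.5)]
    others = [[("the", 0.0), ("weather", 0.4), ("played", 1.2)],
              [("the", 0.0), ("crowd", 0.3)]]
    print(term_cloud(interview, [interview] + others, top_k=2))
```

Clicking a term in the cloud would then seek the player to one of the returned time labels, which is how a term can lead to the relevant part of an interview.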
![Page 39: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/39.jpg)
Player
• timeline-based video player
• colored dots represent concepts
• clicking dot starts playback
• feedback window:
  • right/wrong label
  • comment
  • share (email/twitter)
• embed integrated video player, including crowdsourcing mechanism
![Page 40: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/40.jpg)
![Page 41: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/41.jpg)
Encouraging User Feedback
• balance between appealing user experience and maximized user participation
• full-length concert videos (no ‘commercials’)
• no interruptions, only graphical overlays
• participation threshold kept low:
  • no signing up
  • just click buttons (thumbs-up/down)
• all user feedback stored in database with IP addresses and user sessions
![Page 42: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/42.jpg)
DEMO
![Page 43: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/43.jpg)
Preliminary results
• 12,563 visitors, of which 9,595 unique, in 3 months.
• visitors viewed on average 3.5 pages, with an average viewing time of 4,57 minutes.
• busiest day was December 3, with 1,566 visitors, immediately after launch and media attention
• Most traffic (65%) originated from 255 referrer sites. The top referrer sites were:
  • pinkpop.nl (festival organization)
  • oor.nl (music magazine)
  • Google
• users provided feedback more than 4,000 times.
• We are currently investigating how this feedback can be exploited to improve automated multimedia analysis results
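One conceivable first step in exploiting the feedback is to turn the stored thumbs-up/down clicks into verified labels. The sketch below is purely illustrative and not a result of the project: it assumes each feedback event carries a session id, a fragment id, a concept name and a right/wrong flag, counts each session once per fragment, and accepts a majority verdict only after a minimum number of votes. The concept names, the record layout and the `min_votes` parameter are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed feedback event: (session_id, fragment_id, concept, is_correct)
Vote = Tuple[str, str, str, bool]


def verify_labels(votes: Iterable[Vote], min_votes: int = 3) -> Dict[Tuple[str, str], bool]:
    """Majority vote per (fragment, concept), counting each session once."""
    latest = {}
    for session, fragment, concept, is_correct in votes:
        # A later vote from the same session overrides its earlier one.
        latest[(session, fragment, concept)] = is_correct

    tally = defaultdict(lambda: [0, 0])  # key -> [thumbs_down, thumbs_up]
    for (_, fragment, concept), is_correct in latest.items():
        tally[(fragment, concept)][int(is_correct)] += 1

    verdicts = {}
    for key, (down, up) in tally.items():
        # Skip labels with too few votes or with a tie.
        if up + down >= min_votes and up != down:
            verdicts[key] = up > down
    return verdicts


if __name__ == "__main__":
    events = [
        ("s1", "f1", "singer", True),
        ("s2", "f1", "singer", True),
        ("s3", "f1", "singer", False),
        ("s1", "f1", "singer", True),   # repeated click, counted once
        ("s1", "f2", "drummer", False),
        ("s2", "f2", "drummer", False),
        ("s4", "f2", "drummer", False),
        ("s1", "f3", "guitar", True),   # too few votes, no verdict
    ]
    print(verify_labels(events))
```

Verified positives and negatives obtained this way could serve as additional training examples for the concept detectors, subject to the trust and reliability concerns raised earlier in the talk.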
![Page 44: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/44.jpg)
Wrap up
• value of archive is strongly related to access opportunities
• access is to a large extent technology driven
• but next to technology development we need to make a shift:
  • from a ‘laboratory view’ on users to drawing users and communities into the loop
• NISV is aiming towards this two-way strategy:
  • incorporate advanced access technology
  • discuss access requirements with the stakeholders
![Page 45: Audiovisual content exploitation JTS2010](https://reader035.fdocuments.us/reader035/viewer/2022070304/54b9fbcf4a7959e9098b4621/html5/thumbnails/45.jpg)