Audiovisual content exploitation JTS2010


description

On audiovisual content exploitation at Netherlands Institute for Sound and Vision and the crowdsourcing application Pinkpop

Transcript of Audiovisual content exploitation JTS2010

Page 1: Audiovisual content exploitation  JTS2010

Audiovisual content exploitation in the networked information society

Roeland Ordelman

Research & Development

Netherlands Institute for Sound and Vision

[email protected]

Crowdsourcing Rock ‘n Roll Multimedia Retrieval

Page 2: Audiovisual content exploitation  JTS2010

contents

1. AV content exploitation, annotation technology and user needs
– NISV context: digitization in Images for the Future
– Annotation technology for enabling access
– Annotation technology and user needs

2. Example: Crowdsourcing Rock ‘n Roll Multimedia Retrieval

Page 3: Audiovisual content exploitation  JTS2010

NISV context

• more than 700,000 hours of radio, television, documentaries, films and music, over 2 million photographs, 20,000 objects such as cameras, televisions, radios, costumes and pieces of scenery

• and growing:
  • digitally born television and radio programs made by the Dutch public broadcasting companies (video: ~15K hours/year)
  • PROARCHIVE: archiving service
  • selection of (Dutch) AV content from the web

Page 4: Audiovisual content exploitation  JTS2010

IMAGES FOR THE FUTURE

LARGE DIGITIZATION PROGRAM

Page 5: Audiovisual content exploitation  JTS2010

Images for the Future

• Selection, restoration, digitization, encoding and storage of 137,000 hours of video, 20,000 hours of film, 124,000 hours of audio and more than three million photographs.

• Three goals:
  • Safeguarding heritage for future generations
  • Creating socio-economic value (“unlock the social and economic potential of the collections”)
  • Innovation: new infrastructure for strengthening the knowledge economy

Page 6: Audiovisual content exploitation  JTS2010

INVESTMENTS

BUSINESS MODELS

The cultural heritage sector is challenged to re-evaluate its business models

Page 7: Audiovisual content exploitation  JTS2010

Business model

• The total investment in this initiative amounts to 173 million euros

• A strong business model is necessary to support this kind of investment and to show that it will result in long-term socio-economic returns

• The outcome of a cost-benefit analysis was positive: “The total balance of costs and returns of restoring, preserving and digitising audio-visual material (excluding costs of tax payments) will be between: 20+ and 60+ million.”

• Economic benefits:
  • Direct effects of the investment are revenues from sales, access for specific user groups, the repartition of copyright for the use of the material and so on.
  • The indirect effects concern the product markets and labour market.

• Social benefits:
  • conservation of culture, reinforcement of cultural awareness, reinforcement of democracy through the accessibility of information, increase in multimedia literacy and contribution to the Lisbon goals set by the EU

http://www.prestoprime.org/project/public.en.html

Page 8: Audiovisual content exploitation  JTS2010

Content exploitation: from content is king ...

Page 9: Audiovisual content exploitation  JTS2010

... to metadata rules

Page 10: Audiovisual content exploitation  JTS2010

MANUAL ANNOTATION

costly & limited

Page 11: Audiovisual content exploitation  JTS2010

DECADE+ RESEARCH EFFORTS

(SEMI) AUTOMATIC ANNOTATION

Page 12: Audiovisual content exploitation  JTS2010

Research on automatic annotation

• automatic information extraction based on:
  • visual features
  • information from audio

• crowdsourcing

• deploying collateral data sources:
  • subtitles, production scripts, meeting minutes, slides

Page 13: Audiovisual content exploitation  JTS2010

PROGRESS? YES!

Various (laboratory) showcases

Commercial systems (e.g., blinkx, Google)

Page 14: Audiovisual content exploitation  JTS2010

work in progress

• institutional: reorganisation of traditional archival workflows

• national: development of common services
  • OAI, Persistent Identifiers, ASR service, Vocabulary Repositories

• commercial: uptake by MNCs (Google and Microsoft) and SMEs

• individual: bring about a shift in the defensive attitude of content owners towards:
  • opening up their funded and protected archives
  • use of possibly noisy content descriptions (trust/reliability)

Page 15: Audiovisual content exploitation  JTS2010

Automatic annotation: NISV as a user

• Participation in international research projects
  • Video Active, MultiMATCH, VIDI-video, LiWA, P2P-Fusion, Sterna, EUscreen, PrestoPrime

• Collaboration agreement with Dutch research institutes
  • Researchers stationed at Sound and Vision
  • Provide data (TRECVID, VideoCLEF)

• Research environment: exact copy of iMMix production environment for testing new technology
  • speech recognition
  • video analysis
  • fingerprinting
  • linking of context data (web, program guide, production data)

Page 16: Audiovisual content exploitation  JTS2010

DISPARITY BETWEEN TECHNOLOGY AND USER NEEDS

media professionals

journalists

researchers

educators

general public

Page 17: Audiovisual content exploitation  JTS2010

User perspective

• Rapidly evolving networked information society

• Opening up

• Focus on community specific requirements
  • search needs
  • presentation/interaction needs

• Draw communities into libraries

Page 18: Audiovisual content exploitation  JTS2010

COMMUNITY SPECIFIC REQUIREMENTS

From document level search to fragment level search

Page 19: Audiovisual content exploitation  JTS2010


Broadcast professionals

In: Huurnink, Hollink, van Den Heuvel 2009 (submitted)

Page 20: Audiovisual content exploitation  JTS2010

User survey (broadcast professionals)

Page 21: Audiovisual content exploitation  JTS2010

Researchers

• Verteld Verleden aims at establishing a shared information space on distributed Dutch Oral History collections:
  • distributed collections (harvested via OAI)
  • search & interlink collections via centralized search

• project goals:

1. provide demonstrator portal to show how technology could help researchers

2. acquire information on specific user requirements
  • search
  • collaboration
  • linking
  • privacy
  • dedicated work space

http://www.verteldverleden.org
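The slide mentions that distributed collections are harvested via OAI. As a rough illustration only, the sketch below builds an OAI-PMH `ListRecords` request and reads record identifiers from a response. The endpoint `https://example.org/oai` and the sample response are hypothetical; nothing here reflects the actual Verteld Verleden setup.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def record_identifiers(response_xml: str) -> list:
    """Extract the header identifiers from a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iter(OAI_NS + "identifier")]


# Hypothetical, hand-written response for illustration (no network access needed).
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:interview-1</identifier></header></record>
    <record><header><identifier>oai:example.org:interview-2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://example.org/oai")
ids = record_identifiers(sample)
```

A central search index could then be filled from the harvested records of each collection; how that is done in the project is not described on the slide.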

Page 22: Audiovisual content exploitation  JTS2010

DRAW COMMUNITIES INTO LIBRARIES

Page 23: Audiovisual content exploitation  JTS2010

Goals

• exploiting community tagging (tagging games, etc)

• exploring the wisdom of crowds by hooking up with user communities (e.g., everyone-as-commentator, unexpected experts)

• capturing relevant information from the internet and aligning this with archived items.

• finding new ways for communities to interact with the data.

Page 24: Audiovisual content exploitation  JTS2010

Technology perspective

Technology:

• provide anchor points for linking up with the ‘cloud’ (entity detection, segmentation, cross-collection SID, etc.): people, places, events, topics, quotes, etc.

• synchronization of web-content/UGC with AV documents

• users in the loop: UGC for adapting/training analysis tools

• technology aided annotation: Documentalist Support System
  • provide documentalist/archivist with relevant context during manual annotation

Page 25: Audiovisual content exploitation  JTS2010

WEB-ARCHIVING

COLLECT CONTEXT DATA FROM THE WEB

Page 26: Audiovisual content exploitation  JTS2010

Web-archiving

• extend Sound and Vision archive with audiovisual content from the internet

• archive internet web content
  • preserve broadcast related websites
  • to use as context information for audiovisual data in the Sound and Vision archive

Page 27: Audiovisual content exploitation  JTS2010

[Diagram labels: audiovisual internet content; broadcast related internet content; iMMix AV archive; web archive; context]

Page 28: Audiovisual content exploitation  JTS2010

Special Use Case: documentalist support

• in the process of generating metadata for an archived AV item, a documentalist searches for relevant information on this item, for example on the internet

• internet search might fail as such information is typically available only for a limited amount of time

• the “internet archive” works as a “context database” for relevant internet context

Page 29: Audiovisual content exploitation  JTS2010

TAGGING GAME

www.waisda.nl

Page 30: Audiovisual content exploitation  JTS2010

CROWDSOURCING ROCK ‘N ROLL MULTIMEDIA RETRIEVAL

Netherlands Institute for Sound and Vision

University of Amsterdam – Visual Search (Cees Snoek)

University of Twente – Speech Recognition (Franciska de Jong)

VideoDock – User Interface (Bauke Freiburg)

Page 31: Audiovisual content exploitation  JTS2010

Background

• 40th anniversary of the popular annual Dutch rock festival Pinkpop

• archived material ranges from summaries only to almost unabridged recordings, including raw, unpublished footage as well as interviews

• collection digitized in Images for the Future

• goal: build an application for showcasing the history of the festival in an attractive way using state-of-the-art technology

Page 32: Audiovisual content exploitation  JTS2010

Rationale

• Use state-of-the-art visual analysis to allow browsing collection on the basis of visual concert concepts

• Use speech recognition for browsing interviews

• Exploit popularity of festival to get the community of rock ‘n roll enthusiasts into the loop:
  • general feedback on technology
  • improve and extend automatic labeling
  • share video fragments

Page 33: Audiovisual content exploitation  JTS2010

IPR

• Various Dutch broadcasters hold the copyrights of the content.

• Granted dispensation to use content to enable a large scale study of community-aided annotation and verification via an open internet platform
  • for a limited time period of three months
  • video displayed in a secured player
  • (access to experimental results)

Page 34: Audiovisual content exploitation  JTS2010

Visual search

• visual concept detection: for each concept a ‘detector’ is trained on the basis of manually labeled training data.

• number of concepts in concerts more or less fixed (in contrast to the broadcast news domain); 12 were chosen based on:
  • frequency
  • visual detection feasibility
  • previous mention in the literature
  • expected utility for users

• for each concept several hundred examples were labeled

Page 35: Audiovisual content exploitation  JTS2010
Page 36: Audiovisual content exploitation  JTS2010
Page 37: Audiovisual content exploitation  JTS2010

Fragment level concept detection

• video fragments instead of more technically defined shots or keyframes

• fragment algorithm finds the longest fragments with the highest average scores for a specific concert concept

• Only the top-n fragments per concert concept are loaded in the video player
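The slides do not spell out the fragment algorithm, so the following is only a minimal sketch of one way to get from per-time-step detector scores to top-n fragments. It assumes a fixed score threshold, groups consecutive time steps above it into fragments, and ranks fragments by mean score with length as tie-breaker. The threshold, the ranking rule and all names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    start: int         # index of first time step (inclusive)
    end: int           # index of last time step (inclusive)
    mean_score: float  # average detector score over the fragment

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def find_fragments(scores: List[float], threshold: float) -> List[Fragment]:
    """Group consecutive time steps whose score is >= threshold."""
    fragments, start = [], None
    # A sentinel below any threshold closes a run that reaches the end.
    for i, s in enumerate(scores + [float("-inf")]):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            run = scores[start:i]
            fragments.append(Fragment(start, i - 1, sum(run) / len(run)))
            start = None
    return fragments


def top_n(fragments: List[Fragment], n: int) -> List[Fragment]:
    """Rank by mean score, then by length, and keep the best n."""
    ranked = sorted(fragments, key=lambda f: (f.mean_score, f.length), reverse=True)
    return ranked[:n]


# Hypothetical detector scores for one concept, one value per time step.
scores = [0.1, 0.8, 0.9, 0.7, 0.2, 0.6, 0.95, 0.1]
best = top_n(find_fragments(scores, threshold=0.5), n=1)
```

Here the run covering time steps 1 to 3 is ranked above the shorter run at steps 5 to 6 because its mean score is higher.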

Page 38: Audiovisual content exploitation  JTS2010

Speech Recognition

• Speech transcripts generated by open-source speech recognition toolkit SHoUT developed in MultimediaN and CATCH projects

• Words in transcripts have time-labels

• Transcripts converted to filtered term frequency list on the basis of tf.idf statistics for generating a time-synchronized term cloud:
  • jump to relevant interview parts via terms
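To make the term-cloud idea concrete, here is a small sketch under stated assumptions: each transcript is a list of (word, start time) pairs, tf.idf uses a smoothed logarithmic idf, and "filtering" simply means keeping the top-k terms. None of these details come from the slides, and this is not the SHoUT output format.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# A transcript is a list of (word, start_time_in_seconds) pairs.
Transcript = List[Tuple[str, float]]


def tfidf_terms(target: Transcript, collection: List[Transcript], top_k: int) -> Dict[str, List[float]]:
    """Return the top_k tf.idf terms of `target`, each mapped to its time labels."""
    n_docs = len(collection)
    # Document frequency: in how many transcripts does each word occur?
    df = Counter()
    for doc in collection:
        df.update({w.lower() for w, _ in doc})

    tf = Counter(w.lower() for w, _ in target)
    # Smoothed idf avoids division by zero for unseen words.
    scores = {w: c * math.log((1 + n_docs) / (1 + df[w])) for w, c in tf.items()}

    # Highest score first; alphabetical order breaks ties deterministically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
    times: Dict[str, List[float]] = {w: [] for w in ranked}
    for w, t in target:
        if w.lower() in times:
            times[w.lower()].append(t)
    return times


# Hypothetical toy transcripts.
interview = [("the", 0.0), ("festival", 0.4), ("was", 0.9),
             ("the", 1.2), ("festival", 1.5), ("stage", 2.1)]
other = [("the", 0.0), ("weather", 0.5), ("was", 1.0), ("fine", 1.4)]
cloud = tfidf_terms(interview, [interview, other], top_k=2)
```

Each term in `cloud` carries the times at which it was spoken, which is what lets a click on a term jump to the matching part of the interview.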

Page 39: Audiovisual content exploitation  JTS2010

Player

• timeline-based video player

• colored dots represent concepts

• clicking dot starts playback

• feedback window:
  • right/wrong label
  • comment
  • share (email/twitter)

• embed integrated video player, including crowdsourcing mechanism

Page 40: Audiovisual content exploitation  JTS2010
Page 41: Audiovisual content exploitation  JTS2010

Encouraging User Feedback

• balance between appealing user experience and maximized user participation

• full-length concert videos (no ‘commercials’)

• no interruptions, only graphical overlays

• participation threshold kept low:
  • no signing up
  • just click buttons (thumbs-up/down)

• all user feedback with IP addresses and user sessions stored in database
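As an illustration of how stored thumbs-up/down feedback might be summarized per label, the sketch below counts one vote per session for each fragment and concept. The record fields, the example concept names and the one-vote-per-session rule are assumptions, not the project's actual database design.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Feedback:
    fragment_id: str  # hypothetical identifier of a labeled video fragment
    concept: str      # the automatically assigned label being judged
    session_id: str   # anonymous user session (no sign-up required)
    thumbs_up: bool   # True = label judged right, False = judged wrong


def tally(votes: Iterable[Feedback]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Count (up, down) per (fragment, concept), one vote per session."""
    latest: Dict[Tuple[str, str, str], bool] = {}
    for v in votes:
        # A later vote from the same session overwrites the earlier one.
        latest[(v.fragment_id, v.concept, v.session_id)] = v.thumbs_up
    counts = defaultdict(lambda: [0, 0])
    for (fragment_id, concept, _), up in latest.items():
        counts[(fragment_id, concept)][0 if up else 1] += 1
    return {key: (up, down) for key, (up, down) in counts.items()}


# Hypothetical feedback records.
votes = [
    Feedback("f1", "guitar", "s1", True),
    Feedback("f1", "guitar", "s1", True),   # repeated click, same session
    Feedback("f1", "guitar", "s2", False),
    Feedback("f2", "drummer", "s3", True),
]
result = tally(votes)
```

Labels with many up-votes could serve as extra positive examples and labels with many down-votes as negatives, though whether that helps is exactly the kind of question such a study would need to answer.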

Page 42: Audiovisual content exploitation  JTS2010

DEMO

Page 43: Audiovisual content exploitation  JTS2010

Preliminary results

• 12,563 visitors, of which 9,595 unique, in 3 months

• visitors viewed on average 3.5 pages, with an average viewing time of 4,57 minutes

• busiest day was December 3, with 1,566 visitors, immediately after launch and media attention

• most traffic (65%) originated from 255 referrer sites, the best being:
  • pinkpop.nl (festival organization)
  • oor.nl (music magazine)
  • google

• users provided feedback more than 4000 times

• we are currently investigating how this feedback can be exploited to improve automated multimedia analysis results

Page 44: Audiovisual content exploitation  JTS2010

Wrap up

• value of archive is strongly related to access opportunities

• access is to a large extent technology driven

• but next to technology development we need to make a shift:
  • from a ‘laboratory view’ on users to drawing users and communities into the loop

• NISV is aiming towards this two-way strategy:
  • incorporate advanced access technology
  • discuss access requirements with the stakeholders

Page 45: Audiovisual content exploitation  JTS2010