Information Retrieval On Digital Video Information

The TREC2001 Video Track: Information Retrieval on Digital Video Information. Alan F. Smeaton (Centre for Digital Video Processing, Dublin City University, Ireland); Paul Over (National Institute of Standards and Technology, USA); Cash J. Costello (Applied Physics Laboratory, Johns Hopkins University, USA); Arjen P. de Vries (CWI, Amsterdam, The Netherlands); David Doermann (Laboratory for Language and Media Processing, University of Maryland, USA); Alexander Hauptmann (School of Computer Science, Carnegie Mellon University, USA); Mark E. Rorvig (School of Library and Information Sciences, University of North Texas, USA); John R. Smith (IBM T.J. Watson Research Center, USA); Lide Wu (Dept. of Computer Science, Fudan University, China)


Page 1: Information Retrieval On Digital Video Information

The TREC2001 Video Track: Information Retrieval on Digital Video Information

Alan F. Smeaton, Centre for Digital Video Processing, Dublin City University, Ireland

Paul Over, National Institute of Standards and Technology, USA

Cash J. Costello, Applied Physics Laboratory, Johns Hopkins University, USA

Arjen P. de Vries, CWI, Amsterdam, The Netherlands

David Doermann, Laboratory for Language and Media Processing, University of Maryland, USA

Alexander Hauptmann, School of Computer Science, Carnegie Mellon University, USA

Mark E. Rorvig, School of Library and Information Sciences, University of North Texas, USA

John R. Smith, IBM T.J. Watson Research Center, USA

Lide Wu, Dept. of Computer Science, Fudan University, China

Page 2: Information Retrieval On Digital Video Information

Presentation overview

• TREC2001
• TREC2001 Video Track
• TREC2001 Video Track Tasks
  – Shot Boundary Detection Task
  – Search Task
• Search Task
• Participants in Search Task & Their Focus
• Summary of approaches by participants
• Conclusion

Page 3: Information Retrieval On Digital Video Information

TREC (Text REtrieval Conference)

• Annual activity (1992- ) to “benchmark the retrieval effectiveness of Information Retrieval tasks”
• Co-ordinator NIST (National Institute of Standards and Technology, US) defines & distributes:
  – Test document corpus
  – Topics (queries)
• Participating groups develop an IR system, run the Topics against the Test document corpus, and send the results to NIST
• NIST generates relevance assessments and calculates performance in terms of precision & recall
• Annual conference in Gaithersburg, Maryland
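The precision and recall measures mentioned for TREC can be sketched for a single topic as follows. This is a minimal illustration only: the function name `precision_recall` and the shot identifiers are hypothetical, and the data is made up rather than taken from any TREC run.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one topic.

    retrieved: list of item ids returned by a system (hypothetical ids)
    relevant:  set of item ids judged relevant by the assessors
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    # Precision: fraction of retrieved items that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: fraction of relevant items that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data only, not TREC results.
retrieved = ["shot3", "shot7", "shot9", "shot12"]
relevant = {"shot3", "shot9", "shot20"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Two of the four retrieved items are relevant and two of the three relevant items are found, so the example gives a precision of 0.5 and a recall of about 0.67.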

Page 4: Information Retrieval On Digital Video Information

“Tracks” in TREC

• Different streams, each introduced to focus on a particular sub-problem in Information Retrieval
• 15 different “tracks” have been introduced, some stopped, some continuing, e.g.:
  – Interactive track 1993-
  – Chinese language track 1995-1998
  – Web track 1998-
  – Question Answering track 1998-
  – Video track 2001-

Page 5: Information Retrieval On Digital Video Information

Video Track in TREC2001

• 1st Video Track in 2001
• Promote progress in content-based retrieval from digital video via open, metrics-based evaluation
• 12 participating groups (5 USA, 2 Asia, 5 Europe), contributing to the definition of the corpus, topics and tasks via discussion, and to the running of the track
• Following the TREC framework, NIST co-ordinated and provided:
  – Video document corpus
  – Topic queries

Page 6: Information Retrieval On Digital Video Information

Video Track in TREC2001

• Video document corpus - total 11.2 hours (85 video files in MPEG-1 format; 6.3 Gbytes), mostly documentary in nature, varying in age, style and quality, e.g.:
  – “A New Horizon” (16 min; colour; documentary) - This Great Plains orientation tape explains the boundaries of the Great Plains Region, which is one of five regions that make up the Bureau of Reclamation
  – “Challenge at Glen Canyon” (26 min; colour; documentary) - Shows how the spillway damage caused by flooding along the Colorado River System was repaired

Page 7: Information Retrieval On Digital Video Information

Video Track in TREC2001

• 74 Topics (queries) - with multimedia examples (audio/image/video) along with each topic, e.g.:
  – Topic #8: “find clips showing the planet Jupiter” (with 2 images depicting Jupiter)
  – Topic #32: “find clips with a chopper landing” (with 3 audio clips of a helicopter sound)
  – Topic #54: “find clips showing Glen Canyon dam” (with a short video clip showing Glen Canyon dam)

Number of topics: 74

Example type    No. of topics with examples    Avg. number of examples
Image           26                             2.0
Audio           10                             4.3
Video           51                             2.4

Page 8: Information Retrieval On Digital Video Information

Tasks in Video Track in TREC2001

• Two distinct tasks:
  – Shot Boundary Detection task: engineering exercise to evaluate the accuracy of automatically detecting camera shot boundaries in the video corpus
  – Facilitates higher-level video indexing/browsing (e.g. scene detection/navigation, news story segmentation…)

[Figure: a video file divided into camera shots]

Page 9: Information Retrieval On Digital Video Information

Tasks in Video Track in TREC2001

• Two distinct tasks:
  – Search task: running topic queries against the video corpus, searching for the video segments that answer the queries
    • Automatic
    • Interactive
  – Answer segments are submitted to NIST for evaluation

Page 10: Information Retrieval On Digital Video Information

Participating Groups in Search Task

• Among the 12 participating groups in the TREC2001 Video Track:
  – all 12 groups took part in the Shot Boundary Task
  – 8 groups took part in the Search Task
• Participants in Search Task:
  – Carnegie Mellon University, USA
  – Dublin City University, Ireland
  – Fudan University, China
  – IBM Research, USA
  – Johns Hopkins University, USA
  – Lowlands Group, The Netherlands
  – University of Maryland, USA
  – University of North Texas, USA

Page 11: Information Retrieval On Digital Video Information

Carnegie Mellon University (USA)

• Used Informedia Digital Video Library’s standard processing modules
  – Shot Boundary Detection (using color histogram comparison)
  – Keyframe extraction
  – Speech recognition (using Sphinx speech recogniser with 64,000 word vocabulary)
  – Face detection
  – Video OCR
  – Image search based on color histogram features in different colour spaces and textures
• Informedia interface used for interactive searching; users were allowed to switch between multiple image search engines
• Image retrieval augmented to process I-frames (not only keyframes)
• Speaker identification component used to compare the query audio example to the audio in the retrieved video segment
• Image retrieval & video OCR had the largest impact on performance
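Shot boundary detection by histogram comparison, as listed for the Informedia modules, can be illustrated with a small sketch. This is not the Informedia implementation. It assumes frames are flat lists of 8-bit grey-level values rather than colour images, uses an arbitrary threshold, and only finds hard cuts; the names `histogram`, `histogram_distance` and `detect_cuts` are hypothetical.

```python
def histogram(frame, bins=4, max_value=256):
    """Normalised intensity histogram of a frame given as a flat list of pixel values."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // max_value] += 1
    total = len(frame)
    return [c / total for c in counts]


def histogram_distance(h1, h2):
    """L1 distance between two normalised histograms (0 = identical, 2 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(frames, threshold=0.5):
    """Return indices i where a cut is declared between frame i-1 and frame i."""
    if not frames:
        return []
    cuts = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        # A large jump in the histogram suggests a new camera shot starts here.
        if histogram_distance(previous, current) > threshold:
            cuts.append(i)
        previous = current
    return cuts


# Toy "video": three dark frames, two bright frames, one dark frame.
dark = [10, 20, 30, 40]
bright = [200, 210, 220, 230]
frames = [dark, dark, dark, bright, bright, dark]
print(detect_cuts(frames))  # prints [3, 5]
```

Gradual transitions such as fades and dissolves change the histogram slowly, so a single fixed threshold like this one would typically miss them.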

Page 12: Information Retrieval On Digital Video Information

Dublin City University (Ireland)

• Using Físchlár Digital Video System
• Shot boundary detection & keyframe extraction
• Allowed users to browse through keyframes with different browsing interfaces including:
  – Timeline browser (linear, spatial keyframe presentation)
  – Slide Show browser (linear, temporal keyframe presentation)
  – Hierarchical browser (hierarchical, spatial keyframe presentation)
• 30 test users (final year undergrads & research students) interacted with the system in a controlled environment
  – 12 topic queries / user
  – 6 minutes / topic query
  – within-user setting (each user used all 3 browsers 4 times each, in round robin fashion)
• Timeline browser allowed the largest number of answer submissions, with the lowest precision; Slide Show browser the reverse
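The within-user design (12 topics per user, 3 browsers used 4 times each in round robin fashion) can be sketched as a simple rotation. Offsetting the starting browser by the user index is an assumption made here for illustration; the slides do not say how the rotation was started. The function name `browser_schedule` is hypothetical.

```python
BROWSERS = ["Timeline", "Slide Show", "Hierarchical"]


def browser_schedule(user_index, topics_per_user=12):
    """Browser used for each of a user's topics, rotating in round-robin order.

    The starting browser is offset by the user index (an assumption) so that
    no browser is always used first.
    """
    n = len(BROWSERS)
    return [BROWSERS[(user_index + t) % n] for t in range(topics_per_user)]


for user in range(3):
    schedule = browser_schedule(user)
    usage = {b: schedule.count(b) for b in BROWSERS}
    print(user, schedule[:4], usage)
```

With 12 topics and 3 browsers, every user sees each browser exactly 4 times, which matches the figures on the slide.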

Page 13: Information Retrieval On Digital Video Information

Fudan University (China)

• Tried 17 topics (including people searching, video text searching, camera motion, etc.)
• Feature extraction module:
  – qualitative camera motion analysis module
  – face detection/recognition module (skin color based segmentation + motion/shape filtering, use of a new optimal discrimination criterion)
  – video text detection/recognition module (vertical edge based methods to detect text blocks; improved logical level technique to binarize text blocks)
  – speaker recognition / speaker clustering module
  – Speech SDK (Microsoft) to get transcript
• Off-line indexing followed by on-line searching

Page 14: Information Retrieval On Digital Video Information

IBM Research

• Members from IBM T.J. Watson Research Center & IBM Almaden Research Center
• Using IBM CueVideo System
  – Shot Boundary Detection & Keyframe extraction
  – MPEG-7 visual descriptors for indexing keyframes & answering automatic searches
  – Statistical model for classifying & generating labels/scores for:
    • events (fire, smoke, launch)
    • scenes (greenery, land, outdoors, rock, sand, sky, water)
    • objects (airplane, boat, rocket, vehicle, faces)
  – Query/filter pipelines to cascade content- & model-based searching, e.g. “shots that have similar colour to this image, have label ‘outdoors’ and show a ‘boat’ ”
• Compared performance of content/model-based system vs. speech-based system: best results obtained by combining the two methods
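The idea of cascading content-based and model-based filters can be sketched as a list of predicates applied in sequence. This is only an illustration of the cascade pattern, not the CueVideo API: the `Shot` class, the filter helpers, the thresholds and all scores below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    shot_id: str
    color_distance: float  # distance to the query image (hypothetical value)
    labels: dict = field(default_factory=dict)  # label -> model confidence score


def color_filter(max_distance):
    """Content-based step: keep shots whose colour is close to the query image."""
    return lambda shot: shot.color_distance <= max_distance


def label_filter(label, min_score):
    """Model-based step: keep shots whose label score is high enough."""
    return lambda shot: shot.labels.get(label, 0.0) >= min_score


def run_pipeline(shots, filters):
    """Apply each filter in turn; later filters only see survivors of earlier ones."""
    for keep in filters:
        shots = [s for s in shots if keep(s)]
    return shots


shots = [
    Shot("s1", 0.10, {"outdoors": 0.9, "boat": 0.8}),
    Shot("s2", 0.15, {"outdoors": 0.7}),
    Shot("s3", 0.60, {"outdoors": 0.9, "boat": 0.9}),
]
# "Similar colour to this image, labelled 'outdoors', and showing a 'boat'."
pipeline = [color_filter(0.3), label_filter("outdoors", 0.5), label_filter("boat", 0.5)]
print([s.shot_id for s in run_pipeline(shots, pipeline)])  # prints ['s1']
```

Shot s2 is dropped because it has no 'boat' label, and s3 is dropped at the first step because its colour distance is too large.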

Page 15: Information Retrieval On Digital Video Information

Johns Hopkins University (USA)

• Automatic searching:
  – Keyframes are used for indexing by color histogram & image texture
  – Query representation consisting of the image & video portions of the information need
  – Similarity measured by a weighted distance between the image features of the query representation and the indexed keyframes; shots associated with the most similar keyframes are then retrieved
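A weighted distance over colour and texture features, followed by ranking, might look like the sketch below. The L1 distance, the equal weights and the two-element feature vectors are all assumptions chosen for illustration; the slide does not specify the actual distance function or weights used.

```python
def l1(a, b):
    """L1 distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def weighted_distance(query, keyframe, w_color=0.5, w_texture=0.5):
    """Weighted sum of colour and texture distances (weights are assumptions)."""
    return (w_color * l1(query["color"], keyframe["color"])
            + w_texture * l1(query["texture"], keyframe["texture"]))


def rank_shots(query, index, top_k=2):
    """index: list of (shot_id, keyframe_features); smallest distance first."""
    scored = sorted(index, key=lambda item: weighted_distance(query, item[1]))
    return [shot_id for shot_id, _ in scored[:top_k]]


# Made-up feature vectors for one query image and three indexed keyframes.
query = {"color": [0.8, 0.2], "texture": [0.5, 0.5]}
index = [
    ("shotA", {"color": [0.1, 0.9], "texture": [0.5, 0.5]}),
    ("shotB", {"color": [0.7, 0.3], "texture": [0.4, 0.6]}),
    ("shotC", {"color": [0.8, 0.2], "texture": [0.9, 0.1]}),
]
print(rank_shots(query, index))  # prints ['shotB', 'shotC']
```

Changing the weights shifts the ranking: a larger `w_color` would favour shotC, whose colour matches the query exactly.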

Page 16: Information Retrieval On Digital Video Information

Lowlands Group (The Netherlands)

• Joint group of the database group of CWI, the multimedia group of TNO, the vision group of the University of Amsterdam and the language technology group of the University of Twente
• Retrieval engine based on:
  – face detection
  – camera motion detection (pan, tilt, zoom)
  – monologue detection
  – video OCR detection
• System heuristically selected a set of filters based on the detectors by analysing the query text with WordNet
• Compared performance with a transcript-based system (transcripts provided by CMU)
• Transcript-based system outperformed the feature-based system
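Selecting detector-based filters from the query text can be sketched with a small lookup. The hand-written cue-word table below merely stands in for the WordNet analysis and is an assumption, as are the detector names and the function `select_filters`; the group's actual heuristics are not described on the slide.

```python
# Hand-written cue words standing in for a WordNet lookup (assumption).
DETECTOR_CUES = {
    "face": {"person", "people", "face", "man", "woman"},
    "camera_motion": {"zoom", "pan", "tilt", "zooming", "panning"},
    "monologue": {"speaking", "talking", "speech", "interview"},
    "video_ocr": {"text", "caption", "title", "words"},
}


def select_filters(query_text):
    """Return the detectors whose cue words appear in the query text."""
    words = {w.strip(".,\"'").lower() for w in query_text.split()}
    return sorted(name for name, cues in DETECTOR_CUES.items() if words & cues)


print(select_filters("Find clips of a person talking with a caption on screen"))
# prints ['face', 'monologue', 'video_ocr']
```

A query with no cue words, such as "find clips showing the planet Jupiter", selects no filters in this sketch, which hints at why purely detector-driven search struggles on general topics.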

Page 17: Information Retrieval On Digital Video Information

University of Maryland (USA)

• Temporal Color Correlogram - to capture the spatio-temporal relationship of colors in a video shot
• Using MERIT system with VideoLogger video editing software (from Virage)
• Keyframe extraction (1st frame in the shot) => static image color correlogram calculation => temporal correlogram calculation (by shot segmentation in equal intervals, then shot features fed into CMRS retrieval system)
• TREC topic queries were translated into example videos/images
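The static step of the pipeline, a colour correlogram on a single image, can be illustrated with a simplified autocorrelogram. The sketch below covers only the spatial case for one colour and one distance on a tiny image of quantised colour indices; the temporal extension across frames used by the group is not shown, and the function name `autocorrelogram` is hypothetical.

```python
def autocorrelogram(image, color, distance):
    """Probability that a pixel at L-infinity distance `distance` from a pixel
    of `color` also has `color`. `image` is a 2D list of quantised colour indices."""
    height, width = len(image), len(image[0])
    same = total = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] != color:
                continue
            for dy in range(-distance, distance + 1):
                for dx in range(-distance, distance + 1):
                    if max(abs(dy), abs(dx)) != distance:
                        continue  # keep only the ring at exactly this distance
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        total += 1
                        same += image[ny][nx] == color
    return same / total if total else 0.0


uniform = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
stripes = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(autocorrelogram(uniform, 0, 1))  # 1.0: every neighbour has the same colour
print(autocorrelogram(stripes, 0, 1))
print(autocorrelogram(stripes, 0, 2))
```

Unlike a plain histogram, the value depends on where colours sit relative to each other: the striped image scores low at distance 1 and higher at distance 2, even though its colour counts never change.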

Page 18: Information Retrieval On Digital Video Information

University of North Texas (USA)

• Keyframe extraction (frames every 5 seconds)
• Redundant keyframe removal (to ensure presence of frames outside the prescribed normal distribution limits)
• Resulting keyframes placed into UNT’s Brighton Image Searcher application (retrieval based on mathematical measures that correspond to primitive image features)
• 2 team members searched 13 topics to retrieve keyframes relevant to each topic
• Chosen keyframes were then used as exemplars to find other keyframes similar to them
• Precision scores were better than expected due to the presence of human judgement

Page 19: Information Retrieval On Digital Video Information

Summary & Analysis of Approaches

• Varied approaches by different groups
  – Interactive searching vs. automatic searching
  – Speech recognition transcript vs. visual-only
  – Various combinations of different features for retrieval
  – Experienced groups vs. new groups in video retrieval
• Performance (precision) results varied greatly:
  – Interactive: best group 0.6 - worst group 0.23 (across same 31 topics)
  – Automatic: 0.609 - 0.002
• The video track was still shaping itself in 2001 & not complete
  – only small-scale comparisons possible (within-topic, between closely related system variants)
  – cross-system comparison possible only after achieving better consistency in topic formulation, agreement on better measures, and larger numbers of data points
• Difficulties & unforeseen problems highlighted, tackled in 2nd Video Track in TREC2002

Page 20: Information Retrieval On Digital Video Information

Conclusions

• Revealed lots of issues to be addressed in evaluating the performance of retrieval on digital video information
• There are groups working in this area worldwide who have the capability and the systems to support real information retrieval on significant volumes of digital video content
• 2nd Video Track (2002)
  – more than 20 participating groups
  – 68.5 hours of video document corpus
  – focused set of 25 topic queries
  – Tasks:
    • Shot Boundary Detection - as before
    • Semantic feature extraction task (face, indoor/outdoor, landscape/cityscape, speech/music/monologue, etc.)
    • Search - interactive or automatic as before

Page 21: Information Retrieval On Digital Video Information

Conclusion


TREC2001 Video Track website with papers:

http://www-nlpir.nist.gov/projects/t01v/t01v.html

Authors’ Note: The authors wish to extend their sympathies to the family and friends of our co-author, Mark E. Rorvig, who passed away shortly before this paper was submitted.