Video Analysis Video Search System - 國立臺灣大學winston/papers/tv05.poster.g.ppt.pdf ·...

1
Columbia University Video Search System Evaluation Shih-Fu Chang, Winston Hsu, Lyndon Kennedy, Akira Yanagawa, Eric Zavesky, Dong-Qing Zhang, in collaboration with IBM Research http://www.ee.columbia.edu/dvmm Text+Story+Anchor Removal +cue-X Reranking+CBIR +Concept Search 0.114 Text+Story+Anchor Removal +cue-X reranking+CBIR 0.111 Text+Story+Anchor Removal +cue-X reranking 0.107 Text+Story+Anchor Removal 0.095 Text+Story 0.087 Text 0.039 Components (in automatic search TRECVID 2005) MAP Objectives and Unique Features Fusion of detection and searching techniques at different levels across modalities Combination of parts-based graph model and fixed-feature classifiers for concept detection Information-theoretic Cue-X clustering for story segmentation and search re-ranking Semantic search over multi-modal metadata (ASR/MT and semantic concepts) Automatic discovery of query topic classes and user search patterns http://www. ee . columbia . edu / cuvidsearch Video Analysis Content Extraction automatic story segmentation video, speech, text near-duplicate detection concept detection feature extraction (text, video, prosody) concept search text search Image matching story browsing Near-duplicate search Interactive search automatic/manual search cue-X re-ranking mining query topic classes user search pattern mining Cue-X Information-theoretic Framework Robust story segmentation over TRECVID 05 data, F1=0.87 (ARB), 0.84 (CHN), 0.52 (ENG) w/o using text Cue-X discovers optimal relevant clusters and re-orders the search results. Shown effective in video search. Parts-Based Detector for Objects & Near-Duplicates Near-duplicates Concept Search and Query expansion jennings tonight text audio Person: 99% Clinton: 90% US Flag: 80% Podium: 70% Give speech: 75% Press conference: 70% “Find shots of Bill Clinton speaking with a US flag visible behind him.” Query Term extraction Query expansion (Web, WordNet etc) Metadata Index Concept detection: visual Populate visual concepts over a large lexicon Shots represented as smoothed concept vectors Match text queries to concept vectors by incorporating semantic relevance (WordNet) and detector robustness Find Event D Find Event E Find Object F Find Object G Automatic Joint semantics- performance grouping Find Person A Find Person B Find Person C Query Semantics Search Performance Manually defined query classes Automatic Discovery of Query Classes Learn distinct query topic classes by clustering query semantics and search performances jointly Parts-based statistical graph models effective for detecting objects/scenes with dominant spatio-feature cues. Improvement of 10% when fused with fixed-feature classifiers. Near-duplicate browser as most effective tool in our interactive search. Tested over 170+ hours of multi- lingual news video, TRECVID 2005 Story-based searches improves > 100% accuracy in automatic search Near-duplicate browsing doubles the relevant shots in interactive search Cue-X re-ranking and concept search proved very useful Web-based interactive search engine deployment Standard database (MySQL) and modular processing add-ons Personal user profiles and activity logs Columbia on-line Video Search Engine System/Demo (close to the top performance of automatic search) Video Text Audio Key: Cue-X clusters automatically discovered via Information Bottleneck principle & Kernel Density Estimation Y= story boundary Y=“demonstration” low-level features Information Bottleneck principle Y= search relevance

Transcript of Video Analysis Video Search System - 國立臺灣大學winston/papers/tv05.poster.g.ppt.pdf ·...

Page 1: Video Analysis Video Search System - 國立臺灣大學winston/papers/tv05.poster.g.ppt.pdf · Video Search System Evaluation Shih-Fu Chang, Winston Hsu, Lyndon Kennedy, Akira Yanagawa,

Columbia UniversityVideo Search System

Evaluation

Shih-Fu Chang, Winston Hsu, Lyndon Kennedy, Akira Yanagawa, Eric Zavesky, Dong-Qing Zhang, in collaboration with IBM Research

http://www.ee.columbia.edu/dvmm

Text+Story+Anchor Removal

+cue-X Reranking+CBIR

+Concept Search

0.114

Text+Story+Anchor Removal

+cue-X reranking+CBIR

0.111

Text+Story+Anchor Removal

+cue-X reranking

0.107

Text+Story+Anchor Removal0.095

Text+Story0.087

Text0.039

Components

(in automatic search TRECVID 2005)

MAP

Objectives and Unique Features Fusion of detection and searching techniques at different levels across modalities Combination of parts-based graph model and fixed-feature classifiers for concept detection Information-theoretic Cue-X clustering for story segmentation and search re-ranking Semantic search over multi-modal metadata (ASR/MT and semantic concepts) Automatic discovery of query topic classes and user search patterns

http://www.ee.columbia.edu/cuvidsearch

Video AnalysisContent Extraction

automaticstory

segmentation

video,speech,

text

near-duplicatedetection

concept detectionfeature

extraction (text,video, prosody)

concept search

text search

Image matching

storybrowsing

Near-duplicatesearch

Interactivesearch

automatic/manualsearch

cue-Xre-ranking

mining querytopic classes

user searchpatternmining

Cue-X Information-theoretic Framework

• Robust story segmentation over TRECVID 05 data,F1=0.87 (ARB), 0.84 (CHN), 0.52 (ENG) w/o using text• Cue-X discovers optimal relevant clusters andre-orders the search results. Shown effective in videosearch.

Parts-Based Detector for Objects & Near-DuplicatesNear-duplicates

Concept Search and Query expansion

jenningstonighttext

audio

Person: 99%

Clinton: 90%US Flag: 80%

Podium: 70%

Give speech: 75%Press conference: 70%

“Find shots of Bill Clintonspeaking with a US flagvisible behind him.”

Query

Term extractionQuery expansion

(Web, WordNet etc)

MetadataIndex

Concept detection:

visual

• Populate visual concepts over a large lexicon• Shots represented as smoothed concept vectors• Match text queries to concept vectors by

incorporating semantic relevance (WordNet) anddetector robustness

Find Event D

Find Event E

Find Object F

Find Object G

AutomaticJointsemantics-performancegrouping

Find Person A

Find Person B

Find Person C

Query Semantics Search Performance

Manuallydefinedqueryclasses

Automatic Discovery of Query Classes

• Learn distinct query topic classes by clustering querysemantics and search performances jointly

• Parts-based statistical graph modelseffective for detecting objects/sceneswith dominant spatio-feature cues.

• Improvement of 10% when fusedwith fixed-feature classifiers.

• Near-duplicate browser as mosteffective tool in our interactivesearch.

• Tested over 170+ hours of multi-lingual news video, TRECVID 2005

• Story-based searches improves> 100% accuracy in automaticsearch

• Near-duplicate browsing doublesthe relevant shots in interactivesearch

• Cue-X re-ranking and conceptsearch proved very useful

• Web-based interactivesearch engine deployment

• Standard database(MySQL) and modularprocessing add-ons

• Personal user profilesand activity logs

Columbia on-line Video Search Engine

System/Demo

(close to the top performance of automatic search)

VideoTextAudio

Key:

Cue-X clusters automatically discoveredvia Information Bottleneck principle &Kernel Density Estimation

Y= story boundary

Y=“demonstration”

low-level features

… …

Information Bottleneck principle

Y= search relevance