Video Event Recognition: Multilevel Pyramid Matching

Dong Xu and Shih-Fu Chang

Digital Video and Multimedia (DVMM) Lab, Department of Electrical Engineering, Columbia University

http://www.ntu.edu.sg/home/dongxu [email protected]

*Courtesy of Eric Zavesky for preparing the slides


Page 2: Video Event Recognition: Problem

• Online video search and video indexing

• Events characterized by an evolution of scenes, objects and actions over time

• 56 events are defined in LSCOM

Example events (figure): Airplane Flying, Car Exiting

Page 3: Video Event Recognition: Challenges

• Geometric and photometric variations

• Cluttered background

• Complex camera motion and object motion

Page 4: Event Recognition: Object Tracking

• Detect the object of interest, track it over time, and model its spatio-temporal dynamics

• Hard to detect events without explicit object motion, such as Riot

Figure labels: Object Detection & Localization, Tracking, Inference, "Airplane Landing"?

Page 5: Event Recognition: Key-Frame based Matching

• Only the key-frame of each clip is used for matching.

• Extract low-level features, compare them to those of other frames, and make an overall matching decision

Figure labels: Keyframe, Feature, Similarity (example scores: 15%, 18%, 50%)

Page 6: Event Recognition: Multi-level Pyramid Matching

Figure labels: feature extraction, concept detectors, EMD distance, multi-level pyramid matching

Page 7: Content Representation: Low-level Features

• Edge direction histogram

• Grid color moment

• Gabor texture

Page 8: Content Representation: Mid-level Semantic Concept Scores

• Train detectors on low-level features

• Mid-level semantic concept features are more robust

• Developed and released 374 semantic concept detectors

Figure labels: Image Database, +/-, Concept Detectors

Page 9: Earth Mover’s Distance (EMD): Approach

• Supplier P has a given amount of goods

• Receiver Q has a given limited capacity

• d_ij: ground distance between frame i of P and frame j of Q

• Weights: solved by linear programming

• Temporal shift: a frame at the beginning of P can be mapped to a frame at the end of Q

• Scale variations: a frame from P can be mapped to multiple frames in Q

Figure weights: 1, 1, 1/2, 1/2, 1/2, 1/2
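The supplier/receiver formulation on this slide can be sketched as a small transportation linear program. This is a minimal illustration, not the authors' implementation: the function name emd, the uniform frame weights, and the Euclidean ground distance are assumptions made for the example. It uses NumPy and scipy.optimize.linprog.

```python
import numpy as np
from scipy.optimize import linprog


def emd(P, Q, w_p=None, w_q=None):
    """EMD between two clips given as (frames x dims) feature arrays."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    m, n = len(P), len(Q)
    # Assumption: every frame carries the same weight unless weights are given.
    w_p = np.full(m, 1.0 / m) if w_p is None else np.asarray(w_p, float)
    w_q = np.full(n, 1.0 / n) if w_q is None else np.asarray(w_q, float)
    # Ground distance d_ij between frame i of P and frame j of Q
    # (assumption: Euclidean distance on the feature vectors).
    D = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)
    # Flow variables f_ij are flattened row-major: index i * n + j.
    A_supply = np.kron(np.eye(m), np.ones((1, n)))  # sum_j f_ij <= w_p[i]
    A_demand = np.kron(np.ones((1, m)), np.eye(n))  # sum_i f_ij <= w_q[j]
    total = min(w_p.sum(), w_q.sum())               # total flow to move
    res = linprog(
        c=D.ravel(),
        A_ub=np.vstack([A_supply, A_demand]),
        b_ub=np.concatenate([w_p, w_q]),
        A_eq=np.ones((1, m * n)),
        b_eq=[total],
        bounds=(0, None),
        method="highs",
    )
    return res.fun / total  # cost normalised by the total flow


# Temporal shift: the same two frames in reverse order still match.
shifted = emd([[0.0], [1.0]], [[1.0], [0.0]])
# Scale variation: one frame of P is split across two frames of Q.
scaled = emd([[0.0]], [[1.0], [1.0]])
```

Because the flow is free to connect any frame of P to any frame of Q, frame order does not affect the distance, which is the temporal-shift property listed above.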

Page 10: Multi-level Pyramid Matching: Motivations

• One Clip = several subclips (stages of event evolution)

• No prior knowledge about the number of stages in an event

• Videos of the same event may include only a subset of stages

Solution: Multi-level pyramid matching in the temporal domain

Page 11: Multi-level Pyramid Matching: Algorithm

• Temporally Constrained Hierarchical Agglomerative Clustering

• Alignment of different subclips (Level-1 as an example)

• Fusion of information from different levels.

Figure labels: two clips with stages Smoke and Fire, divided into sub-clips at Level-0, Level-1 and Level-2; EMD Distance Matrix between Sub-clips; Integer-value Alignment
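The three steps on this slide can be sketched in a few lines of plain Python. This is a rough sketch under stated assumptions, not the authors' code: the names tc_clustering, aligned_distance and fuse are hypothetical, the clustering uses centroid distance between temporally adjacent segments, the alignment is found by brute force over one-to-one matchings (reasonable only for the few sub-clips per level), and the fusion is a plain weighted sum with made-up weights. The sub-clip distance matrix is passed in directly; in the pipeline above it would hold EMD values between sub-clips.

```python
from itertools import permutations


def tc_clustering(frames, k):
    """Merge temporally adjacent segments until k contiguous sub-clips remain."""
    segs = [[f] for f in frames]

    def centroid(seg):
        return [sum(col) / len(seg) for col in zip(*seg)]

    def dist(a, b):
        ca, cb = centroid(a), centroid(b)
        return sum((x - y) ** 2 for x, y in zip(ca, cb)) ** 0.5

    while len(segs) > k:
        # Temporal constraint: only neighbouring segments may be merged.
        i = min(range(len(segs) - 1), key=lambda t: dist(segs[t], segs[t + 1]))
        segs[i:i + 2] = [segs[i] + segs[i + 1]]
    return segs


def aligned_distance(D):
    """Mean distance under the best one-to-one (integer-valued) alignment.

    D is a square matrix of distances between the sub-clips of two clips.
    """
    n = len(D)
    return min(
        sum(D[i][p[i]] for i in range(n)) / n for p in permutations(range(n))
    )


def fuse(level_distances, weights):
    """Weighted sum of per-level distances (the weights are an assumption)."""
    return sum(w * d for w, d in zip(weights, level_distances))


# Level-1 split of a toy clip: three similar frames, then two different ones.
subclips = tc_clustering([[0], [1], [2], [50], [51]], k=2)

# Clip A runs Smoke then Fire, clip B runs Fire then Smoke, so the
# cross-wise alignment of sub-clips is the cheapest one.
level1 = aligned_distance([[5.0, 1.0], [1.0, 5.0]])
fused = fuse([4.0, level1], weights=[0.5, 0.5])
```

The Smoke/Fire toy matrix shows why alignment is needed at Level-1: matching sub-clips in temporal order would give a mean distance of 5.0, while the aligned distance is 1.0.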

Page 12: Pyramid Matching: Projected Illustration

Figure legend: first stage of shot 1; second stage of shot 1; first stage of shot 2; second stage of shot 2; negative shots

Page 13: Experiments: Keyframe based feature performance

Dataset: TRECVID2005

Evaluation Metric: Average Precision
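For reference, the evaluation metric can be sketched as follows. This is the common non-interpolated form of average precision computed over the full ranked list; the slides do not state the exact variant or any cut-off depth used in the experiments, so treat those details as assumptions. The function name average_precision is chosen for the example.

```python
def average_precision(ranked_labels):
    """Average precision of a ranked list.

    ranked_labels holds 1 (relevant) or 0 (not relevant) for each shot,
    sorted by decreasing classifier score.
    """
    hits = 0
    precisions = []
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at this relevant shot
    return sum(precisions) / len(precisions) if precisions else 0.0


# Relevant shots at ranks 1 and 3: precisions 1/1 and 2/3 are averaged.
ap = average_precision([1, 0, 1, 0])
```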

Page 14: Experiments: EMD concept performance

Page 15: Experiments: Benefits of multi-level pyramid fusion

Page 16: Video Event Recognition: Conclusions

• Single-level EMD outperforms the key-frame based method.

• Multi-level Pyramid Matching further improves event detection accuracy.

• First systematic study of diverse visual event recognition in the unconstrained broadcast news domain.