Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research...
![Page 1: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/1.jpg)
Intel Research
Interactive Event Detection in Video and Audio
Rahul Sukthankar
Intel Research Pittsburgh & Carnegie Mellon University
![Page 2: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/2.jpg)
Intel Research – Rahul Sukthankar – ICML2006 Workshop
Contributors
• Diamond team: L. Huston, Satya, L. Mummert, C. Helfrich, L. Fix
• Forensic video retrieval: J. Campbell, P. Pillai, Diamond team
• Volumetric video analysis: Y. Ke, M. Hebert
• Sound object detection in soundtracks: D. Hoiem, Y. Ke
• Interactive search-assisted diagnosis for breast cancer: Y. Liu, R. Jin, B. Zheng, D. Jukic
![Page 3: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/3.jpg)
Why Interactive Event Detection?
• Events of interest are often not known a priori
  – Data exploration: “find me more things like this”
• User’s requirements change based on partial results
  – Surveillance: “Alert me if you see X… hmm… actually I want Y”
• Challenges:
  – Limited training data
    • can we still learn good event detectors?
  – Efficiency
    • how best to organize/index/pre-process the data?
![Page 4: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/4.jpg)
Outline
• Event detection in audio
  – sound object detection from a few examples
• Diamond
  – efficient search of non-indexed data
• Event detection in video
  – forensic video surveillance
  – volumetric analysis for action detection
![Page 5: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/5.jpg)
Example: Sound Object Detection
• Applications of sound object detection
  – “Alert me if you hear a gunshot.” (monitoring)
  – “Fast forward to the next swordfight in LotR” (search and retrieval)
• Approach:
  – Learn boosted classifier from ~5-10 examples of the object
  – Scan windowed classifier over all possible locations
[Figure: the audio stream is split into clips (Clip 1 … Clip N); the clip classifier classifies each clip as object or non-object and returns the locations of the detected sound object]
[D. Hoiem, Y. Ke, R. Sukthankar, ICASSP 2005]
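The scanning step on this slide can be sketched as a sliding window over the audio samples. This is only an illustration: `detect_sound_object`, the `hop` parameter, and the energy-threshold `loud_clip` classifier are hypothetical stand-ins, not the boosted classifier from the cited paper.

```python
from typing import Callable, List, Sequence, Tuple


def detect_sound_object(
    stream: Sequence[float],
    clip_len: int,
    hop: int,
    classify_clip: Callable[[Sequence[float]], bool],
) -> List[Tuple[int, int]]:
    """Slide a fixed-length window over the stream and return the
    (start, end) sample ranges of clips classified as the object."""
    hits = []
    for start in range(0, len(stream) - clip_len + 1, hop):
        clip = stream[start:start + clip_len]
        if classify_clip(clip):
            hits.append((start, start + clip_len))
    return hits


def loud_clip(clip: Sequence[float]) -> bool:
    # Hypothetical stand-in for the learned clip classifier:
    # flags a clip whose mean energy exceeds a fixed threshold.
    energy = sum(x * x for x in clip) / len(clip)
    return energy > 0.5


# Toy stream: silence, a short burst, silence.
stream = [0.0] * 8 + [1.0] * 4 + [0.0] * 8
hits = detect_sound_object(stream, clip_len=4, hop=2, classify_clip=loud_clip)
print(hits)
```

Any callable that maps a clip to a boolean can be passed as `classify_clip`, so a learned classifier could replace the toy threshold without changing the scan loop.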
![Page 6: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/6.jpg)
Sound Object Detection: Clip Classifier
• Feature extraction
• Weak classifier – small decision trees on features
• Learn classifier cascade using Adaboost …
[Figure: a weak classifier drawn as a small decision tree over 138 features, with decision nodes and leaf nodes]
[D. Hoiem, Y. Ke, R. Sukthankar, ICASSP 2005]
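As a rough illustration of boosting weak tree classifiers, the sketch below runs plain AdaBoost over depth-1 trees (decision stumps). It is a simplification under stated assumptions: the slide describes small decision trees and a classifier cascade, whereas this code builds a single boosted ensemble, and the toy data and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    f, thr, pol = stump
    return pol if x[f] > thr else -pol


def best_stump(X, y, w) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (1, -1):
                stump = (f, thr, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X, y, rounds: int) -> List[Tuple[float, Stump]]:
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified examples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x) -> int:
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score > 0 else -1


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.8, 2.0], [0.9, 6.0], [0.7, 3.0]]
y = [-1, -1, -1, 1, 1, 1]
ensemble = adaboost(X, y, rounds=3)
print([predict(ensemble, x) for x in X])
```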
![Page 7: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/7.jpg)
Sound Object Detection: Results
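Rows run from best performance (top) to worst performance (bottom).

| Sound | Stage 1 pos | Stage 1 neg | Stage 2 pos | Stage 2 neg | Stage 3 pos | Stage 3 neg |
|---|---|---|---|---|---|---|
| meow | 0.0% | 1.4% | 0.0% | 1.2% | 2.2% | 0.8% |
| phone | 0.0% | 0.4% | 4.3% | 0.1% | 5.9% | 0.0% |
| car horn | 0.0% | 3.9% | 0.6% | 2.2% | 3.6% | 1.3% |
| door bell | 1.4% | 2.1% | 2.1% | 0.4% | 6.3% | 0.1% |
| swords | 6.1% | 1.3% | 6.7% | 0.1% | 6.7% | 0.0% |
| scream | 0.3% | 5.5% | 2.7% | 1.4% | 5.3% | 1.1% |
| dog bark | 0.7% | 1.0% | 6.0% | 0.3% | 7.7% | 0.2% |
| laser gun | 0.0% | 6.8% | 4.4% | 5.1% | 6.7% | 0.9% |
| explosion | 4.1% | 5.2% | 7.5% | 1.5% | 12.0% | 0.5% |
| light saber | 4.8% | 6.8% | 9.7% | 1.0% | 13.9% | 0.2% |
| gunshot | 8.1% | 6.1% | 12.5% | 2.3% | 14.5% | 1.1% |
| close door | 7.9% | 7.8% | 14.5% | 4.8% | 17.6% | 2.3% |
| male laugh | 4.3% | 14.7% | 9.5% | 9.7% | 13.3% | 7.0% |
| average | 2.9% | 4.4% | 6.0% | 2.2% | 8.5% | 1.1% |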
![Page 8: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/8.jpg)
Framework for Interactive Event Detection
• Interactive event detection =?= non-indexed search
• Search and indexing:
  – If queries can be predicted in advance, indexing is possible (e.g., Google for text data)
  – Alternative is brute-force search through non-indexed data
• How to perform efficient non-indexed search?
  – May need to execute arbitrary code (learned event detector)
![Page 9: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/9.jpg)
Brute-Force Search
• Event detection: vast majority of the data is useless
• Brute-force search (BFS) scales poorly with storage volume
[Figure: brute-force search. Labels: User, query, Search app, Storage, discard, results]
![Page 10: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/10.jpg)
Diamond: Early Discard
• Reject as close to storage as possible
• Reduce volume of data transferred
• Scales much better!
[Figure: early discard. Labels: User, query, Search app, query’, Storage, early discard, late discard, results]
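The difference between late and early discard can be sketched by counting how many objects leave storage in each case. The functions and the integer "objects" below are hypothetical, and the sketch assumes the cheap storage-side filter never rejects an object that the full detector would accept.

```python
from typing import Callable, Iterable, List, Tuple


def brute_force_search(
    storage: Iterable[int], matches: Callable[[int], bool]
) -> Tuple[List[int], int]:
    """Late discard: every object is shipped to the search app,
    which then throws most of them away."""
    transferred = list(storage)
    results = [obj for obj in transferred if matches(obj)]
    return results, len(transferred)


def early_discard_search(
    storage: Iterable[int],
    cheap_filter: Callable[[int], bool],
    matches: Callable[[int], bool],
) -> Tuple[List[int], int]:
    """Early discard: a cheap filter runs next to storage, so only
    the survivors are transferred to the search app."""
    transferred = [obj for obj in storage if cheap_filter(obj)]
    results = [obj for obj in transferred if matches(obj)]
    return results, len(transferred)


def detector(obj: int) -> bool:
    # Stand-in for an expensive learned event detector.
    return obj % 100 == 0


def coarse_filter(obj: int) -> bool:
    # Stand-in for a cheap storage-side filter (keeps every true match).
    return obj % 10 == 0


late_results, late_moved = brute_force_search(range(1000), detector)
early_results, early_moved = early_discard_search(range(1000), coarse_filter, detector)
print(late_moved, early_moved, late_results == early_results)
```

In this toy setup both searches return the same results, but early discard moves a tenth of the objects.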
![Page 11: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/11.jpg)
Diamond Architecture
[Figure: a Search Application on the host (labels: Host runtime, Searchlet API, Assoc DMA, Linux) and three storage nodes (each labeled Assoc DMA, Storage Runtime, Searchlet, Filter API). Legend: Diamond code (open); App Code (proprietary or open); Diamond API (open); Storage access protocol (open)]
Diamond is a collaborative project between Intel Research & CMU
![Page 12: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/12.jpg)
Anatomy of a Diamond Searchlet
• Sequence of partially-ordered “filters”
  – each filter can pass or drop an object
  – filters share state through attributes
• Diamond determines an optimal filter order
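A searchlet of this shape can be sketched as a list of filters with dependencies, a shared attribute dictionary, and an ordering step. Everything here is an assumption for illustration: the `Filter` class, the `cost` and `pass_rate` estimates, and the greedy "cheapest cost per dropped object first" heuristic are not the Diamond API or Diamond's actual ordering algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Attributes = Dict[str, object]


@dataclass
class Filter:
    name: str
    run: Callable[[Attributes], bool]   # True = pass, False = drop
    cost: float                         # assumed per-object cost estimate
    pass_rate: float                    # assumed fraction of objects that pass
    needs: Tuple[str, ...] = ()         # filters that must run before this one


def order_filters(filters: List[Filter]) -> List[Filter]:
    """Greedy ordering that respects the partial order: among filters whose
    dependencies are satisfied, run the one with the lowest cost per dropped
    object first (a hypothetical heuristic)."""
    done, ordered = set(), []
    remaining = list(filters)
    while remaining:
        ready = [f for f in remaining if all(n in done for n in f.needs)]
        if not ready:
            raise ValueError("cyclic or unsatisfied filter dependencies")
        nxt = min(ready, key=lambda f: f.cost / max(1.0 - f.pass_rate, 1e-9))
        ordered.append(nxt)
        done.add(nxt.name)
        remaining.remove(nxt)
    return ordered


def run_searchlet(ordered: List[Filter], obj: str) -> bool:
    attrs: Attributes = {"object": obj}   # state shared between filters
    for f in ordered:
        if not f.run(attrs):
            return False                  # dropped: later filters never run
    return True


def decode(attrs: Attributes) -> bool:
    attrs["length"] = len(attrs["object"])   # publish an attribute for later filters
    return True


def is_long(attrs: Attributes) -> bool:
    return attrs["length"] > 3               # consumes the attribute set by decode


def starts_with_a(attrs: Attributes) -> bool:
    return attrs["object"].startswith("a")


filters = [
    Filter("decode", decode, cost=1.0, pass_rate=1.0),
    Filter("is_long", is_long, cost=5.0, pass_rate=0.5, needs=("decode",)),
    Filter("starts_with_a", starts_with_a, cost=0.1, pass_rate=0.1),
]
ordered = order_filters(filters)
print([f.name for f in ordered])
print([run_searchlet(ordered, o) for o in ("apple", "ant", "banana")])
```

Running the cheap, highly selective filter first means most objects are dropped before the more expensive filters execute, which is the motivation for letting the runtime choose the order.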
![Page 13: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/13.jpg)
Example Application: Forensic Video Surveillance
• Timely reconstruction of a crime scene
  – large quantities of video surveillance data
  – current practice: gather & manually scan video tapes
  – obvious optimization: transfer data to central site
• Better solution: send your detector to the data
[Figure: a Host App connected to multiple cameras (cam)]
[J. Campbell et al., VSSN 2004]
![Page 14: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/14.jpg)
Video Action Detection: Goal
![Page 15: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/15.jpg)
Idea: Treat Video as a Volume
[Figure: video drawn as a volume with axes X, Y, and T]
![Page 16: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/16.jpg)
Related work: Recognition using SVMs on Space-Time Interest Points
Space-time interest points
Figures courtesy: [Schuldt et al., ICPR 2004]
![Page 17: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/17.jpg)
Problem with Space-Time Interest Points: Too Sparse
Two examples of smooth motions where no stable space-time interest points are detected.
![Page 18: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/18.jpg)
Volumetric Features on Optical Flow
![Page 19: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/19.jpg)
Our Features: 3D Extension of Viola-Jones
[Figure: volumetric features and the integral volume, drawn on X, Y, T axes; the corners of a box are labeled a to h and a point is marked (x, y, t)]
Volumetric features can be efficiently computed using integral volumes, with only 8 memory accesses per feature. The sum of the volume is e – a – f – g + b + c + h – d.
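The 8-lookup box sum can be sketched with a zero-padded integral volume. This is a minimal pure-Python illustration; the function names and the `video[t][y][x]` indexing are assumptions, and the corner labels a to h from the slide are not reproduced because the figure is not available.

```python
from typing import List

Volume = List[List[List[int]]]


def integral_volume(video: Volume) -> Volume:
    """Return iv with one layer of zero padding, where iv[t][y][x] is the
    sum of video[t'][y'][x'] over all t' < t, y' < y, x' < x."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    iv = [[[0] * (W + 1) for _ in range(H + 1)] for _ in range(T + 1)]
    for t in range(T):
        for y in range(H):
            for x in range(W):
                iv[t + 1][y + 1][x + 1] = (
                    video[t][y][x]
                    + iv[t][y + 1][x + 1] + iv[t + 1][y][x + 1] + iv[t + 1][y + 1][x]
                    - iv[t][y][x + 1] - iv[t][y + 1][x] - iv[t + 1][y][x]
                    + iv[t][y][x]
                )
    return iv


def box_sum(iv: Volume, x0: int, y0: int, t0: int, x1: int, y1: int, t1: int) -> int:
    """Sum over x0 <= x < x1, y0 <= y < y1, t0 <= t < t1 using exactly
    8 lookups: four corners added and four subtracted."""
    return (
        iv[t1][y1][x1]
        - iv[t0][y1][x1] - iv[t1][y0][x1] - iv[t1][y1][x0]
        + iv[t0][y0][x1] + iv[t0][y1][x0] + iv[t1][y0][x0]
        - iv[t0][y0][x0]
    )


# Toy 3 x 4 x 5 video (T x H x W) with distinct voxel values.
video = [[[t * 100 + y * 10 + x for x in range(5)] for y in range(4)] for t in range(3)]
iv = integral_volume(video)
fast = box_sum(iv, 1, 1, 0, 4, 3, 2)
slow = sum(video[t][y][x] for t in range(0, 2) for y in range(1, 3) for x in range(1, 4))
print(fast, slow)
```

Building the integral volume costs one pass over the video; after that, the cost of a box sum is independent of the box size, which is what makes scanning many volumetric features affordable.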
![Page 20: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/20.jpg)
Classifier cascade learned using Direct Feature Selection, Wu et al., NIPS, 2002
An example of the features learned by the classifier to recognize the hand-wave action in a detection volume
Millions of potential features for selection, so Adaboost is too slow.
![Page 21: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/21.jpg)
Detection
• Use a sliding volume over video sequence
• Model true event as a cluster of detections with Gaussian distribution.
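One possible reading of the clustering step is sketched below: raw sliding-volume detections at (x, y, t) are grouped when they fall near an existing group's centre, and each sufficiently large group is summarised by a per-axis mean and variance. The greedy grouping rule, the `radius` and `min_size` parameters, and the toy detections are all assumptions, not the method from the talk.

```python
from statistics import mean, pvariance
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t) of one raw detection


def cluster_detections(dets: List[Point], radius: float, min_size: int):
    """Greedily group detections near an existing cluster centre, then
    describe each large-enough cluster by its mean and per-axis variance
    (an axis-aligned Gaussian summary)."""
    clusters: List[List[Point]] = []
    for d in sorted(dets, key=lambda p: p[2]):          # process in time order
        for c in clusters:
            centre = tuple(mean(axis) for axis in zip(*c))
            if sum((a - b) ** 2 for a, b in zip(d, centre)) <= radius ** 2:
                c.append(d)
                break
        else:
            clusters.append([d])                        # start a new cluster
    events = []
    for c in clusters:
        if len(c) >= min_size:                          # isolated hits are ignored
            mu = tuple(mean(axis) for axis in zip(*c))
            var = tuple(pvariance(axis) for axis in zip(*c))
            events.append((mu, var, len(c)))
    return events


# Four nearby detections and one isolated detection.
dets = [(10, 10, 5), (11, 10, 5), (10, 11, 6), (11, 11, 6), (50, 50, 40)]
events = cluster_detections(dets, radius=3, min_size=3)
print(events)
```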
![Page 22: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/22.jpg)
Generic Volumetric Features
• Processing non-indexed video is slow – lots of data
• Are there application-independent representations for video?
• Goal: pre-process video once, support multiple video event apps.
[Y. Ke, unpublished 2006]
![Page 23: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/23.jpg)
Related work: Space-Time Behavior Based Correlation
Figures courtesy: [Shechtman & Irani, CVPR 2005]
![Page 24: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/24.jpg)
Interactive Search-Assisted Diagnosis
[Figure: ISAD results. A suspicious mass (query) and the retrieved cases: Rank 1: benign biopsy; Rank 2: benign biopsy; Rank 3: malignant biopsy. Annotation: CLOSE?]
Collaborators: B. Zheng, D. Jukic, L. Yang, R. Jin
![Page 25: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/25.jpg)
Query-adaptive Local Distance Learning
• Previously:
  – Various Lp norms: Euclidean distance is typically not the best
  – Global metric learning:
    • Learn metric that best satisfies user-given pairwise data constraints
    • Fares poorly with multimodal data
  – Local metric learning:
    • Learn metric that does the above, but weights nearby constraints more heavily
    • Chicken & egg problem
• What’s new:
  – Learn a metric for the given query based on neighborhood
![Page 26: Intel Research Interactive Event Detection in Video and Audio Rahul Sukthankar Intel Research Pittsburgh & Carnegie Mellon University.](https://reader035.fdocuments.us/reader035/viewer/2022062715/56649d805503460f94a644e6/html5/thumbnails/26.jpg)
Summary
• Many real applications require interactive event detection
• Good for ML algorithms that:
  – operate with limited training data
  – train quickly/incrementally
  – exploit unlabeled data
• Diamond – infrastructure for efficient non-indexed search: http://diamond.cs.cmu.edu/
• Interactive event detection in video is still painful
  – Good general-purpose representation for event detection?