T-CNN, Object Detection from Video
![Page 1: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/1.jpg)
T-CNN: Object Detection from Video
Kang, Kai and Ouyang, Wanli and Li, Hongsheng and Wang, Xiaogang
CVPR 2016
[arxiv] [code]
Slides by Andrea Ferri ([email protected])
Computer Vision Reading Group @ UPC BarcelonaTech (Spring 2016)
![Page 2: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/2.jpg)
Summary:
• Introduction;
• Architecture;
I. Still-Image Detection;
II. MCS & MGP;
III. Tubelet Re-Scoring;
• Experiments.
![Page 3: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/3.jpg)
Introduction:
The DET and VID challenges are substantially different.
Still-image (DET) detectors applied frame by frame to VID:
→ Show large temporal fluctuations;
→ Generate false positives.
![Page 4: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/4.jpg)
![Page 5: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/5.jpg)
T-CNN stands for Tubelets with Convolutional Neural Networks.
Tubelets are bounding box sequences that carry:
• Temporal information;
• Contextual information.
![Page 6: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/6.jpg)
Architecture:
T-CNN combines current state-of-the-art components:
• Still-image object detection;
• An object tracking algorithm;
• A number of additional tricks.
![Page 7: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/7.jpg)
![Page 8: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/8.jpg)
I. Still-Image Detection
The detectors used are:
• DeepID-Net (an improvement of R-CNN);
• CRAFT (an extension of Fast R-CNN).
The two use different region proposals, pre-trained models, and training strategies.
![Page 9: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/9.jpg)
II. MCS & MGP
Multi-Context Suppression
![Page 10: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/10.jpg)
Multi-Context Suppression
→ Sort the detection scores of all proposals in a video in descending order;
→ Classes that appear among the top-ranked detections are treated as the confident classes;
→ The scores of low-ranked classes are suppressed, while the scores of confident classes remain unchanged.
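The three steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name, the dictionary layout of a detection, and the `penalty` value subtracted from low-ranked classes are all assumptions. The default `top_fraction` mirrors the 0.0003 figure listed among the experiment parameters.

```python
def multi_context_suppression(detections, top_fraction=0.0003, penalty=0.4):
    """Suppress scores of classes that never rank highly in a video.

    detections: list of dicts with keys 'cls' and 'score', pooled over
    every frame of one video (hypothetical layout).
    penalty: amount subtracted from non-confident classes (assumed value).
    """
    if not detections:
        return []
    # Step 1: rank every detection in the video by score, best first.
    ranked = sorted(detections, key=lambda d: d["score"], reverse=True)
    # Step 2: classes seen among the top-ranked detections are "confident".
    n_top = max(1, int(len(ranked) * top_fraction))
    confident = {d["cls"] for d in ranked[:n_top]}
    # Step 3: leave confident classes untouched, push the others down.
    suppressed = []
    for d in detections:
        score = d["score"] if d["cls"] in confident else d["score"] - penalty
        suppressed.append({**d, "score": score})
    return suppressed
```

Because the ranking is computed over the whole video, a class that is detected confidently in even a few frames keeps its scores everywhere, which is the "context" the method relies on.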
![Page 11: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/11.jpg)
Motion-Guided Propagation
![Page 12: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/12.jpg)
Motion-Guided Propagation
→ In each frame, some objects are missed by the detector. However, detections on adjacent frames are complementary to each other;
→ Detections are propagated to adjacent frames, with optical flow guiding the propagation;
→ Propagation produces redundant boxes, which are easily handled by non-maximum suppression (NMS).
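A rough sketch of one propagation step, under several assumptions: the dense flow field is given as a nested list `flow[y][x] = (dx, dy)`, each box is shifted by the mean flow inside it, and a plain greedy NMS with an IoU threshold of 0.5 removes the duplicates. All names here are hypothetical, and the sketch propagates to a single neighbouring frame only.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def mean_flow(flow, box):
    """Average (dx, dy) of the flow vectors inside the box, clipped to the frame."""
    height, width = len(flow), len(flow[0])
    x1, y1 = max(0, int(round(box[0]))), max(0, int(round(box[1])))
    x2, y2 = min(width, int(round(box[2]))), min(height, int(round(box[3])))
    vecs = [flow[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    if not vecs:
        return (0.0, 0.0)
    return (sum(v[0] for v in vecs) / len(vecs), sum(v[1] for v in vecs) / len(vecs))

def nms(dets, thresh=0.5):
    """Greedy NMS over (box, score) pairs, keeping the highest scores."""
    keep = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= thresh for kept_box, _ in keep):
            keep.append((box, score))
    return keep

def propagate_to_next(dets_t, dets_next, flow_t):
    """Shift frame-t detections along the flow, merge with frame t+1, then NMS."""
    moved = []
    for box, score in dets_t:
        dx, dy = mean_flow(flow_t, box)
        moved.append(((box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy), score))
    return nms(dets_next + moved)
```

Applying the same step in both temporal directions and over several frames would cover a wider window; the sketch keeps to one step for clarity.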
![Page 13: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/13.jpg)
III. Tubelet Re-Scoring
1. High Confidence Tracking;
2. Spatial Max Pooling;
3. Temporal Re-Scoring.
![Page 14: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/14.jpg)
High Confidence Tracking
1 → Obtain detection results from still-image detectors;
2 → Choose high-confidence detections as starting points (anchors) for tracking;
3 → Obtain tubelets, which are bounding box sequences generated from tracking algorithms.
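The anchor-selection loop might look like the sketch below. The slides do not specify the tracker, so it is passed in as a hypothetical callable `track_fn(frame_index, box)` that returns a `{frame_index: box}` mapping; the `anchor_thresh` and `max_tubelets` values are likewise assumptions made for illustration.

```python
def build_tubelets(frame_dets, track_fn, anchor_thresh=0.8, max_tubelets=10):
    """Start one tubelet from each high-confidence detection.

    frame_dets: {frame_index: [(box, score), ...]} from the still-image detector.
    track_fn:   hypothetical tracker, track_fn(frame_index, box) -> {frame_index: box}.
    """
    # Step 1-2: collect detections above the threshold, most confident first.
    anchors = sorted(
        (
            (score, t, box)
            for t, dets in frame_dets.items()
            for box, score in dets
            if score >= anchor_thresh
        ),
        reverse=True,
    )
    # Step 3: track from each anchor to obtain a bounding box sequence.
    tubelets = []
    for score, t, box in anchors[:max_tubelets]:
        tubelets.append(
            {"anchor": (t, box), "anchor_score": score, "boxes": track_fn(t, box)}
        )
    return tubelets
```

A fuller version would probably skip anchors that are already covered by an existing tubelet; this sketch omits that check.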
![Page 15: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/15.jpg)
Spatial Max Pooling
- Still-image detection results that have large overlaps with tubelet boxes are chosen for each tubelet;
- Only detections with maximum detection scores are left after spatial max-pooling;
A Kalman filter is used to smooth the bounding box locations.
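The pooling step can be illustrated as below, assuming boxes are `(x1, y1, x2, y2)` tuples and that "large overlap" means an IoU of at least 0.5 (an assumed threshold). The function names are hypothetical, frames with no overlapping detection are simply left out, and the Kalman smoothing is not shown.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def spatial_max_pool(tubelet_boxes, frame_dets, min_iou=0.5):
    """For each tubelet box, keep the best-scoring overlapping detection.

    tubelet_boxes: {frame_index: box} produced by tracking.
    frame_dets:    {frame_index: [(box, score), ...]} from the still-image detector.
    Returns {frame_index: (box, score)} for frames with at least one overlap.
    """
    pooled = {}
    for t, tracked_box in tubelet_boxes.items():
        # Detections that overlap the tracked box strongly enough.
        overlapping = [
            (box, score)
            for box, score in frame_dets.get(t, [])
            if iou(box, tracked_box) >= min_iou
        ]
        if overlapping:
            # Max pooling: only the highest-scoring detection survives.
            pooled[t] = max(overlapping, key=lambda d: d[1])
    return pooled
```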
![Page 16: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/16.jpg)
Temporal Re-Scoring
• Tubelet Classification. Classify tubelets based on statistics of detection scores (mean, median, top-k). A linear classifier is learnt based on the statistics;
• Tubelet Re-scoring. Map detection scores of positive tubelets to [0.5, 1], negative ones to [0, 0.5].
A Bayesian classifier is used.
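A small sketch of the two pieces, with assumptions stated up front: the mapping is taken to be a min-max rescaling within each tubelet, "top-k" is summarised as the mean of the k highest scores, and the positive/negative decision is supplied by the caller rather than by a trained classifier. The function names are hypothetical.

```python
from statistics import mean, median

def tubelet_stats(scores, k=3):
    """Score statistics that a tubelet classifier could take as features."""
    top_k = sorted(scores, reverse=True)[:k]
    return {"mean": mean(scores), "median": median(scores), "top_k": mean(top_k)}

def rescore(scores, is_positive):
    """Min-max map scores to [0.5, 1] (positive) or [0, 0.5] (negative)."""
    lo, hi = min(scores), max(scores)
    base = 0.5 if is_positive else 0.0
    if hi == lo:
        # Degenerate tubelet with identical scores: use the middle of the range
        # (an arbitrary choice made for this sketch).
        return [base + 0.25 for _ in scores]
    return [base + 0.5 * (s - lo) / (hi - lo) for s in scores]
```

For example, `rescore([0.0, 0.5, 1.0], True)` yields `[0.5, 0.75, 1.0]`, so every box of a positive tubelet ends up scoring at least as high as any box of a negative one.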
![Page 17: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/17.jpg)
Experiments:
• Careful construction of the training dataset (DET:VID ratio of 2:1);
• Main parameters:
  • MGP: 7 frames;
  • MCS: top 0.0003 of boxes used to select the confident classes.
![Page 18: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/18.jpg)
Results:
![Page 19: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/19.jpg)
![Page 20: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/20.jpg)
![Page 21: T-CNN, Object Detection from Video](https://reader034.fdocuments.us/reader034/viewer/2022051123/58f1df071a28ab9e428b456d/html5/thumbnails/21.jpg)
Reference:
• Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, and Wanli Ouyang. T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos.
Andrea Ferri, [email protected]