Human Detection in Videos using Spatio-Temporal Pictorial Structures
Amir Roshan Zamir, Afshin Dehghan, Ruben Villegas
University of Central Florida

2. Our Framework
2.1. Enforcing Temporal Consistency by Post-Processing the Human Detections from Yang and Ramanan [1]
1. Problem: Human Detection in Videos
Goal: make human detection in videos more accurate; frame-by-frame detectors admit numerous false detections.
Applications: video surveillance, human tracking, action recognition, etc.
5. Conclusion
Based on our experiments, temporal part deformation improves human detection in videos: fewer false detections, more true detections, and more precise part trajectories.
3. Learning the Transition of Parts
Human body parts have a limited range of motion that can be approximated. These movements (trajectories) can be learned by training on an annotated dataset; we use the HumanEva dataset [2] for our training.
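As a rough illustration of this learning step, one can fit per-part motion statistics (here, a Gaussian over frame-to-frame displacements) from annotated trajectories. The array layout and the Gaussian model are assumptions for the sketch, not the poster's actual training code; in the poster the annotations come from HumanEva [2].

```python
# A minimal sketch of learning per-part transition statistics from annotated
# trajectories. The (num_frames, num_parts, 2) layout and the Gaussian model
# are assumptions, not the actual training pipeline.
import numpy as np

def learn_part_transitions(trajectories):
    """trajectories: array (num_frames, num_parts, 2) of annotated (x, y)
    part positions. Returns the per-part mean and covariance of the
    frame-to-frame displacement."""
    # Displacement of every part between consecutive frames.
    deltas = np.diff(trajectories, axis=0)   # (num_frames - 1, num_parts, 2)
    means = deltas.mean(axis=0)              # (num_parts, 2)
    covs = np.array([np.cov(deltas[:, p, :].T)
                     for p in range(deltas.shape[1])])  # (num_parts, 2, 2)
    return means, covs

# Toy example: one part moving ~2 px right per frame with small jitter.
rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal([2.0, 0.0], 0.1, size=(50, 1, 2)), axis=0)
means, covs = learn_part_transitions(traj)
```

The learned mean/covariance can then serve as the parameters of the temporal deformation cost described in Section 2.2.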
1.1. Our Approach
Use temporal information (the transition of human parts) in pictorial structures. False detections should be temporally less consistent than true detections, and part transitions convey information that is ignored in the frame-by-frame scenario.
2.2. Enforcing Temporal Consistency by Embedding It into the Detection Process
Our contribution: extending spatial pictorial structures to spatio-temporal pictorial structures.
[Figure: scoring a configuration of parts across frames 1, 2, 3 — the score sums an appearance term, a spatial deformation cost between connected parts i and j within a frame, and the new temporal deformation cost linking each part across consecutive frames.]
This is a more elegant approach than post-processing (2.1): the best detections are determined during the optimization process itself, and the configurations of parts are limited by their transitions in time (temporal deformation). These transitions will be learned and embedded in our optimization process to restrict the detections.
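The objective described in 2.2 can be sketched as the standard pictorial-structures score (appearance plus spatial deformation) extended with a temporal deformation cost between the same part in consecutive frames. The function names, quadratic costs, and weights below are illustrative assumptions, and the sketch only evaluates a given configuration rather than optimizing over all of them.

```python
# A minimal sketch of a spatio-temporal pictorial-structures score:
# appearance + spatial deformation (within a frame) + temporal deformation
# (across frames). Quadratic costs and scalar weights are assumptions.
import numpy as np

def score(appearance, parts, edges, w_spatial, w_temporal):
    """appearance: list over frames of per-part appearance scores.
    parts: array (num_frames, num_parts, 2) of candidate part locations.
    edges: list of (i, j) part pairs connected in the tree model."""
    s = sum(a.sum() for a in appearance)  # appearance term
    # Spatial deformation: cost between connected parts within each frame.
    for f in range(parts.shape[0]):
        for i, j in edges:
            s -= w_spatial * np.sum((parts[f, i] - parts[f, j]) ** 2)
    # Temporal deformation: cost on each part's motion between frames.
    for f in range(parts.shape[0] - 1):
        s -= w_temporal * np.sum((parts[f + 1] - parts[f]) ** 2)
    return s

appearance = [np.array([1.0, 1.0]), np.array([1.0, 1.0])]
parts = np.zeros((2, 2, 2))          # two frames, two parts, all static
still_score = score(appearance, parts, [(0, 1)], 1.0, 0.5)
```

In the actual model this score would be maximized jointly over part configurations (e.g. by dynamic programming over the tree and the frame bundle); the temporal term is what restricts detections to plausible transitions.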
Part Trajectories Before Temporal Adjustment
Part Trajectories After Temporal Adjustment
Next Steps: apply the temporal deformation cost in the optimization process, and train a model that captures typical human part transitions in time.
Part Trajectories on Video
Part Trajectories of Annotations
Head Trajectory Before Temporal Adjustment
Head Trajectory After Temporal Adjustment
Annotated Parts in Each Frame
Head Trajectories Comparison
Pipeline: Input Frame → Compute Human Detection → Pick a Bundle of n Frames → Check Part Transitions in the Bundle of Frames → Keep Detections that Move Consistently in Time → Refine Part Locations using Temporal Information
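The consistency check in the pipeline above can be sketched as a simple filter over a bundle of frames: keep a detection only if its frame-to-frame motion stays below a threshold. The threshold value and data layout are assumptions for illustration, not the poster's actual criterion.

```python
# A rough sketch of the post-processing consistency check: within a bundle
# of n frames, a detection is kept only if it has no implausibly large jump
# between consecutive frames. max_step is an assumed threshold.
import numpy as np

def temporally_consistent(bundle, max_step=15.0):
    """bundle: array (n_frames, 2) of a detection's (x, y) center per frame.
    Returns True if the track has no jump larger than max_step pixels."""
    steps = np.linalg.norm(np.diff(bundle, axis=0), axis=1)
    return bool(np.all(steps < max_step))

# A smooth track is kept; a track with a large jump (likely a false
# detection) is discarded.
smooth = np.array([[10.0, 10.0], [12.0, 11.0], [14.0, 12.0]])
jumpy = np.array([[10.0, 10.0], [80.0, 40.0], [12.0, 11.0]])
```

After this filtering, the surviving detections' part locations can be refined (e.g. smoothed) using the same temporal information.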
Input Frame
Immediate Output from Human Detection
Temporally Consistent Detection without Part Adjustment
Temporally Consistent Detection with Part Adjustment
4. Results
Videos taken from the TRECVID MED11 dataset.
[Results figure panels: Input Frame · Human Detection Output · Human Detection Output with Temporal Consistency · Human Detection Output with Temporal Consistency and Part Adjustment]
References
[1] Y. Yang and D. Ramanan, "Articulated Pose Estimation using Flexible Mixtures of Parts," Computer Vision and Pattern Recognition (CVPR), Colorado Springs, Colorado, June 2011.
[2] L. Sigal, A. O. Balan, and M. J. Black, "HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion," International Journal of Computer Vision (IJCV), vol. 87, no. 1-2, pp. 4-27, March 2010.