3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps
-
Upload
hoyt-mcknight -
Category
Documents
-
view
18 -
download
0
description
Transcript of 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps
![Page 1: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/1.jpg)
3D Articular Human Tracking from Monocular Video
From Condensation to Kinematic Jumps
Cristian SminchisescuBill Triggs
GRAVIR-CNRS-INRIA, Grenoble
![Page 2: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/2.jpg)
Goal: track human body motion in monocular video and estimate 3D joint motion
Why Monocular ?• Movies, archival footage • Tracking / interpretation of actions & gestures (HCI)• Resynthesis, e.g. change point of view or actor• How do humans do this so well?
![Page 3: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/3.jpg)
Why is 3D-from-monocular hard?
Image matching ambiguities
Depth ambiguities
Violations of physical constraints
![Page 4: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/4.jpg)
Overall Modelling Approach1. Generative Human Model
– Complex, kinematics, geometry, photometry
– Predicts images or descriptors
2. Model-image matching cost function– Associates model predictions to
image features– Robust, probabilistically motivated
3. Tracking by search / optimization– Discovers well supported
configurations of matching cost
![Page 5: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/5.jpg)
Human Body Model• Explicit 3D model allows high-level interpretation
• 30-35 d.o.f. articular ‘skeleton’
• ‘Flesh’ of superquadric ellipsoids with tapering & bending
• Model image projection maps points on ‘skin’ through
• kinematic chain• camera matrix• occlusion (z buffer)
![Page 6: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/6.jpg)
Parameter Space Priors• Anthropometric prior
• left/right symmetry• bias towards default human
• Accurate kinematic model• clavicle (shoulder), torso (twist)• robust prior stabilizes complex
joints
• Body part interpenetration• repulsive inter-part potentials
• Anatomical joint limits• hard bounds in parameter space
![Page 7: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/7.jpg)
Multiple Image Features, Integrated Robustly
1. Intensity• The model is `dressed’ with the image texture under its projection (visible parts) in the previous time step
• Matching cost of model-projected texture against current image (robust intensity difference)
![Page 8: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/8.jpg)
2.Contours
• Multiple probabilistic assignment integrates matching uncertainty • Weighted towards motion discontinuities (robust flow outliers)
• Also accounts for higher order symmmetric model/data couplings
• partially removes local, independent matching ambiguities
![Page 9: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/9.jpg)
Cost Function Minima Caused By Incorrect Edge Assignments
Intensity + edges
Edges
only
![Page 10: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/10.jpg)
How many local minima are there?
Thousands ! – even without image matching ambiguities …
![Page 11: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/11.jpg)
Tracking Approaches We Have Tried
• Traditional CONDENSATION
• Covariance Scaled Sampling
• Direct search for nearby minima
• Kinematic Jump Sampling
• ‘Manual’ initialization – already requires nontrivial optimization
![Page 12: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/12.jpg)
Properties of Model-Image Matching Cost Function, 1
• High dimension– at least 30 – 35 d.o.f.
– but factorial structure: limbs are quasi-independent
• Very ill-conditioned– depth d.o.f. often nearly unobservable
– condition number O( 1 : 104 )
• Many many local minima– O( 103 ) kinematic minima, times image ambiguity
![Page 13: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/13.jpg)
Properties of Model-Image Matching Cost Function, 2
• Minima are usually well separated– fair random samples almost never jump between
them
• But they often merge and separate– frequent passage through singular / critical
configurations – frontoparallel limbs
– causes mistracking!
• Minima are small, high-cost regions are large– random sampling with exaggerated noise almost
never hits a minimum
![Page 14: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/14.jpg)
Covariance Scaled Sampling, 1Mistracking leaves us in the wrong minimum.
To make particle filter trackers work for this kind of cost function, we need :
• Broad sampling to reach basins of attraction of nearby minima– in CONDENSATION : exaggerate the dynamical noise– robust / long-tailed distributions are best
• Followed by local optimization to reach low-cost ‘cores’ of minima– core is small in high dim. problems, so samples rarely hit it– CONDENSATION style reweighting will kill them before
they get there
![Page 15: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/15.jpg)
Covariance Scaled Sampling, 2
• Sample distribution should be based on local shape of cost function– the minima that cause confusion are much further in some
directions than in others owing to ill-conditioning
– in particular, kinematic flip pairs are aligned along ill-conditioned depth d.o.f.
• Combining these 3 properties gives Covariance Scaled Sampling
– long-tailed, covariance shaped sampling + optimization
– represent sample distribution as robust mixture model
![Page 16: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/16.jpg)
Statistical Separation of Minima
• Minima are usually at least O(101) standard deviations away.
![Page 17: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/17.jpg)
![Page 18: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/18.jpg)
Direct Search for Nearby Minima
• Instead of sampling randomly, directly locate nearby cost basins by finding the ‘mountain passes’ that lead to them– i.e. find the saddle point at the top of the path
• Numerical methods for finding saddles :– modified Newton optimizers : eigenvector tracking,
hypersurface sweeping
– ‘hyperdynamics’ : MCMC sampling in a modified cost surface that focuses samples on saddles
![Page 19: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/19.jpg)
Direct Search for Nearby Minima
Local minima
Saddle points
![Page 20: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/20.jpg)
Hypersurface Sweeping• Track cost minima on an expanding hypersurface• Moving cost has a local maximum at a saddle point
![Page 21: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/21.jpg)
Hyperdynamics
small abruptness large abruptness
small height
large height
![Page 22: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/22.jpg)
Examples of Kinematic Ambiguities
• Eigenvector tracking method • Initialization cost function (hand specified
image positions of joints)
![Page 23: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/23.jpg)
Kinematic Jump Sampling
• Generate tree of all possible kinematic solutions– work outwards from root of kinematic tree, recursively evaluating
forwards & backwards ‘flip’ for each body part– alternatively, sample by generating flips randomly – you can often treat each limb quasi-independently
• Yes, it really does find thousands of minima !– quite accurate too – no subsequent minimization is needed– random sampling is still needed to handle matching ambiguities
![Page 24: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/24.jpg)
Jump Sampling in Action
![Page 25: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/25.jpg)
Summary• 3D articular human tracking from monocular video
• A hard problem owing to – complex model (many d.o.f., constraints, occlusions…)– ill-conditioning– many kinematic minima– model-image matching ambiguities
• Combine methods to overcome local minima– explicit kinematic jumps + sample for image ambiguities
• Current state of the art– relative depth accuracy is 10% or 10 cm at best– tracking for more than 5 – 10 seconds is still hard – still very slow – several minutes per frame
![Page 26: 3D Articular Human Tracking from Monocular Video From Condensation to Kinematic Jumps](https://reader030.fdocuments.us/reader030/viewer/2022032709/56812f98550346895d951529/html5/thumbnails/26.jpg)
The End