Hand Pose Estimation - ETH Z
![Page 1: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/1.jpg)
Hand Pose Estimation
Matthew Krenik
Advisor: Fabrizio Pece
![Page 2: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/2.jpg)
Agenda
§ What is Hand Pose Estimation?
§ Why does it matter?
§ How does it work?
§ What has been done?
![Page 3: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/3.jpg)
What is Hand Pose Estimation?
§ Estimate all degrees of freedom (DOF) of a hand from depth images
§ This is a tough problem, especially in real time!
§ Not to be confused with “hand shape estimation”
![Page 4: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/4.jpg)
![Page 5: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/5.jpg)
Why Does it Matter?
§ More than just gestures
§ Ideal for continuous input applications
§ Links your hand dexterity into a computer model
§ Will it redefine how we interact with computers?
![Page 6: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/6.jpg)
Gaming
![Page 7: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/7.jpg)
Design / Engineering
![Page 8: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/8.jpg)
Robot Hand Control – Surgery? Industry?
![Page 9: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/9.jpg)
Communication – Sign Language
![Page 10: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/10.jpg)
How Does it Work?
§ It's going to take some time to explain
§ Starting from the ground up!
  § Decision trees
  § Ensemble techniques
  § Random forests
  § Body pose estimation
  § Hand pose estimation
§ The assumption is that everyone has a very basic idea of what machine learning is and does
![Page 11: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/11.jpg)
Machine Learning
§ Goal:
  § Given training data T with entries (𝒙, 𝒚)
  § Find a model that estimates 𝒚 for unseen 𝒙
  § This is called prediction
§ Quality measurement:
  § Minimize the probability of model prediction errors on future data
§ What are some models?
  § Linear Regression
  § Support Vector Machines
  § Decision Trees!
![Page 12: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/12.jpg)
Decision Trees
§ Very intuitive
§ Each node asks a question about a feature of the data
§ A sample propagates through the tree depending on the answer to each question
§ When the sample reaches the end of the tree (a leaf), the decision tree makes a classification
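A minimal sketch of how a sample travels through a decision tree. The `Node` and `Leaf` classes, the "is feature < threshold?" form of the question, and the toy labels are illustrative assumptions, not taken from the talk.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    label: str  # class predicted when a sample ends up here


@dataclass
class Node:
    feature: int                 # index of the feature this node asks about
    threshold: float             # question: "is x[feature] < threshold?"
    left: Union["Node", Leaf]    # branch taken when the answer is yes
    right: Union["Node", Leaf]   # branch taken when the answer is no


def classify(tree: Union[Node, Leaf], x: Sequence[float]) -> str:
    """Follow the answers from the root until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.label


# Hypothetical toy tree over two made-up features
toy_tree = Node(0, 0.5, Leaf("fist"), Node(1, 2.0, Leaf("flat"), Leaf("point")))
print(classify(toy_tree, [0.9, 3.0]))
```

Each call costs at most one comparison per tree level, which is part of why tree-based models are fast at prediction time.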
![Page 13: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/13.jpg)
How to grow a tree from data?
§ In what order do we ask the questions (test features)?
  § Each possible tree has an amount of entropy
  § Test out all possible questions for a node, and choose the one that reduces the entropy the most (largest information gain)
§ How do nodes make decisions based on the features?
  § Same way!
  § Choose a decision boundary that gives the largest information gain
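The split selection described here can be sketched as follows. This assumes Shannon entropy over class labels and axis-aligned threshold questions; the function names and the midpoint choice of candidate thresholds are illustrative, not from the slides.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, left, right):
    """Entropy before the split minus the size-weighted entropy after it."""
    n = len(labels)
    after = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - after


def best_split(xs, ys):
    """Try every candidate question and keep the one with the largest gain."""
    best = (None, None, 0.0)  # (feature index, threshold, gain)
    for f in range(len(xs[0])):
        values = sorted({x[f] for x in xs})
        # Candidate thresholds: midpoints between consecutive feature values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in zip(xs, ys) if x[f] < t]
            right = [y for x, y in zip(xs, ys) if x[f] >= t]
            gain = information_gain(ys, left, right)
            if gain > best[2]:
                best = (f, t, gain)
    return best


xs = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
ys = ["a", "a", "b", "b"]
print(best_split(xs, ys))
```

Growing a full tree would apply `best_split` recursively to the left and right subsets until a depth limit or a pure node is reached.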
![Page 14: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/14.jpg)
How to grow a tree from data?
![Page 15: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/15.jpg)
Decision Trees: A Pretty Good Model!
![Page 16: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/16.jpg)
Ensemble Learning
§ Two competing methodologies:
  § Traditional: Build one really good model
  § Ensemble: Build many models and average the results
§ Build a ton of “pretty good” models
§ Combine them into one “pretty awesome” prediction!
§ Important for individual models to not be correlated, otherwise there is a strong tendency to overfit
§ So we add randomness!
![Page 17: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/17.jpg)
Ensemble Techniques
§ Bootstrap Aggregation (Bagging)
  § Take a random subsample from the training set T, with replacement
  § Train each model on a different subsample
  § Classification is the majority vote; Regression is the average
§ Random Forests: Multiple, randomized decision trees
  1. Bagging
  2. Randomized Node Optimization: choose random set of questions
     § Number of questions affects the correlation of the trees
  3. Decision boundary of the decision trees: conic, linear, etc.
  4. Depth of the component decision trees
     § More depth means there will be more overfitting
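A rough sketch of bagging combined with randomized node optimization. To stay short it makes several simplifying assumptions that are not from the talk: every "tree" is a depth-1 stump, questions are scored by the number of correctly classified samples rather than by information gain, and all names and default values are hypothetical.

```python
import random
from collections import Counter


def bootstrap(data, rng):
    """Bagging step: draw len(data) samples with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(data, n_questions, rng):
    """Randomized node optimization: score only a random set of questions."""
    n_features = len(data[0][0])
    best = None
    for _ in range(n_questions):
        f = rng.randrange(n_features)   # random feature
        t = rng.choice(data)[0][f]      # random threshold taken from the data
        left = [y for x, y in data if x[f] < t]
        right = [y for x, y in data if x[f] >= t]
        # Score: samples that agree with the majority label on their side
        score = sum(max(Counter(s).values()) for s in (left, right) if s)
        if best is None or score > best[0]:
            l_label = Counter(left or right).most_common(1)[0][0]
            r_label = Counter(right or left).most_common(1)[0][0]
            best = (score, f, t, l_label, r_label)
    _, f, t, l_label, r_label = best
    return lambda x: l_label if x[f] < t else r_label


def train_forest(data, n_trees=25, n_questions=5, seed=0):
    """Each tree sees its own bootstrap sample and its own random questions."""
    rng = random.Random(seed)
    return [train_stump(bootstrap(data, rng), n_questions, rng)
            for _ in range(n_trees)]


def predict(forest, x):
    """Classification is the majority vote over all trees."""
    return Counter(tree(x) for tree in forest).most_common(1)[0][0]


# Hypothetical toy data: two well-separated classes
data = [([i / 10, 2 * i / 10], "low") for i in range(10)] + \
       [([1 + i / 10, 2 + 2 * i / 10], "high") for i in range(10)]
forest = train_forest(data)
print(predict(forest, [0.3, 0.6]), predict(forest, [1.6, 3.2]))
```

Lowering `n_questions` makes individual trees weaker but less correlated with each other, which is the trade-off the slide refers to.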
![Page 18: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/18.jpg)
Example: Different Trees
![Page 19: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/19.jpg)
Example: Different Trees
![Page 20: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/20.jpg)
Example: Different Trees
![Page 21: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/21.jpg)
Example: Random Decision Forest
![Page 22: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/22.jpg)
Example: Multi-class Decision Trees
![Page 23: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/23.jpg)
Example: Comparison to SVM Model
![Page 24: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/24.jpg)
A quick look at body pose estimation
§ Body Pose Estimation Pipeline
§ Technology found in consumer devices, like the Kinect
§ Very similar to hand pose estimation
![Page 25: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/25.jpg)
Hand Pose Estimation Pipeline
![Page 26: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/26.jpg)
What makes Hand Pose tough?
§ Hand is much smaller than the body, but still has 22 DOF
§ Self-occlusion is very common and severe
§ Can be rotated in any direction (body is always upright)
§ Real depth data can be difficult to label
![Page 27: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/27.jpg)
Some ideas...
§ Restrict the viewing area of the hand
§ One advantage: Hands are fairly invariant among humans
§ Train with synthetic data, rendered from 3D models
![Page 28: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/28.jpg)
Train based on Synthetic Data
§ Use 3D hand models to generate data
§ Train the Random Decision Forests using this data
![Page 29: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/29.jpg)
Hand Pose Estimation Pipeline
![Page 30: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/30.jpg)
Pixel Classification
One Tree Two Trees Three Trees
![Page 31: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/31.jpg)
Mean shift local mode finding
§ Algorithm used to determine where the joints are
§ Each pixel is given a weighted Gaussian kernel
§ Weight is determined by class probability times depth
§ Gradient ascent from many points finds the local maxima
§ The highest local maximum determines the joint
§ Threshold the scores to filter out non-visible joints
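A small sketch of the mode-finding idea for one joint class, working on 2D pixel coordinates. The function names, the bandwidth, and the `min_score` threshold are hypothetical, and the per-pixel weights (class probability times depth, as stated on the slide) are assumed to be computed elsewhere.

```python
import math


def density(points, weights, at, bandwidth):
    """Weighted sum of Gaussian kernels evaluated at position `at`."""
    ax, ay = at
    return sum(
        w * math.exp(-((px - ax) ** 2 + (py - ay) ** 2) / (2 * bandwidth ** 2))
        for (px, py), w in zip(points, weights)
    )


def mean_shift_mode(points, weights, start, bandwidth, iters=50, tol=1e-6):
    """Climb from `start` towards a local maximum of the density."""
    x, y = start
    for _ in range(iters):
        num_x = num_y = den = 0.0
        for (px, py), w in zip(points, weights):
            k = w * math.exp(-((px - x) ** 2 + (py - y) ** 2) / (2 * bandwidth ** 2))
            num_x += k * px
            num_y += k * py
            den += k
        if den == 0.0:
            break
        nx, ny = num_x / den, num_y / den  # kernel-weighted mean of the pixels
        moved = math.hypot(nx - x, ny - y)
        x, y = nx, ny
        if moved < tol:
            break
    return (x, y)


def find_joint(points, weights, starts, bandwidth, min_score):
    """Run mean shift from many starts, keep the highest mode, then threshold."""
    modes = [mean_shift_mode(points, weights, s, bandwidth) for s in starts]
    best = max(modes, key=lambda m: density(points, weights, m, bandwidth))
    if density(points, weights, best, bandwidth) < min_score:
        return None  # treated as a non-visible joint
    return best


# Hypothetical toy input: a tight cluster of pixels plus one outlier
points = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
weights = [1.0] * len(points)
print(find_joint(points, weights, starts=points, bandwidth=0.5, min_score=2.0))
```

Starting from several points matters because each run only reaches the nearest local maximum; the outlier above forms its own low-scoring mode and is discarded.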
![Page 32: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/32.jpg)
Joint Determination
![Page 33: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/33.jpg)
Hand Pose Estimation Algorithm
Strengths
§ Very fast
§ Robust to fast movements and noise
§ No initialization needed
§ Can run on a GPU for interface applications or games
Issues
§ Training must be done offline
§ Number of images ~1-10M, takes 25-250 GB of data
§ Number of operations is huge even with a simple algorithm
![Page 34: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/34.jpg)
Limitations of Single Layer RDF
§ Difficult to generate every possible hand pose
§ Dataset size is huge!
§ Hard to capture the variation in the data set
§ More variation → deeper trees → more RAM/memory
§ Solution: Divide into sub-problems and solve with separate RDFs
§ Lower variation → lower complexity → less RAM/memory
![Page 35: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/35.jpg)
![Page 36: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/36.jpg)
Multi-layered RDFs for Hand Pose
![Page 37: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/37.jpg)
Two Structures of Multi-layer RDFs
§ Local Expert Network
  § Hand Shape Classification gives each pixel a label
  § Train local expert forests for each pixel label
  § Expert forest depends on pixel label; each pixel is classified
§ Global Expert Network
  § Hand Shape Classification gives each pixel a label
  § The hand shape is determined by pixel voting
  § Train global expert forests for each hand shape label
  § Expert forest depends on hand shape label; each pixel is classified
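The difference between the two structures comes down to how a pixel is routed to an expert. The sketch below uses plain callables as stand-ins for trained forests and a hard majority vote for the global case; both are simplifying assumptions, and the names are hypothetical.

```python
from collections import Counter


def local_expert_network(pixels, shape_forest, experts):
    """LEN: each pixel is routed by its own hand-shape label."""
    return [experts[shape_forest(p)](p) for p in pixels]


def global_expert_network(pixels, shape_forest, experts):
    """GEN: pixels vote on one hand shape, and that shape picks the expert."""
    votes = Counter(shape_forest(p) for p in pixels)
    hand_shape = votes.most_common(1)[0][0]
    return [experts[hand_shape](p) for p in pixels]


# Toy stand-ins for trained forests (not real classifiers)
shape_forest = lambda p: "fist" if p < 0 else "flat"
experts = {"fist": lambda p: "palm", "flat": lambda p: "finger"}
pixels = [-1, 1, 2]

print(local_expert_network(pixels, shape_forest, experts))
print(global_expert_network(pixels, shape_forest, experts))
```

In the toy run, the pixel that the shape classifier labels differently from the others is sent to a different expert by the local network, while the global network overrules it with the majority shape.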
![Page 38: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/38.jpg)
Local Expert Network
![Page 39: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/39.jpg)
Global Expert Network
![Page 40: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/40.jpg)
Training a Multi-layer RDF
§ Given the same data as before (hand shape not given)
  1. Cluster the data
  2. Train Hand Shape Classifier based on all clusters
  3. Train each Pixel Classifier based on a specific cluster
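The three training steps can be wired together roughly as below. `cluster_fn` and `train_fn` are hypothetical placeholders for a clustering routine and a forest trainer; the talk does not specify either, so this only shows the data flow.

```python
from collections import defaultdict


def train_multilayer_rdf(samples, cluster_fn, train_fn):
    """samples: list of (image, pixel_labels) pairs; hand shape is not given.

    cluster_fn(images) -> one cluster id per image   (hypothetical helper)
    train_fn(inputs, targets) -> trained model       (hypothetical helper)
    """
    images = [img for img, _ in samples]

    # 1. Cluster the data to discover groups of similar hand shapes
    cluster_ids = cluster_fn(images)

    # 2. Train the hand shape classifier on all clusters
    shape_classifier = train_fn(images, cluster_ids)

    # 3. Train one pixel classifier per cluster, on that cluster's data only
    by_cluster = defaultdict(list)
    for (img, labels), cid in zip(samples, cluster_ids):
        by_cluster[cid].append((img, labels))
    experts = {
        cid: train_fn([i for i, _ in group], [l for _, l in group])
        for cid, group in by_cluster.items()
    }
    return shape_classifier, experts


# Toy stand-ins so the data flow can be inspected
toy_cluster = lambda imgs: [0 if i < 5 else 1 for i in imgs]
toy_train = lambda X, y: (tuple(X), tuple(y))
toy_samples = [(1, "a"), (2, "b"), (7, "c")]
shape_clf, toy_experts = train_multilayer_rdf(toy_samples, toy_cluster, toy_train)
print(shape_clf, toy_experts)
```

Because each expert only sees one cluster, it faces less variation than a single forest trained on everything, which is the motivation given for the multi-layer design.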
![Page 41: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/41.jpg)
Which is better? GEN or LEN?
§ Global Expert Networks average class distributions → more robust to noise
§ Local Expert Networks use info from each pixel → better at generalizing to unseen data
![Page 42: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/42.jpg)
Test: American Sign Language
![Page 43: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/43.jpg)
Results
§ Huge improvement over single-layer RDFs
![Page 44: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/44.jpg)
Results
§ Remaining errors are concentrated on very similar poses
![Page 45: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/45.jpg)
Summary
§ What is Hand Pose Estimation? Determine the joint positions to fix all DOFs of the hand
§ Why does it matter? Continuous input applications
§ How does it work? Randomized Decision Forests
§ What has been done? Add multiple layers for increased performance
![Page 46: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/46.jpg)
References
§ [1] Keskin - Hand Pose Estimation and Hand Shape Classification Using Multi-layered Randomized Decision Forests
§ [2] Tompson - Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks
§ [3] Qian - Realtime and Robust Hand Tracking from Depth
§ [4] Tang - Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture
§ [5] Oikonomidis - Evolutionary Quasi-random Search for Hand Articulations Tracking
§ [6] Wang - 6D Hands: Markerless Hand Tracking for Computer Aided Design
§ [7] Hilliges - Advanced topics in Gesture Recognition Part II
![Page 47: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/47.jpg)
Questions?
![Page 48: Hand Pose Estimation - ETH Z](https://reader031.fdocuments.us/reader031/viewer/2022020912/62023db813ecf44dc4466f77/html5/thumbnails/48.jpg)
Appendix: Getting Hand Shape from Hand Pose
§ Hand shape is just shape information: “fist”, “flat”, etc.
§ Hand pose is the specific joint angles for every DOF
§ With hand pose, an SVM can be used to determine hand shape very robustly
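As a rough illustration of classifying hand shape from pose, here is a tiny linear soft-margin SVM trained by subgradient descent on the hinge loss. The talk does not say which SVM variant or features were meant; the three "joint angle" features, the two shape classes, and all hyperparameters are made-up assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss.

    X: list of feature vectors, y: labels in {+1, -1}.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def hand_shape(pose, w, b):
    """Map a pose vector to a shape label using the learned hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, pose)) + b
    return "fist" if score > 0 else "flat"


# Hypothetical poses: three finger flexion angles in radians
poses = [[1.4, 1.5, 1.3], [1.5, 1.4, 1.6], [0.1, 0.0, 0.2], [0.0, 0.1, 0.1]]
labels = [1, 1, -1, -1]  # +1 = fist, -1 = flat
w, b = train_linear_svm(poses, labels)
print(hand_shape([1.5, 1.5, 1.5], w, b), hand_shape([0.0, 0.0, 0.0], w, b))
```

With more than two shapes, one such classifier per shape (one-vs-rest) would be a common extension.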