Deep Learning for Articulated Human Pose Estimation
Transcript of Deep Learning for Articulated Human Pose Estimation
![Page 1: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/1.jpg)
End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks
for Human Pose Estimation
Wei Yang, Wanli Ouyang, Hongsheng Li, and Xiaogang Wang
Department of Electronic Engineering, The Chinese University of Hong Kong
![Page 2: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/2.jpg)
Outline
Introduction
Methodology
Experiments
2
![Page 3: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/3.jpg)
Human Pose Estimation What is Human Pose Estimation?
3Results are generated by the proposed methods (without temporal constraints).Video Credit: Peter Jasko solo - M-idzomer 2013
![Page 4: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/4.jpg)
Human Pose Estimation is to recover the joint positions of articulated limbs of human body given an image or a video.
4Results are generated by the proposed methods (without temporal constraints).Video Credit: Peter Jasko solo - M-idzomer 2013
![Page 5: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/5.jpg)
Activity Recognition Game and Animation Clothing Parsing
Applications
5
[Yamaguchi et al. CVPRβ14]
![Page 6: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/6.jpg)
Challenges Articulation
Forshortening
Clothing
6
Dantone et al. CVPR 2013
![Page 7: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/7.jpg)
Structure Modeling
7
Figure Drawing
β’ Cylinders for each body parts
β’ Join up the cylinders
Pictorial Structures
β’ Unary templates
β’ Pairwise springs
Unable to handle large variations
Image credit:artintegrity.wordpress.com
Fischler & Elschlager 1973
Felzenszwalb & Huttenlocher 2005
Springs
Unary template
![Page 8: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/8.jpg)
Structure Modeling
8
Figure Drawing
β’ Cylinders for each body parts
β’ Join up the cylinders
Pictorial Structures
β’ Unary templates
β’ Pairwise springs
Unable to handle large variations
Mixtures of each part
β’ Unary template for each mixture type
β’ Clustering on (βπ₯,βπ¦)
Yang & Ramanan 2011Image credit:artintegrity.wordpress.com
Fischler & Elschlager 1973
Felzenszwalb & Huttenlocher 2005
Graph Structures
β’ Treesβ’ Latent treesβ’ Loopy graphsβ’ Fully connected
graphs
![Page 9: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/9.jpg)
Deep Learning Based Methods
9
Benefits from better neural network architectures VGG
GoogLeNet,
ResNet
Structures are only used for post processing
Fully Convolutional Networks
Heat map prediction
![Page 10: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/10.jpg)
Gap Between Deep Models and Structure Modeling
10
Gap
![Page 11: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/11.jpg)
End-to-End Learning
Gap Between Deep Models and Structure Modeling
11
![Page 12: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/12.jpg)
CNN
Local appearance is ambiguous
Forward
Backward
Motivation: Geometric Constraints Among Body Parts Helps in Learning Better Representation
12
Forward
Backward
Global consistency helps training
![Page 13: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/13.jpg)
Outline
Introduction
Methodology
Experiments
13
![Page 14: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/14.jpg)
Graph Models
πΊ = (π, πΈ)
Vertices
Locations and mixture types of
body parts
Modeled by a front-end CNN
Edges
Pairwise spatial relationships
between body parts
Modeled by message passing
layers
14
message passing
![Page 15: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/15.jpg)
CNN
Log
ari
thm
So
ftm
ax
Framework
β¦ β¦ β¦
l.shou
neck
r.shou
r.knee
r.ankle
head
β¦
max
max
max
max
max
max
15
Message Passing Layers
![Page 16: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/16.jpg)
CNN
Log
ari
thm
So
ftm
ax
Framework
β¦ β¦ β¦
l.shou
neck
r.shou
r.knee
r.ankle
head
β¦
max
max
max
max
max
max
πΉ π, π πΌ; π½,π =
πβπ
π(ππ , π‘π|πΌ; π)
+
π,π βπΈ
π ππ , ππ , π‘π , π‘π πΌ; ππ,π
π‘π,π‘π
16
Message Passing Layers
![Page 17: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/17.jpg)
CNN
Log
ari
thm
So
ftm
ax
Framework
β¦ β¦ β¦
l.shou
neck
r.shou
r.knee
r.ankle
head
β¦
max
max
max
max
max
max
πΉ π, π πΌ; π½,π =
πβπ
π(ππ , π‘π|πΌ; π)
17
π = ππ = {(π₯π , π¦π)}: location of part π
Message Passing Layers
+
π,π βπΈ
π ππ , ππ , π‘π , π‘π πΌ; ππ,π
π‘π,π‘π
![Page 18: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/18.jpg)
CNN
Log
ari
thm
So
ftm
ax
Framework
β¦ β¦ β¦
l.shou
neck
r.shou
r.knee
r.ankle
head
β¦
max
max
max
max
max
max
πΉ π, π πΌ; π½,π =
πβπ
π(ππ , π‘π|πΌ; π)
18
π = ππ = {(π₯π , π¦π)}: location of part π
π = ππ : mixture type of part π
Message Passing Layers
+
π,π βπΈ
π ππ , ππ , π‘π , π‘π πΌ; ππ,π
π‘π,π‘π
![Page 19: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/19.jpg)
CNN
Log
ari
thm
So
ftm
ax
Front-End CNNPart Appearance Terms
β¦ β¦ β¦
l.shou
neck
r.shou
r.knee
r.ankle
head
β¦
max
max
max
max
max
max
πΉ π, π πΌ; π½,π =
πβπ
π(ππ , π‘π|πΌ; π)
19
+
π,π βπΈ
π ππ , ππ , π‘π , π‘π πΌ; ππ,π
π‘π,π‘π
![Page 20: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/20.jpg)
Message Passing LayersSpatial Relationship Terms
CNN
Log
ari
thm
So
ftm
ax
β¦ β¦ β¦
l.shou
neck
r.shou
r.knee
r.ankle
head
β¦
max
max
max
max
max
max
πΉ π, π πΌ; π½,π =
πβπ
π(ππ , π‘π|πΌ; π)
20
+
π,π βπΈ
π ππ , ππ , π‘π , π‘π πΌ; ππ,π
π‘π,π‘π
![Page 21: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/21.jpg)
conv+
norm+
pool
conv+
norm+
pool
con
v
con
v
con
v
1x1 conv
+dropout
1x1
co
nv
Dro
po
ut
Dro
po
ut
3
3 3 3 3 3
1x1
co
nv
4096
4096
PxK
β¦
CNN
Local confidence of the appearance of part π with mixture type π‘π :
21
Front end CNN
π(ππ , π‘π|πΌ; π) = log π ππ , π‘π πΌ; π = logπ(π ππ , π‘π πΌ; π )
![Page 22: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/22.jpg)
conv+
norm+
pool
conv+
norm+
pool
con
v
con
v
con
v
1x1 conv
+dropout
1x1
co
nv
Dro
po
ut
Dro
po
ut
3
3 3 3 3 3
1x1
co
nv
4096
4096
PxK
β¦
CNN
Local confidence of the appearance of part π with mixture type π‘π :
22
Front end CNN
π(ππ , π‘π|πΌ; π) = log π ππ , π‘π πΌ; π = logπ(π ππ , π‘π πΌ; π )
output of the front-end CNN
![Page 23: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/23.jpg)
conv+
norm+
pool
conv+
norm+
pool
con
v
con
v
con
v
1x1 conv
+dropout
1x1
co
nv
Dro
po
ut
Dro
po
ut
3
3 3 3 3 3
1x1
co
nv
4096
4096
PxK
β¦
CNN
Local confidence of the appearance of part π with mixture type π‘π :
23
Front end CNN
π(ππ , π‘π|πΌ; π) = log π ππ , π‘π πΌ; π = logπ(π ππ , π‘π πΌ; π )
![Page 24: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/24.jpg)
conv+
norm+
pool
conv+
norm+
pool
con
v
con
v
con
v
1x1 conv
+dropout
1x1
co
nv
Dro
po
ut
Dro
po
ut
3
3 3 3 3 3
1x1
co
nv
4096
4096
PxK
β¦
CNN
Local confidence of the appearance of part π with mixture type π‘π :
24
Front end CNN
π(ππ , π‘π|πΌ; π) = log π ππ , π‘π πΌ; π = logπ(π ππ , π‘π πΌ; π )
Softmax function
![Page 25: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/25.jpg)
conv+
norm+
pool
conv+
norm+
pool
con
v
con
v
con
v
1x1 conv
+dropout
1x1
co
nv
Dro
po
ut
Dro
po
ut
3
3 3 3 3 3
1x1
co
nv
4096
4096
PxK
β¦
CNN
Local confidence of the appearance of part π with mixture type π‘π :
25
Front end CNN
π(ππ , π‘π|πΌ; π) = log π ππ , π‘π πΌ; π = logπ(π ππ , π‘π πΌ; π )
Logarithm
![Page 26: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/26.jpg)
CNN
Log
ari
thm
So
ftm
ax
β¦ β¦ β¦
l.shou
neck
r.shou
r.knee
r.ankle
head
β¦
max
max
max
max
max
max
πΉ π, π πΌ; π½,π =
πβπ
π(ππ , π‘π|πΌ; π)
26
+
π,π βπΈ
π ππ , ππ , π‘π , π‘π πΌ; ππ,π
π‘π,π‘π
Message Passing LayersSpatial Relationship Terms
![Page 27: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/27.jpg)
π ππ , ππ, π‘π , π‘π πΌ;ππ,π
π‘π,π‘π = ππ,π
π‘π,π‘π , π(ππ , ππ)
Spatial Relationship Terms
Quadratic deformation constraints
π ππ , ππ = βπ₯, βπ₯2, βπ¦, βπ¦2
βπ₯ = π₯π β π₯π , βπ¦ = π¦π β π¦π
27
Yang Y, Ramanan D. Articulated pose estimation with flexible mixtures-of-parts. CVPR, 2011.
![Page 28: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/28.jpg)
Message Passing: Max-Sum Algorithm
28
πππ ππ , π‘π β πΌπmaxππ,π‘π
π’π ππ , π‘π + π ππ , ππ , π‘π , π‘πThe message passed from part π to π
1
2
π21
![Page 29: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/29.jpg)
Message Passing: Max-Sum Algorithm
29
πππ ππ , π‘π β πΌπmaxππ,π‘π
π’π ππ , π‘π + π ππ , ππ , π‘π , π‘π2
π21
1
π’π ππ , π‘π β πΌπ’ π ππ , π‘π + πβπ(π)
πππ(ππ , π‘π)The belief of part π
The message passed from part π to π
![Page 30: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/30.jpg)
π’π ππ , π‘π β πΌπ’ π ππ , π‘π + πβπ(π)
πππ(ππ , π‘π)
Message Passing: Max-Sum Algorithm
30
πππ ππ , π‘π β πΌπmaxππ,π‘π
π’π ππ , π‘π + π ππ , ππ , π‘π , π‘π
ππβ, π‘π
β = argmax π’πβ ππ , π‘π
ππ,π‘π
![Page 31: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/31.jpg)
Diagnosis on Message Passing Layers
31
1st
![Page 32: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/32.jpg)
Diagnosis on Message Passing Layers
32
2nd
![Page 33: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/33.jpg)
Diagnosis on Message Passing Layers
33
3rd
![Page 34: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/34.jpg)
Outline
Introduction
Methodology
Experiments
34
![Page 35: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/35.jpg)
Datasets
LSP
35
FLIC
Sports1000 training1000 testing
Films3987 training1016 testing
![Page 36: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/36.jpg)
36
Qualitative Results on the LSP Dataset
![Page 37: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/37.jpg)
37
![Page 38: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/38.jpg)
38
PCP on the LSP dataset[Percentage of Correct Parts]
![Page 39: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/39.jpg)
39
30
40
50
60
70
80
90
100
TORSO HEAD UPPER ARMS LOWER ARMS UPPER LEGS LOWER LEGS MEAN
STRICT PCP ON THE LSP DATASET
Yang&Ramanan, CVPR'11 Pishchulin et al., CVPR'13
Eichner&Ferrari, ACCV'13 Kiefel&Gehler, ECCV'14
Pose Machines, ECCV'14 Ouyang et al., CVPR'14
Pishchulin et al., ICCV'13 DeepPose, CVPR'14
Chen&Yuille, NIPS'14 Ours
![Page 40: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/40.jpg)
Percentage of Detected Joints (PDJ)
40LSP FLIC
Our method
![Page 41: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/41.jpg)
Sports1000 training1000 testing
Films3987 training1016 testing
Datasets
LSP
41
Image ParseFLIC
Activities100 training205 testing
Cross-dataset validation
![Page 42: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/42.jpg)
42
GeneralizationImage Parse Dataset
30 50 70 90
TORSO
HEAD
UPPER ARMS
LOWER ARMS
UPPER LEGS
LOWER LEGS
MEAN
Ours
Ouyang et al., CVPR'14
Yang&Ramanan, TPAMI'13
Pishchulin et al., ICCV'13
Pishchulin et al., CVPR'13
Pishchulin et al., CVPR'12
Johnson&Everingham, CVPR'11
Yang&Ramanan, CVPR'11
![Page 43: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/43.jpg)
Component Analysis
43
![Page 44: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/44.jpg)
Unary Term vs. Full Model83.4
69
53.5
34.9
72.2
63.5
60.1
93
82.1
70.6
55.4
82.1
75.3
74.2
96.5
83.1
78.8
66.7
88.7
81.7
81.1
30
40
50
60
70
80
90
100
TORSO HEAD UPPER ARMS LOWER ARMS UPPER LEGS LOWER LEGS MEAN
STRICT PCP ON THE LSP DATASET (VGG-LG)
Unary term only
Full model (seperately trained)
Full model (jointly trained)
44
![Page 45: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/45.jpg)
Tree-Structured Model vs. Loopy Model
45
96.2
83.4
78.7
65.8
87.9
81.1
80.7
96.5
83.1
78.8
66.7
88.7
81.7
81.1
62 72 82 92
Torso
Head
Upper Arms
Lower Arms
Upper Legs
Lower Legs
Mean
Loopy Model Tree Model
![Page 46: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/46.jpg)
Human Pose Estimation in This CVPR
46
Chu et al. Structured Feature Learning for Pose Estimation.
Wei et al. Convolutional Pose Machines.
Carreira et al. Human Pose Estimation With Iterative Error Feedback.
![Page 47: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/47.jpg)
CNNStructure
Summary
End-to-End Learning
Bridge the gap between structure modeling and deep learning
A new message passing layer, which is flexible to tree-structured/loopy relational graphs.
![Page 48: Deep Learning for Articulated Human Pose Estimation](https://reader033.fdocuments.us/reader033/viewer/2022052208/58a30c3e1a28ab24348bc4c8/html5/thumbnails/48.jpg)
Scan for more details
CNNStructure
Thank you.Welcome to our poster session #5