Real-Time Human Pose Recognition
in Parts from Single Depth Images
Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard
Moore, Alex Kipman, Andrew Blake
CVPR 2011
PRESENTER: AHSAN ABDULLAH
PROBLEM
APPROACH
• Partitioning the body into parts helps localize the joints
Figure labels: right elbow, right hand, left shoulder, neck
Shotton et al. CVPR 2011
PIPELINE
1. capture depth image & remove background
2. infer body parts per pixel
3. cluster pixels to hypothesize body joint positions
4. fit model & track skeleton
Design Goals
• Efficiency
• Robustness
BODY PART CLASSIFICATION
Compute $P(c_i \mid w_i)$
• pixels $i = (x, y)$
• body part $c_i$
• image window $w_i$
Discriminative approach
• learn classifier $P(c_i \mid w_i)$ from training data
• the image window $w_i$ moves with the classifier, one window per pixel
LEARNING DATA
• synthetic (train & test)
• real (test)
LEARNING – DATA SYNTHESIS
Record MoCap: 500k frames distilled to 100k poses
Retarget to several models
Render (depth, body parts) pairs
FEATURE SET
• Depth comparisons
- very fast to compute
$f(I, \mathbf{x}) = d_I(\mathbf{x}) - d_I(\mathbf{x} + \Delta)$
where $f(I, \mathbf{x})$ is the feature response, $d_I$ the image depth, $\mathbf{x}$ the image coordinate, and $d_I(\mathbf{x} + \Delta)$ the offset depth
• Offset scales inversely with depth: $\Delta = \mathbf{v} / d_I(\mathbf{x})$
• Background pixels: d = large constant
Figure: input depth image with example pixels x and offsets Δ
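The depth-comparison feature can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the convention that background pixels are stored as 0 in the depth array, the treatment of off-image probes as background, and the value of `BACKGROUND_DEPTH` are all assumptions made for the example.

```python
import numpy as np

BACKGROUND_DEPTH = 1e6  # assumed value for the "large constant"


def probe_depth(depth, row, col):
    """Depth at (row, col); the large constant for background or off-image probes.

    Assumption: background pixels are encoded as 0 in the depth array.
    """
    h, w = depth.shape
    if 0 <= row < h and 0 <= col < w and depth[row, col] > 0:
        return float(depth[row, col])
    return BACKGROUND_DEPTH


def depth_feature(depth, x, v):
    """f(I, x) = d_I(x) - d_I(x + Delta), with Delta = v / d_I(x)."""
    row, col = x
    d_x = probe_depth(depth, row, col)
    # The pixel offset shrinks as the reference pixel gets farther away,
    # which is what "scales inversely with depth" refers to.
    d_row = int(round(v[0] / d_x))
    d_col = int(round(v[1] / d_x))
    return d_x - probe_depth(depth, row + d_row, col + d_col)
```

Each feature costs two depth reads and a subtraction, which is consistent with the "very fast to compute" claim.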
DECISION FORESTS
Aggregation of decision trees
TRAINING DECISION TREES
• Node $n$ receives the set of training pixels $Q_n = (I, \mathbf{x})$
• Split test at node $n$, evaluated for all pixels: $f(I, \mathbf{x}; \Delta_n) > \theta_n$ (no / yes), sending each pixel to child $l$ or $r$
• Body part $c$ distributions: $P_n(c)$ at node $n$, $P_l(c)$ and $P_r(c)$ at the children
• Take $(\Delta, \theta)$ that maximises information gain, i.e. reduces entropy [Breiman et al. 84]
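Choosing the split that maximises information gain can be sketched as follows. This is an illustrative sketch under assumptions: the feature responses are taken as precomputed for a small set of candidate offsets, Shannon entropy in bits is used, and the names `entropy`, `information_gain`, and `best_split` are hypothetical rather than taken from the talk.

```python
import numpy as np


def entropy(labels, num_classes):
    """Shannon entropy (in bits) of the body-part label histogram."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels, minlength=num_classes) / len(labels)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def information_gain(labels, goes_right, num_classes):
    """Parent entropy minus the size-weighted entropy of the two children."""
    left, right = labels[~goes_right], labels[goes_right]
    n = len(labels)
    children = (len(left) / n) * entropy(left, num_classes) \
        + (len(right) / n) * entropy(right, num_classes)
    return entropy(labels, num_classes) - children


def best_split(responses, labels, thresholds, num_classes):
    """Search candidate (offset index k, threshold theta) pairs.

    responses[k, i] is the feature response of training pixel i under
    candidate offset k. Returns (gain, k, theta) for the best candidate.
    """
    best = (-1.0, None, None)
    for k in range(responses.shape[0]):
        for theta in thresholds:
            gain = information_gain(labels, responses[k] > theta, num_classes)
            if gain > best[0]:
                best = (gain, k, theta)
    return best
```

A tree would be grown by applying `best_split` at a node, partitioning the pixels by the chosen test, and recursing on each child until a depth limit is reached.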
DECISION TREE CLASSIFICATION
Toy example: distinguish left (L) and right (R) sides of the body
• image window centred at x
• first test: $f(I, \mathbf{x}; \Delta_1) > \theta_1$ (no / yes)
• second test: $f(I, \mathbf{x}; \Delta_2) > \theta_2$ (no / yes)
• each leaf stores a distribution P(c) over L and R
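The toy example can be mirrored by a small tree traversal. This is a sketch under stated assumptions: the `Leaf` and `Split` classes, the leaf probabilities, and the two lambda features are hypothetical stand-ins for the learned $f(I, \mathbf{x}; \Delta_n)$ tests, chosen only so the example runs on a tiny array.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class Leaf:
    posterior: Dict[str, float]  # P(c) stored at the leaf


@dataclass
class Split:
    feature: Callable[[object, tuple], float]  # stands in for f(I, x; Delta_n)
    theta: float                               # threshold theta_n
    no: Union["Split", Leaf]                   # branch taken when the test fails
    yes: Union["Split", Leaf]                  # branch taken when the test passes


def classify_pixel(node, image, x):
    """Descend from the root, branching on feature > theta, and return the leaf's P(c)."""
    while isinstance(node, Split):
        node = node.yes if node.feature(image, x) > node.theta else node.no
    return node.posterior


# Two tests and three leaves, as in the toy left/right example.
# Both features and all probabilities are made up for illustration.
toy_tree = Split(
    feature=lambda image, x: image[x[0]][x[1]],
    theta=0.5,
    no=Leaf({"L": 0.9, "R": 0.1}),
    yes=Split(
        feature=lambda image, x: float(x[1]),
        theta=1.0,
        no=Leaf({"L": 0.5, "R": 0.5}),
        yes=Leaf({"L": 0.2, "R": 0.8}),
    ),
)
toy_image = [[0.0, 1.0, 1.0]]
```

Classification cost is one feature evaluation per level of the tree, independent of the number of training pixels.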
DECISION FOREST CLASSIFIER
• Trained on different random subsets of images
• “bagging” helps avoid over-fitting
• Average tree posteriors
$P(c \mid I, \mathbf{x}) = \frac{1}{T} \sum_{t=1}^{T} P_t(c \mid I, \mathbf{x})$
Figure: trees 1 … T each map $(I, \mathbf{x})$ to a distribution $P_1(c), \ldots, P_T(c)$
[Amit & Geman 97] [Breiman 01] [Geurts et al. 06]
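Averaging the tree posteriors is a one-line operation once each tree's leaf distribution is available. The sketch below assumes the per-tree posteriors for one pixel are stacked in an array of shape (T, number of classes); the function name is hypothetical.

```python
import numpy as np


def forest_posterior(tree_posteriors):
    """P(c | I, x) = (1/T) * sum over t of P_t(c | I, x).

    tree_posteriors: array-like of shape (T, num_classes), one row per tree.
    """
    tree_posteriors = np.asarray(tree_posteriors, dtype=float)
    return tree_posteriors.mean(axis=0)
```

The most likely body part for the pixel is then the `argmax` of the returned vector. Because the mean is taken over the first axis, the same function also works on stacked per-pixel maps of shape (T, H, W, num_classes).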
NUMBER OF TREES
Figure: ground truth vs. inferred body parts (most likely) for 1 tree, 3 trees, and 6 trees
Plot: average per-class … (y-axis, 40% to 55%) vs. number of trees (x-axis, 1 to 6)
TREE DEPTH
Plots: average per-class accuracy (y-axis, 30% to 65%) vs. depth of trees, shown for synthetic test data and real test data
Body parts to joint hypotheses
3. hypothesize body joints
• Define 3D world space density
• Mean shift for mode detection
Annotated terms in the density equations: pixel index i, bandwidth, 3D coord, 3D coord of i th pixel, pixel weight, inferred probability, depth at i th pixel
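Mode detection by mean shift can be sketched as below. The density equations did not survive in this transcript, so the form used here is an assumption based on the annotated terms and the published paper: a Gaussian kernel with a per-part bandwidth, and a pixel weight equal to the inferred probability times the squared depth. Function names and default parameters are hypothetical.

```python
import numpy as np


def pixel_weights(probabilities, depths):
    """Assumed weight: w_i = P(c | I, x_i) * d_I(x_i)^2."""
    return np.asarray(probabilities, dtype=float) * np.asarray(depths, dtype=float) ** 2


def mean_shift_mode(points, weights, bandwidth, start, iters=50, tol=1e-6):
    """Climb the weighted Gaussian kernel density from `start` to a nearby mode.

    points: (N, 3) world-space coordinates of the pixels for one body part.
    weights: (N,) pixel weights. bandwidth: kernel width b.
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    x = np.asarray(start, dtype=float)
    for _ in range(iters):
        sq_dist = ((points - x) ** 2).sum(axis=1) / bandwidth ** 2
        k = weights * np.exp(-sq_dist)
        # Mean-shift update: kernel-weighted average of the points
        new_x = (k[:, None] * points).sum(axis=0) / k.sum()
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x
```

Starting the iteration from several high-probability pixels and keeping the distinct converged points would yield one or more joint hypotheses per body part.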
Figure: input depth, inferred body parts, and inferred joint positions (front view, side view, top view)
No tracking or smoothing
JOINT PREDICTION ACCURACY
Bar chart: average precision (y-axis, 0.0 to 1.0) per joint: Center Head, Center Neck, Left Shoulder, Right…, Left Elbow, Right Elbow, Left Wrist, Right Wrist, Left Hand, Right Hand, Left Knee, Right Knee, Left Ankle, Right Ankle, Left Foot, Right Foot, and Mean AP
JOINT PREDICTION ACCURACY
Bar chart: average precision (y-axis, 0.0 to 1.0) per joint: Center Head, Center Neck, Left Shoulder, Right Shoulder, Left Elbow, Right Elbow, Left Wrist, Right Wrist, Left Hand, Right Hand, Left Knee, Right Knee, Left Ankle, Right Ankle, Left Foot, Right Foot, and Mean AP, comparing two series:
• Joint prediction from ground truth body parts
• Joint prediction from inferred body parts
ANALYSIS
• No temporal information
- frame-by-frame
• Very fast
- simple depth image feature
- parallel decision forest classifier
KINECT SYSTEM
4. track skeleton
Uses…
• 3D joint hypotheses
• kinematic constraints
• temporal coherence
… to give
• full skeleton
• higher accuracy
• invisible joints
• multi-player
SUMMARY
• Frame-by-frame gives robustness
• Body parts representation for efficiency
• Fast, simple machine learning
• Significant engineering to scale to a massive, varied training data set
QUESTIONS