Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ......
-
Upload
truonglien -
Category
Documents
-
view
216 -
download
0
Transcript of Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ......
![Page 1: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/1.jpg)
Andrew ZissermanDepartment of Engineering Science
University of Oxford, UKhttp://www.robots.ox.ac.uk/~vgg/
Human Pose Estimation in images and videos
![Page 2: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/2.jpg)
Objective and motivation
Determine human body pose (layout)
Why? To recognize poses, gestures, actions
![Page 3: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/3.jpg)
Activities characterized by a pose
![Page 4: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/4.jpg)
Activities characterized by a pose
![Page 5: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/5.jpg)
Activities characterized by a pose
![Page 6: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/6.jpg)
Challenges: articulations and deformations
![Page 7: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/7.jpg)
Challenges: of (almost) unconstrained images
varying illumination and low contrast; moving camera and background;multiple people; scale changes; extensive clutter; any clothing
![Page 8: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/8.jpg)
![Page 9: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/9.jpg)
Outline
• Review of pictorial structures for articulated models
• Inference given the model: Strong supervision, full generative model – “Gold-standard model”
• Image parsing: learning the model for a specific image
• Recent advances
• Datasets and challenges
![Page 10: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/10.jpg)
Pictorial Structures
• Intuitive model of an object
• Model has two components
1. parts (2D image fragments)
2. structure (configuration of parts)
• Dates back to Fischler & Elschlager 1973
![Page 11: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/11.jpg)
Localize multi-part objects at arbitrary locations in an image
• Generic object models such as person or car• Allow for articulated objects• Simultaneous use of appearance and spatial information• Provide efficient and practical algorithms
To fit model to image: minimize an energy (or cost) function that reflects both• Appearance: how well each part matches at given location• Configuration: degree to which parts match 2D spatial layout
![Page 12: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/12.jpg)
Long tradition of using pictorial structures for humans
Learning to Parse Pictures of People Ronfard, Schmid & Triggs, ECCV 2002
Pictorial Structure Models for Object RecognitionFelzenszwalb & Huttenlocher, 2000
Finding People by Sampling Ioffe & Forsyth, ICCV 1999
![Page 13: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/13.jpg)
Felzenszwalb & Huttenlocher
NB: requires background subtraction
![Page 14: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/14.jpg)
Variety of Poses
![Page 15: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/15.jpg)
Variety of Poses
![Page 16: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/16.jpg)
Objective: detect human and determine upper body pose (layout)
• Vertices V are parts, ai, i = 1, · · · , n
• Edges E are pairwise linkages between parts
• For each part there are h possible poses pj = (xj, yj,φj, sj)
• Label each part by its pose: f : V −→ {1, · · · , h}, i.e. part a takes pose pf(a).
a1
a2
si f
Model as a graph labelling problem
![Page 17: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/17.jpg)
Pictorial structure model – CRF
• Each labelling has an energy (cost):
E(f) =Xa∈V
θa;f(a) +X
(a,b)∈Eθab;f(a)f(b)
• Fit model (inference) as labelling with lowest energy
a1
a2
si f
unary terms (appearance)
pairwise terms (configuration)
Features for unary: • colour• HOG for limbs/torso
![Page 18: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/18.jpg)
Unary term: appearance feature I - colour
input image skin torso background
colour posteriors
![Page 19: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/19.jpg)
Histogram of oriented gradients (HOG)
HOG of imageHOG of lowerarm template
(learned)L2 Distance
Unary term: appearance feature II - HOGDalal & Triggs, CVPR 2005
![Page 20: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/20.jpg)
θab;ij = wabd(|i-j|)
i - j
d
Potts
i - j
d
Truncated Quadratic
Pairwise terms: kinematic layout
![Page 21: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/21.jpg)
Pictorial structure model – CRF
• Each labelling has an energy (cost):
E(f) =Xa∈V
θa;f(a) +X
(a,b)∈Eθab;f(a)f(b)
• Fit model (inference) as labelling with lowest energy
a1
a2
si f
unary terms (appearance)
pairwise terms (configuration)
Features for unary: • colour• HOG for limbs/torso
![Page 22: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/22.jpg)
Complexity
a1
a2
si f
• n parts
• For each part there are h possible poses pj = (xj, yj,φj, sj)
• There are hn possible labellings
Problem: any reasonable discretization (e.g. 12 scales and 36 angles for upper and lower arm, etc) gives a number of configurations 10^12 – 10^14
Brute force search not feasible
![Page 23: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/23.jpg)
Are trees the answer?
He T
UAUA
LA
Ha
left
Ha
LA
right
• With n parts and h possible discrete locations per part, O(hn)
• For a tree, using dynamic programming this reduces to O(nh2)
• If model is a tree and has certain edge costs, then complexity reduces to O(nh) using a distance transform [Felzenszwalb & Huttenlocher, 2000, 2005]
![Page 24: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/24.jpg)
Problems with tree structured pictorial structures
• Layout model defines the foreground, i.e. it chooses the pixels to “explain”
• ignores skin and strong edge in background
• “double counting”
Generative model of foreground only
![Page 25: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/25.jpg)
Kinematic structure vs graphical (independence) structure
Graph G = (V,E)
He T
UAUA
LA
Ha
left
Ha
LA
right
He T
UAUA
LA
Ha
left
Ha
LA
right
Requires more connections than a tree
![Page 26: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/26.jpg)
And for the background problem
1. Add background model so that every pixel in region explained
2. f lays out parts in back-to-front depth order (painter’s algorithm)
Colour is pixel-wise labellingby parts (back-to-front)
Efull = E(f) +X
pixels xi not in f
E(xi|bgcol)
Generative model of entire region
![Page 27: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/27.jpg)
Outline
• Review of pictorial structures for articulated models
• Inference given the model: Strong supervision, full generative model – “Gold-standard model”
• Image parsing: learning the model for a specific image
• Recent advances
• Datasets and challenges
![Page 28: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/28.jpg)
Patrick Buehler, Mark Everingham,
Daniel Huttenlocher, Andrew Zisserman
British Machine Vision Conference 2008
Long Term Arm and Hand Tracking for Continuous Sign Language TV Broadcasts
![Page 29: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/29.jpg)
Objective
• Detect hands and arms of person signing British Sign Language
• Hour long sequences
• Strong but minimal supervision
![Page 30: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/30.jpg)
Learning the model
Strong supervision: manual input
40 annotated frames per video, used for pose estimation in > 50,000 frames
5 frames 40 frames 15 frames 15 frames
Learn colourmodel
Learn HOGtemplates
Provide head and body examples
![Page 31: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/31.jpg)
Inference (model fitting)• Fit head and torso [Navaratnam et al. 2005]• Then: arms and hands
Find arm/hand pose with minimum cost
Input Output
Head and torso fitting
Intermediate step
Problem: Brute force search is still not feasible
![Page 32: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/32.jpg)
Model fitting by sampling• Sample configurations from inexpensive model• Evaluate configuration using full model
InputOutput
For sampling use tree structured pictorial Structures: • [Felzenszwalb & Huttenlocher 2000, 2005] • Complexity linear in the number of parts O(nh)• Pr(f | data): Sample from max-marginal with heuristics 1000 times• cf Felzenszwalb & Huttenlocher 2005 sampled from marginal
samples
best arm candidate
![Page 33: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/33.jpg)
Model fitting by sampling
• Sample configurations from inexpensive tree structured model• Evaluate configuration using full model
![Page 34: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/34.jpg)
Example results
![Page 35: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/35.jpg)
Pose estimation results
![Page 36: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/36.jpg)
Patrick Buehler
Mark Everingham
Andrew Zisserman
Learning sign language by watching TV (using weakly aligned subtitles)
Application
CVPR 2009
![Page 37: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/37.jpg)
Use subtitles to find video sequences containing word. These are the positivetraining sequences. Use other sequences as negative training sequences.
Objective
Learn signs in British Sign Language (BSL) corresponding to text words:• Training data from TV broadcasts with simultaneous signing • Supervision solely from sub-titles
Input: video + subtitle
Output: automaticallylearned signs (4x slow motion)
Office
Government
![Page 38: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/38.jpg)
Overview
Given an English word e.g. “tree” what is the corresponding British Sign Language sign?
positivesequences
negativeset
![Page 39: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/39.jpg)
positivesequences
negativeset
1st sliding window
Use sliding window to choose sub-sequence of poses in one positive sequence and determine ifsame sub-sequence of poses occurs in other positive sequences somewhere, but does not occur in the negative set
![Page 40: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/40.jpg)
positivesequences
negativeset
5th sliding window
Use sliding window to choose sub-sequence of poses in one positive sequence and determine ifsame sub-sequence of poses occurs in other positive sequences somewhere, but does not occur in the negative set
![Page 41: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/41.jpg)
Positivebags
Negativebag
sign ofinterest
Multiple instance learning
![Page 42: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/42.jpg)
Good results for a variety of signs:
Evaluation
Signs where hand movement
is important
Signs where hand shape is important
Signs where both handsare together
Signs whichare finger--
spelled
Signs whichare perfomed in front of the face
Navy
Prince
Lung
Garden
Fungi
Golf
Kew
Bob
Whale
Rose
![Page 43: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/43.jpg)
Summary
Given a good appearance model and proper account of foreground and background, then problems such as occlusion and ordering can be resolved. The cost of inference still remains though.
Next:
- How to obtain models automatically in videos and images- If the appearance features are discriminative, how far can one go with foreground only pictorial structures and tree based inference?
![Page 44: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/44.jpg)
Outline
• Review of pictorial structures for articulated models
• Inference given the model: Strong supervision, full generative model – “Gold-standard model”
• Image parsing: learning the model for a specific image
• Recent advances
• Datasets and challenges
![Page 45: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/45.jpg)
Strike a Pose: Tracking People by Finding Stylized PosesDeva Ramanan, David Forsyth and Andrew Zisserman, CVPR 2005
Learning appearance models in videos
![Page 46: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/46.jpg)
walkingpose
pictorialstructure
edges
efficientmatching
![Page 47: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/47.jpg)
Build Model
torsobg
find discriminative
featureslearnlimb
classifiers
unusual pose
small scale
(limb pixels aloneare poor model)
![Page 48: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/48.jpg)
Build Model & Detect
torso
arm
leg
head
labelpixels
learnlimb
classifiersgeneral
posepictorialstructure
unusual pose
small scale
![Page 49: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/49.jpg)
Running Example
![Page 50: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/50.jpg)
How well do classifiers generalize?
![Page 51: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/51.jpg)
Image Parsing – Ramanan NIPS 06
57
Learn image and person specific unary terms• initial iteration edges• following iterations edges & colour
E(f) =Pa∈V θa;f(a) +
P(a,b)∈E θab;f(a)f(b)
unary terms (edges/colour)
pairwise terms (configuration)
![Page 52: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/52.jpg)
![Page 53: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/53.jpg)
(Almost) unconstrained images
Extremely difficult when knowing nothing about appearance/pose/location
![Page 54: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/54.jpg)
Failure of direct pose estimation
Ramanan NIPS 2006 unaided
Not powerful enough for a cluttered image where size is not given
![Page 55: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/55.jpg)
Progressive search space reductionfor human pose estimation
Vitto Ferrari, Manuel Marin-Jimenez, Andrew Zisserman
CVPR 2008/2009
![Page 56: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/56.jpg)
Find (x,y,s) coordinate frame for a person
detection window (upper-body, face etc.)
Ferrari et al. 08, Andriluka et al. 09, Gammeter et al. 08
DET
ECTO
R
62
Restrict search space using detector
![Page 57: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/57.jpg)
Supervision• None
Weaker model• Tree structured graphical model • Overlap not modelled• Single scale parameter• No background model
Inference• Detect person – use upper body detector• Use upper body region to restrict search• Use colour segmentation to restrict search further• Parsing pictorial structure by Ramanan NIPS 06
Learn an image and person specific model
![Page 58: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/58.jpg)
Search space reduction by upper body human detection
+ fixes scale of body parts
+ sets bounds on x,y locations
- little info about pose (arms)
Ideaget approximate location and scale with adetector generic over pose and appearance
Building an upper-body detector- based on Dalal and Triggs CVPR 2005
- train = 96 frames X 12 perturbations
+ fast
Benefits for pose estimation
detected enlarged
+ detects also back views
Train
Test
(1) detect human; (2) reduce search from hn
![Page 59: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/59.jpg)
Upper body detector – using HOGs
average training data
![Page 60: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/60.jpg)
Search space reduction by foreground highlighting
+ further reduce clutter
+ conservative (no loss 95.5% times)
Ideaexploit knowledge about structure of search area to initialize Grabcut
Initialization- learn fg/bg models from regions whereperson likely present/absent
- clamp central strip to fg
- don’t clamp bg (arms can be anywhere)
Benefits for pose estimation
+ needs no knowledge of background
+ allows for moving backgroundinitialization output
![Page 61: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/61.jpg)
+ further reduce clutter
+ conservative (no loss 95.5% times)
Ideaexploit knowledge about structure of search area to initialize Grabcut
Initialization- learn fg/bg models from regions whereperson likely present/absent
- clamp central strip to fg
- don’t clamp bg (arms can be anywhere)
Benefits for pose estimation
+ needs no knowledge of background
+ allows for moving background
Search space reduction by foreground highlighting
![Page 62: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/62.jpg)
Pose estimation by image parsing - Ramanan NIPS 06
edgeparse
edge + colparse + much more robust
+ much faster (10x-100x)
Goalestimate posterior of part configuration
Algorithm1. inference with edges unary
2. learn appearance models ofbody parts and background
3. inference with edges + colour unary
Advantages of space reduction
E(f) =Pa∈V θa;f(a) +
P(a,b)∈E θab;f(a)f(b)
unary terms (edges/colour)
pairwise terms (configuration)
appearance
![Page 63: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/63.jpg)
Failure of direct pose estimationRamanan NIPS 2006 unaided
![Page 64: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/64.jpg)
Results on Buffy frames
![Page 65: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/65.jpg)
Results on PASCAL flickr images
![Page 66: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/66.jpg)
What is missed?
![Page 67: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/67.jpg)
What is missed?
truncation is not modelled
![Page 68: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/68.jpg)
What is missed?
occlusion is not modelled
![Page 69: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/69.jpg)
Given user-selectedquery frame+person …
… retrieve shots with personsin the same pose from video database
video database
query
Application: Pose Search
CVPR 2009
![Page 70: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/70.jpg)
Pose descriptors
- soft-segmentations of body parts
- distributions over orient+locationfor parts and pairs of parts
Similarity measures- dot-product (= soft intersection)
- Batthacharrya / Chi-square
Pose Search
![Page 71: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/71.jpg)
Processing
Off-line:• Detect upper bodies in every frame • Link (track) upper body detections• Estimate upper body pose for each frame of track• Compute descriptor (vector) for each upper body pose
Run-time:• Rank each track by its similarity to the query pose
![Page 72: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/72.jpg)
Q
Pose Search
“hips pose”
![Page 73: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/73.jpg)
Pose Search
Q
“rest pose”
![Page 74: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/74.jpg)
Pose Search
Q
“rest pose”
![Page 75: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/75.jpg)
Other poses – query interesting pose
Hollywood movies – Query on Gandhi, Search Hugh Grant opus
Q
![Page 76: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/76.jpg)
![Page 77: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/77.jpg)
![Page 78: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/78.jpg)
![Page 79: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/79.jpg)
Other poses – query interesting pose
Hollywood movies – Query on Gandhi, Search Hugh Grant opus
Q
![Page 80: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/80.jpg)
![Page 81: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/81.jpg)
![Page 82: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/82.jpg)
![Page 83: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/83.jpg)
![Page 84: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/84.jpg)
Outline
• Review of pictorial structures for articulated models
• Inference given the model: Strong supervision, full generative model – “Gold-standard model”
• Image parsing: learning the model for a specific image
• Recent advances
• Datasets and challenges
![Page 85: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/85.jpg)
Better appearance models for pictorial structures
Marcin Eichner, Vittorio FerrariBMVC 2009
96
![Page 86: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/86.jpg)
relative location (wrt detection window): • stable, e.g. head, torso• unstable, e.g. upper/lower arms
97
![Page 87: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/87.jpg)
Appearance of different body parts is related
short sleeves no sleeveslong sleeves
Use stable parts to improve the prediction of the unstable ones
98
![Page 88: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/88.jpg)
learnt location priors (PASCAL & Buffy 3,4)
torso upper arms
lower arms head
LP encodes:• variability of poses• detection window
inaccuracy
99
![Page 89: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/89.jpg)
input
detection window
LP
estimate initial AM
output
Pictorial Structures inference
100
coordinate frame
TM
apply appearance transfer
compute unary term Φ:
![Page 90: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/90.jpg)
Efficient Discriminative Learning of Parts-based Models
Pawan Kumar, Phil Torr, Andrew Zisserman
ICCV 2009
![Page 91: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/91.jpg)
Learning a discriminative model
Supervision• bounding rectangles for limbs for positive examples
Weak model• Tree structured graphical model• Parts labelled as occluded or not• Scale of parts known
Discriminative learning• Similar to Max-margin Markov network• But much more efficient inference
![Page 92: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/92.jpg)
Problem formulationEnergy of a labelling:
E(f) =Xa∈V
θa;f(a) +X
(a,b)∈Eθab;f(a)f(b) + b= w>θf + b
Assume the following form
• Unary term: θa;f(a) = w>a θa;f(a),
• Pairwise term: θab;f(a)f(b) = w>abθab;f(a)f(b),
where
• θa;f(a) is the feature vector for part a (HOG + colour)
• θab;f(a)f(b) are the pairwise features (spatial configuration)
• We want to learn the parameters w = (wa;wab) and b
![Page 93: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/93.jpg)
Training data
Positive examples
Negative examples
• all other configurations
![Page 94: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/94.jpg)
Wide margin formulation for learning
(w∗, b∗) = argminw,b12||w||2 + C(
Pk ξk+
Pl ξl),
s.t. w>θk++ b ≥ 1− ξk,∀k (positive examples),
w>θl−+ b ≤ −1+ ξl,∀l (negative examples),
ξk ≥ 0,∀k, ξl ≥ 0,∀l.Convex formulation. Similar to:
• Tsochantaridis, Hofmann, Joachims, & Altun. Support vector learning for
interdependent and structured output spaces.ICML, 2004.
• (supervised version of) Felzenszwalb, McAllester, & Ramanan. A discriminatively trained, multiscale, deformable part model. CVPR, 2008
![Page 95: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/95.jpg)
Results
![Page 96: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/96.jpg)
H3D: Humans in 3D
Lubomir Bourdev & Jitendra MalikICCV 2009
![Page 97: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/97.jpg)
![Page 98: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/98.jpg)
![Page 99: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/99.jpg)
![Page 100: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/100.jpg)
![Page 101: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/101.jpg)
![Page 102: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/102.jpg)
![Page 103: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/103.jpg)
AP =0.394
![Page 104: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/104.jpg)
![Page 105: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/105.jpg)
• Human Pose Estimation Using Consistent Max-Covering, HaoJiang, ICCV 09
• Max-margin hidden conditional random fields for human action recognition, Yang Wang and Greg Mori, CVPR 09
• Adaptive pose priors for pictorial structures, B. Sapp, C. Jordan, and B. Taskar, CVPR 10
Further ideas:
![Page 106: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/106.jpg)
Outline
• Review of pictorial structures for articulated models
• Inference given the model: Strong supervision, full generative model – “Gold-standard model”
• Image parsing: learning the model for a specific image
• Recent advances
• Datasets and challenges
![Page 107: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/107.jpg)
Datasets & EvaluationSome efforts evaluating person image parsing
Oxford Buffy Stickmen276 frames x 6 = 1656 body parts (sticks)
PASCAL VOC “Person Layout”
Berkeley H3DETHZ Pascal stickmen set
549 images x6 = 3294 body parts (sticks)
![Page 108: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/108.jpg)
The PASCAL Visual Object Classes Challenge 2010 (VOC2010)
Mark Everingham, Luc Van GoolChris Williams, John Winn
Andrew Zisserman
![Page 109: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/109.jpg)
Person Layout TasterGiven the bounding box of a person, predict the visibility and positions of head, hands and feet.
• About 600 training examples• But can also use any training data (not overlapping with
test set)
![Page 110: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/110.jpg)
Human Action Classes TasterGiven the bounding box of a person, determine which, if any, of 9 action classes apply
• suggested by Ivan Laptev• choice of classes governed by availability from flickr• evaluation is by AP on each class• 50-90 training images for each class
phoningplaying
instrument
reading
working on computer
![Page 111: Human Pose Estimation in images and · PDF fileHuman Pose Estimation in images and videos. ... Use subtitles to find video sequences containing word. ... • Use colour segmentation](https://reader031.fdocuments.us/reader031/viewer/2022022502/5aaeb0647f8b9a07498c71a8/html5/thumbnails/111.jpg)
Human Action Classes Taster continued
riding horse riding bike
taking photorunning
walking