2011-6-3 Monocular 3D Pose Estimation and Tracking by Detection


Transcript of "Monocular 3D Pose Estimation and Tracking by Detection"

  • Slide 1/15

    Monocular 3D Pose Estimation and Tracking by Detection

    Mykhaylo Andriluka, Stefan Roth, Bernt Schiele

  • Slide 2/15

    3D Pose Estimation

    Estimate the positions and angles of individual body parts in 3D space

    Monocular refers to a single-camera system

    Very reliable in controlled settings, as used in motion tracking

    Currently poor performance in realistic scenes

    Frequently relies on edge detection / background subtraction

    Potential problems: loose clothing, occlusions, ego motion, background clutter

  • Slide 3/15

    Why is it interesting for us?

    Accurate body pose estimation makes action recognition practically trivial

  • Slide 4/15

    This paper

    Performs 3D pose estimation of multiple people simultaneously with a single camera in a realistic street scene

  • Slide 5/15

    Pictorial Structures Model

    2D part-based model: each part i at frame m is represented by l_i^m = (x_i^m, y_i^m, θ_i^m, s_i^m), i.e. image position, orientation, and scale

    L^m - overall part configuration at frame m

    D^m - visual evidence (image observations) at frame m

  • Slide 6/15

    Pictorial Structures Model

    Body represented as left/right lower and upper legs, torso, head, and left/right upper and lower arms

    Each body part is detected individually by a part detector

    The posterior probability of the configuration L^m is maximized to detect the body
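
    As a hedged sketch of this step (the standard pictorial structures factorization; the exact form is not spelled out in the transcript), the posterior combines per-part appearance terms with pairwise kinematic terms over adjacent parts (i, j) in the body tree E:

      p(L^m | D^m) \propto p(D^m | L^m) p(L^m)
                   \approx \prod_i p(d_i^m | l_i^m) \prod_{(i,j) \in E} p(l_i^m | l_j^m)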

  • Slide 7/15

    Viewpoint Estimation

    This method only detects people, and only from a single viewpoint

    This paper trains 10 of these detectors on a multi-view dataset; each detector assumes a different viewpoint

    This gives us viewpoint estimation: find the detector with the strongest response to the scene
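
    A minimal sketch of this selection step, assuming each viewpoint-specific detector returns a scalar score for the same person hypothesis (the interface and the 10-detector layout below are assumptions for illustration, not taken from the paper):

      import numpy as np

      def estimate_viewpoint(detector_scores):
          """Pick the viewpoint whose detector responds most strongly.

          detector_scores: dict mapping a viewpoint label (e.g. an angle
          in degrees) to that viewpoint-specific detector's score.
          """
          viewpoints = list(detector_scores.keys())
          scores = np.array([detector_scores[v] for v in viewpoints])
          # Softmax-normalised scores can double as viewpoint "probabilities"
          # (emissions) for the tracking HMM on the following slides.
          probs = np.exp(scores - scores.max())
          probs /= probs.sum()
          best = viewpoints[int(np.argmax(scores))]
          return best, dict(zip(viewpoints, probs))

      # Example: scores from 10 detectors spaced 36 degrees apart (assumed layout).
      scores = {angle: np.random.randn() for angle in range(0, 360, 36)}
      best_view, view_probs = estimate_viewpoint(scores)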

  • Slide 8/15

    Tracklet Extraction

    Want to extract tracks of each person: relating temporal states gives us more information for body pose estimation and even more robustness against occlusion

    Use the pictorial structures model as a detector to get bounding boxes and a likely viewpoint at each frame, for each person

    Treat bounding boxes and viewpoint probabilities as emissions, and hypotheses as states, in a Hidden Markov Model

    Use Viterbi decoding to extract the most likely sequence of states/viewpoints
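
    A compact sketch of the Viterbi step for one tracklet (generic HMM decoding, not the paper's exact implementation; the log-probability inputs are assumed to come from the detections above and the transition model on the next slide):

      import numpy as np

      def viterbi(log_emit, log_trans, log_prior):
          """Most likely state sequence of a discrete HMM.

          log_emit:  (T, S) log-likelihood of each hypothesis (state) per frame
          log_trans: (S, S) log transition probabilities between hypotheses
          log_prior: (S,)   log prior over states in the first frame
          """
          T, S = log_emit.shape
          delta = log_prior + log_emit[0]          # best score ending in each state
          back = np.zeros((T, S), dtype=int)       # back-pointers for decoding
          for t in range(1, T):
              cand = delta[:, None] + log_trans    # cand[i, j]: come from i, go to j
              back[t] = np.argmax(cand, axis=0)
              delta = cand[back[t], np.arange(S)] + log_emit[t]
          path = [int(np.argmax(delta))]
          for t in range(T - 1, 0, -1):            # follow back-pointers
              path.append(int(back[t][path[-1]]))
          return path[::-1]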

  • Slide 9/15

    Tracklet Extraction

    Transition probabilities between states:

    For viewpoints: high transition probabilities between similar viewpoints, reflecting that people turn slowly

    For bounding boxes: the transition probability is based on how similar the RGB colour histograms within the two bounding boxes are (similar appearance suggests the same person)
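
    A hedged sketch of such an appearance term, using the Bhattacharyya coefficient between normalised RGB histograms (the exact distance and scaling used in the paper are not given on the slide; this is one standard choice):

      import numpy as np

      def rgb_histogram(patch, bins=8):
          """Normalised joint RGB histogram of an image patch of shape (H, W, 3), values 0-255."""
          hist, _ = np.histogramdd(patch.reshape(-1, 3),
                                   bins=(bins, bins, bins),
                                   range=((0, 256),) * 3)
          return hist / hist.sum()

      def appearance_transition(patch_a, patch_b):
          """Higher when the two boxes look alike, i.e. likely the same person."""
          ha, hb = rgb_histogram(patch_a), rgb_histogram(patch_b)
          return float(np.sum(np.sqrt(ha * hb)))   # 1.0 for identical histograms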

  • Slide 10/15

    3D Pose Estimation

    Use 2D->3D exemplars to pick the most likely 3D pose in the tracklet for each frame. This gives us M body pose hypotheses, one per frame, where M is the length of the tracklet

    3D body pose at frame m: Q^m = (q^m, φ^m, h^m)

    q - joint angle configuration

    φ - rotation of the body in the 3D world

    h - position and scale of the body
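
    A rough sketch of the exemplar lookup described above, assuming a database of paired 2D projections and 3D poses and a simple nearest-neighbour match on joint positions (the paper's actual matching criterion may differ):

      import numpy as np

      def lift_to_3d(pose_2d, exemplars_2d, exemplars_3d):
          """Return the 3D exemplar whose stored 2D projection best matches the estimate.

          pose_2d:      (P, 2)    estimated 2D joint positions in one frame
          exemplars_2d: (N, P, 2) 2D projections stored with the exemplar database
          exemplars_3d: list of N corresponding 3D poses (joint angles etc.)
          """
          diffs = exemplars_2d - pose_2d[None]             # (N, P, 2)
          dists = np.linalg.norm(diffs, axis=-1).sum(-1)   # total joint distance per exemplar
          return exemplars_3d[int(np.argmin(dists))]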

  • Slide 11/15

    Representation of Pose

  • Slide 12/15

    3D Pose Estimation

    Single Frame likelihood:

    Breakdown:

  • Slide 13/15

    Position of Body Parts

    To reduce the computational complexity of 3D body pose estimation, we find the J most likely locations for each body part n in frame m, and then fit a Gaussian distribution to that body part's locations.

    This allows the posterior probability to be modelled as:
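
    A small sketch of the reduction step above: keep the J highest-scoring locations of one part and summarise them with a single Gaussian (the score-weighted mean/covariance here is an assumption; the slide only states that a Gaussian is fitted):

      import numpy as np

      def part_gaussian(locations, scores, J=20):
          """Fit a Gaussian to the top-J detections of one body part.

          locations: (N, 2) candidate (x, y) positions from the part detector
          scores:    (N,)   non-negative detector scores for those positions
          Returns the (mean, covariance) of the score-weighted top-J locations.
          """
          top = np.argsort(scores)[-J:]
          pts, w = locations[top], scores[top]
          w = w / w.sum()
          mean = (w[:, None] * pts).sum(axis=0)
          diff = pts - mean
          cov = (w[:, None, None] * diff[:, :, None] * diff[:, None, :]).sum(axis=0)
          return mean, cov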

  • Slide 14/15

    hGPLVM

    Given the above information, with a prior of p(Q^{1:M}) = p(q^{1:M}) p(h^{1:M}), we can estimate the posterior probability over the frames using a hierarchical Gaussian Process Latent Variable Model (hGPLVM)

    This models the sequence of poses as a Gaussian process and solves for the pose sequence using MAP estimation
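
    As a hedged summary in the notation of the earlier slides (the slide itself shows no formula), the final pose sequence for a tracklet of length M is the MAP estimate under this factorized prior:

      p(Q^{1:M} | D^{1:M}) \propto p(D^{1:M} | Q^{1:M}) p(q^{1:M}) p(h^{1:M})

      \hat{Q}^{1:M} = \arg\max_{Q^{1:M}} p(Q^{1:M} | D^{1:M})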

  • Slide 15/15

    Pose Estimation Examples