Nonparametric Object and Parts Modeling with Lie Group Dynamics
David S. Hayden, Jason Pacheco, John W. Fisher III
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
77 Massachusetts Ave., Cambridge, MA 02139
{dshayden, pachecoj, fisher}@csail.mit.edu
Abstract
Articulated motion analysis often utilizes strong prior knowledge such as a known or trained parts model for humans. Yet, the world contains a variety of articulating objects (mammals, insects, mechanized structures) where the number and configuration of parts for a particular object is unknown in advance. Here, we relax such strong assumptions via an unsupervised, Bayesian nonparametric parts model that infers an unknown number of parts with motions coupled by a body dynamic and parameterized by SE(D), the Lie group of rigid transformations. We derive an inference procedure that utilizes short observation sequences (image, depth, point cloud or mesh) of an object in motion without need for markers or learned body models. Efficient Gibbs decompositions for inference over distributions on SE(D) demonstrate robust part decompositions of moving objects under both 3D and 2D observation models. The inferred representation permits novel analysis, such as object segmentation by relative part motion, and transfers to new observations of the same object type.
1. Introduction
The world is full of moving objects comprised of articulating parts. Despite the wide range and complexity of such objects, humans have a remarkable ability to accurately discern both the number of articulating parts and their relation to the whole with few observations. We are interested in developing reasoning methods and algorithms that mimic this ability. While one might consider supervised methods that rely on large amounts of labeled training data about every conceivable object, articulated motion, and view, the task of data collection seems daunting and unnecessary.
Consequently, we develop a generative model that infers an object decomposition solely from brief observations of the object in motion. Specifically, we propose a parts-based representation that leverages Bayesian nonparametric dynamical models while eschewing strong assumptions about the number or configuration of parts.

Figure 1. The number, rotation, translation, and shape of an object's parts are learned from a small number of observations of that object in motion. Motion of the body and parts is parameterized by the Lie group of rigid transformations in 3D or 2D. Supported data sources include sequences of meshes / point clouds (A, human), depth data (B, marmoset), and 2D images (C, hand, spider).

The model simultaneously infers a dynamic body frame, the number of parts, and their motion relative to the body frame. Diverse inputs are supported (2D image sequences, 2.5D depth sequences, and 3D point cloud or mesh sequences) without need for restrictive assumptions about target appearance or the existence of specially-placed markers or sensors.
Objects and parts are assumed to rotate and translate smoothly in space, leading to a natural parameterization in SE(D), the Lie group of rigid transformations. By representing statistics of motion in the Lie algebra se(D), we derive closed-form Gibbs updates on translation dynamics, and an efficient sampler for rotation dynamics.
Contributions. We specify a novel Bayesian nonparametric model that is well-suited to the properties of articulated objects in motion (part persistence, rigid transformation dynamics, unknown number of parts). We demonstrate a novel decomposition of inferring translations and rotations in posterior distributions on SE(D) with concentrated Gaussian [35] priors. We validate our methods on 2D and 3D sequences containing different object types. We show that the parts in one data sequence transfer to other data sequences of the same object type (but different instance). Finally, we present novel analysis of the motion of different regions of an object, such as segmenting an object based on the motion of its parts relative to the body frame.

Figure 2. Simplified graphical model for an unknown number of time-varying parts $\{\theta_{tk}\}_{t=1,k=1}^{T,\infty}$ coupled by shared dynamics $\{x_t\}_{t=1}^{T}$. Observations $y_{tn}$ are generated by part $k$ if $z_{tn} = k$. Stick weights $\{\pi_k\}_{k=1}^{\infty}$ influence the observation counts for each part. Priors and $\{Q, \omega_k, S_k, W_k, E_k\}$ omitted for clarity.
2. Related Work
This work draws on body/parts models, Bayesian nonparametric dynamical models, and Lie groups. Each contains a rich literature, so we highlight only the most relevant details. Importantly, we are aware of no work that models body and part motion over time with Lie group dynamics while also being unsupervised and nonparametric in the parts.
Body and Part Models The many treatments of part-based modeling begin with the pioneering work on human models of pictorial structures [10] and cardboard people [18]. Later work on deformable parts models [9] removes the need to define object-specific part configurations. Building on the success of offline analysis, real-time human pose tracking is now possible as well [27, 13]. All of these methods require specifying the number of parts. More detailed shape and pose models have been developed for a variety of objects, using a combination of known body models, mesh representations, and sophisticated collection schemes including multiple cameras, IMUs, lasers and/or specially-painted targets [21, 41, 3, 24].
Unsupervised methods [36, 20, 26, 41, 40] have significant restrictions such as working for only 2D or only 3D data, or requiring annotated landmarks or point correspondences. In contrast, our unsupervised method works for 2D and 3D inputs, requires only a single sensor observing an object in motion, requires no distinctive or annotated object markings, and needs no observation correspondences.
Lie Groups Our work relies on the Lie group SE(D), the space of rigid transformations, for representing body and part motion. Lie groups have been used extensively in robotics and computer vision tasks such as SLAM [6], navigation [19], and parts-based models [5, 14, 17]. Defining observation models for Lie groups is challenging since the group is not a vector space. As such, notions of distance (and therefore distributions) require special care [23, 37]; e.g., simple additive noise models violate the group topology. Our approach defines a distribution (Gaussian) in the tangent plane about an element of the group [35]. Most works that model dynamics with SE(D) perform inference with approximate filters or smoothers, commonly the EKF [4] or UKF [6]. One exception that does full posterior inference is [28], though that work is not a dynamical model. See [8] for an accessible introduction to Lie groups, and [16] for a more thorough introduction.
Nonparametric Models Sequential models extending the well-known Dirichlet process (DP) [1] include the HDP-HMM [31], sticky HDP-HMM [12], infinite HMM [2], and infinite factorial HMM (ifHMM) [15]. Each of these permits an infinite number of states, but is restricted to discrete labels. Extensions to continuously-varying latent states include the HDP-SLDS [11], dynamic HDP [25], mixture of DPs [7], and the evolutionary HDP [38]. While each has the desirable property of shared global dynamics, none captures component persistence, instead allowing new atoms at each time instance. This is undesirable for parts modeling, as objects do not tend to acquire and lose parts over time, and nonparametric priors already risk creating duplicate parts [12].
Closely related is the infinite factorial dynamical model [32], a continuous extension of the ifHMM which only permits shared global binary on/off states, and the Transformed Dirichlet Process [30], a DP allowing multiple groups of observations to share the same set of atoms (but with no dynamics). Most relevant, and what we use for comparison, is the Bayesian nonparametric model of Zhou et al. [39], a linear dynamical model where parts are independently sampled from a Dirichlet process at each time (but with no part persistence or Lie group representation).
3. Model
Let $t = 1, \ldots, T$ index time, $k = 1, \ldots, \infty$ index parts, and $n = 1, \ldots, N_t$ index observations at time $t$. Most generally, our nonparametric parts model (Figure 2) takes as its sole input observations $\{y_t\}_{t=1}^{T}$, where the $t$th batch $y_t = \{y_{tn}\}_{n=1}^{N_t}$ contains $N_t$ observations with unknown correspondence. There is a global (body) dynamic with time-varying parameters $x_t$ and time-fixed parameter $Q$. There are an unknown number of components (parts) with time-varying parameters $\theta_{tk}$ and time-fixed parameters $\{\omega_k, S_k, E_k, W_k\}$. Stochastic dynamics models $f, g$ and stochastic observation model $h$ are, for each $t, k, n$,

$$x_t \sim f(x_{t-1}, Q) \qquad \theta_{tk} \sim g(\theta_{(t-1)k}, \omega_k, S_k)$$
$$y_{tn} \sim h(x_t, \theta_{t z_{tn}}, \omega_{z_{tn}}, E_{z_{tn}})$$

where $z_{tn} = k$ indicates that observation $y_{tn}$ was generated by component $k$. A prior probability of association is given by the discrete distribution of stick weights $\pi$ (for $\alpha > 0$):

$$z_{tn} \sim \pi \qquad \pi \sim \mathrm{GEM}(\alpha) \tag{1}$$
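As an illustration (the helper name `gem_sticks` and the truncation level are ours, not the paper's), the stick-breaking construction behind $\pi \sim \mathrm{GEM}(\alpha)$ can be sketched as:

```python
import numpy as np

def gem_sticks(alpha, trunc=100, rng=None):
    """Draw the first `trunc` stick weights of pi ~ GEM(alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    # Each weight is a Beta(1, alpha) fraction of the stick that remains.
    betas = rng.beta(1.0, alpha, size=trunc)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    return betas * remaining

pi = gem_sticks(alpha=2.0, trunc=500, rng=np.random.default_rng(0))
# Sample associations z_tn ~ pi (renormalized over the truncation).
z = np.random.default_rng(1).choice(pi.size, p=pi / pi.sum(), size=10)
```

With a large truncation the weights sum to one up to negligible leftover mass, which is why the renormalization above is safe.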
To specialize for object and parts modeling we must further specify the domain of random variables $\{y_{tn}, x_t, \theta_{tk}, \omega_k, S_k, E_k, W_k, Q\}$, the form of priors $\{H_x, H_\theta, H_\omega, H_S, H_E, H_W, H_Q\}$, and the forms of $\{f, g, h\}$. First, we introduce distributions on Lie groups.
3.1. Lie Groups
A matrix Lie group $G$ is a continuous group whose elements can be described by matrices with special structure. In this work, $G = SE(D)$, the space of rigid transformations on $\mathbb{R}^D$, or $G = SO(D)$, the space of $D$-dimensional proper rotations. Associated to $G$ is the Lie algebra $\mathfrak{g}$, which can be viewed as a local vector space approximation about the identity element of $G$. This approximation can be made with respect to any element in $G$ because group elements compose via matrix multiplication and each element has an inverse. For $b, \mu \in G$ we call the local vector space approximation about $\mu$ the tangent space of $\mu$, denoted $T_\mu G$. Mappings to and from the tangent space of $\mu$ are accomplished via the left-invariant Riemannian logarithm and left-invariant Riemannian exponential,

$$\mathrm{Log} : G \times G \to \mathfrak{g}, \qquad \mathrm{Log}_\mu b = \log_G(\mu^{-1} b) \tag{2}$$
$$\mathrm{Exp} : G \times \mathfrak{g} \to G, \qquad \mathrm{Exp}_\mu v = \mu \exp_G(v) \tag{3}$$

where $v \in \mathfrak{g}$ is a tangent vector in the tangent space of $\mu$, and $\log_G, \exp_G$ are the Lie group logarithm and exponential maps, which can be computed using the matrix logarithm and matrix exponential. Note that $v$ is called a tangent vector even though it is represented as a matrix: this is because a bijective mapping exists between the matrix and vector representations of $v$. We omit additional notation for brevity.
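Since $\log_G$ and $\exp_G$ reduce to the matrix logarithm and exponential, Eqns. 2 and 3 can be sketched numerically (a minimal illustration on $SO(2)$, with our own helper names):

```python
import numpy as np
from scipy.linalg import expm, logm

def exp_mu(mu, v):
    """Left-invariant Riemannian exponential: Exp_mu(v) = mu expm(v) (Eqn. 3)."""
    return mu @ expm(v)

def log_mu(mu, b):
    """Left-invariant Riemannian logarithm: Log_mu(b) = logm(mu^{-1} b) (Eqn. 2)."""
    return np.real(logm(np.linalg.inv(mu) @ b))

def so2(theta):
    """Element of SO(2) from an angle."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

mu, b = so2(0.3), so2(1.0)
v = log_mu(mu, b)        # tangent "vector", represented as a 2x2 skew matrix
b_back = exp_mu(mu, v)   # the round trip recovers b
```

The round trip `Exp_mu(Log_mu(b)) = b` holds whenever `b` lies within the injectivity radius of the map, which is the case for these small rotations.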
3.1.1 Distributions on Lie Groups
Constructing a distribution on $G$ to reason over body and parts models is complicated by the fact that $G$ is not a vector space. Exploiting the maps between group elements and their tangent spaces, we can define a distribution with location parameter $\mu \in G$ by mapping its support to the tangent space of $\mu$. Define the left-invariant concentrated Gaussian $N_L(\cdot)$ in terms of the multivariate Gaussian $N(\cdot)$:

$$N_L(b \mid \mu, \Sigma) = N(\mathrm{Log}_\mu b \mid 0, \Sigma) \tag{4}$$

In similar fashion to [29], this can be thought of as a Gaussian in the tangent space about mean $\mu \in G$. The covariance $\Sigma$ exists in the tangent space and can be understood to operate the same as in the typical Euclidean case, except that vectors in $T_\mu G$ must be mapped back to the group by Eqn. 3. See Figure 3 for a visualization on SO(2).
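On $SO(2)$ the tangent space is one-dimensional, so Eqn. 4 can be evaluated with a scalar Gaussian (a sketch with our own helper names, not the paper's implementation):

```python
import numpy as np
from scipy.linalg import logm
from scipy.stats import multivariate_normal

def so2(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def log_so2(mu, b):
    """Log_mu(b) on SO(2), returned as the scalar tangent coordinate."""
    v = np.real(logm(mu.T @ b))   # mu^{-1} = mu^T for rotations
    return v[1, 0]                # skew matrix <-> angle

def concentrated_gaussian_pdf(b, mu, sigma2):
    """N_L(b | mu, Sigma) of Eqn. 4 as a Gaussian in the tangent space of mu."""
    return multivariate_normal.pdf(log_so2(mu, b), mean=0.0, cov=sigma2)

# The density peaks at the location parameter and decays with geodesic distance.
p_at_mu = concentrated_gaussian_pdf(so2(0.3), so2(0.3), 0.1)
p_away = concentrated_gaussian_pdf(so2(1.3), so2(0.3), 0.1)
```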
Figure 3. Location and scale distributions such as Gaussians can be locally defined about element $\mu$ in Lie group $G$ by mapping their support into the local vector space approximation $T_\mu G$.
3.1.2 SE(D) Actions
Represent $b, c \in G = SE(D)$ as real block matrices,

$$b = \begin{pmatrix} R_b & d_b \\ 0 & 1 \end{pmatrix} \qquad c = \begin{pmatrix} R_c & d_c \\ 0 & 1 \end{pmatrix} \tag{5}$$

The $R_*$ are $D \times D$ rotation matrices (with determinant $+1$) in $SO(D)$ and the $d_*$ are translations in $\mathbb{R}^D$. Henceforth, we use this notation to represent the rotation and translation components of any element in $SE(D)$; for example, if $x_t \in SE(D)$ then it has rotation $R_{x_t}$ and translation $d_{x_t}$.
Elements of $SE(D)$ compose via matrix multiplication (maintaining group closure), and act as a change of basis for homogeneous coordinates $\bar{p}$ of a point $p \in \mathbb{R}^D$:

$$bc\bar{p} = \begin{pmatrix} R_b (R_c p + d_c) + d_b \\ 1 \end{pmatrix} \tag{6}$$

If $\bar{p}$ are coordinates in frame $c$, then $c\bar{p}$ can be interpreted as its coordinates in frame $b$, and $bc\bar{p}$ can be interpreted as its coordinates in the standard (or world) basis. In general, changes of basis are best viewed as composing from left to right, but acting on points from right to left.
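The composition and point action above can be checked directly in 2D (an illustrative sketch; the frames and helper names are ours):

```python
import numpy as np

def se2(theta, d):
    """Homogeneous-matrix element of SE(2): rotation theta, translation d."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(3)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:2, 2] = d
    return T

b = se2(np.pi / 2, [1.0, 0.0])   # frame b in world coordinates
c = se2(0.0, [0.0, 2.0])         # frame c expressed in frame b
p = np.array([1.0, 0.0, 1.0])    # homogeneous coordinates of a point in frame c

world = b @ c @ p                # Eqn. 6: compose left to right, act right to left

# Check against the unrolled form R_b (R_c p + d_c) + d_b:
Rb, db = b[:2, :2], b[:2, 2]
Rc, dc = c[:2, :2], c[:2, 2]
unrolled = Rb @ (Rc @ p[:2] + dc) + db
```

Both computations place the point at $(-1, 1)$ in world coordinates, and the homogeneous coordinate stays 1, confirming group closure of the composition.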
3.2. Body and Parts Model
Let $G = SE(D)$ for dimension $D \in \{2, 3\}$. We seek to infer a parts decomposition of an articulating object by directly observing it in motion. Specifically, we model the inputs $y_{tn} \in \mathbb{R}^D$ as random collections of points sampled within the object as it moves across time. Variable (including no) observations are supported at each time, and no correspondence between observations is assumed. Diverse inputs are supported, including foreground pixels of 2D image sequences, unprojected points from depth sequences, and 3D point clouds sampled within mesh sequences.

We assume part persistence: an object does not gain or lose parts over time. We also assume that parts move smoothly through space but remain close (in an L2 sense) to a common body which also moves smoothly. The relation between body and part motion could be modeled in
Figure 4. The frames that comprise an object at time $t$. Per-time body frames $x_t$ are rigid transformations from world frame $W$. Each part $k$ contains a time-fixed canonical part frame $\omega_k$ and a per-time part frame $\theta_{tk}$. $\omega_k$ is a rigid transformation from body frame $x_t$, while $\theta_{tk}$ is a rigid transformation from $\omega_k$. Using stabilized random walk dynamics, each per-time part frame $\theta_{tk}$ is designed to transform smoothly over time but remain near the origin of its respective canonical part frame $\omega_k$.
many ways: one naive extreme would be to model them as floating bodies with linear dynamics, while the other extreme would be to model them as existing in a skeletal network of joints. Linear dynamics would fail to capture part articulation, while skeletal networks are overly restrictive.

We take a middle ground: parts $\{\theta_{tk}, \omega_k, S_k, E_k, W_k\}$ are modeled as floating bodies that rotate and translate smoothly through space about a body frame $x_t \in G$, but whose origins tend to remain near the origin of a canonical part frame $\omega_k \in G$ through stabilized random walk dynamics. Canonical part frames are close to the body frame and remain fixed across time, but parts also have per-time frames $\theta_{tk} \in G$. Parts are not fixed in their spatial extent; instead, they have a probabilistic, ellipsoidal shape model governed by Gaussian covariance $E_k$. Part dynamics are governed by covariance $S_k$ and body dynamics by covariance $Q$. The dispersion of canonical part frames about the body frame is governed by covariance $W_k$. Figure 4 graphically depicts how body and part frames compose.
3.2.1 Body and Part Dynamics
Body frames $x_t$ and parts evolve independently, but are implicitly coupled through the observation model. In particular, the body frame stochastic dynamics model is:

$$x_t \sim N_L(x_t \mid x_{t-1}, Q) \tag{7}$$

Object dynamics are a non-linear random walk on $G$ whose noise covariance $Q$ exists in the tangent space about the body frame at the previous time. Canonical part frames $\omega_k$ are dispersed about the body frame with covariance $W_k$,

$$\omega_k \sim H_\omega = N_L(\cdot \mid I, W_k) \tag{8}$$

where $I \in G$ is the identity element (no translation or rotation) and covariance $W_k$ can be thought to (implicitly) exist in the tangent space of $x_t$. Each part has per-time dynamics $\theta_{tk}$ with driving noise covariance $S_k$ governed by:
$$\theta_{tk} = \begin{pmatrix} \mathrm{Exp}_{R_{\theta_{(t-1)k}}} \phi_{tk} & A\, d_{\theta_{(t-1)k}} + B\, m_{tk} \\ 0 & 1 \end{pmatrix} \tag{9}$$

with constants $A = \mathrm{diag}(\sqrt{a}, \ldots, \sqrt{a})$, $B = \mathrm{diag}(\sqrt{1-a}, \ldots, \sqrt{1-a})$, and $\mathrm{Exp}$ in Eqn. 9 the Riemannian exponential for $SO(D)$. $\phi_{tk} \in so(D)$ is a vector in the tangent space of $R_{\theta_{(t-1)k}}$. Part translation driving noise $m_{tk}$ and rotation driving noise $\phi_{tk}$ are jointly distributed:

$$(m_{tk}, \phi_{tk}) \sim N(0, S_k) \tag{10}$$
As proven in the supplemental, the chosen coefficients of matrices $A, B$ (with $a = 0.95$) cause the asymptotic covariance of the part translation $d_{\theta_{tk}}$ to equal the covariance of the translation driving noise $m_{tk}$. This form enables parts to transform smoothly while never straying far from their canonical location, and mitigates part confusion during inference.
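The asymptotic-covariance claim is easy to verify numerically in one dimension: with $d_t = \sqrt{a}\, d_{t-1} + \sqrt{1-a}\, m_t$ and $m_t \sim N(0, S)$, stationarity gives $\mathrm{Var} = a\,\mathrm{Var} + (1-a)S$, i.e. $\mathrm{Var} = S$. A quick simulation (our own illustrative check, not the paper's code):

```python
import numpy as np

# Scalar stabilized random walk on the translation component (cf. Eqn. 9):
#   d_t = sqrt(a) * d_{t-1} + sqrt(1 - a) * m_t,   m_t ~ N(0, S).
rng = np.random.default_rng(0)
a, S, T = 0.95, 2.0, 200_000
d = np.zeros(T)
m = rng.normal(0.0, np.sqrt(S), size=T)
for t in range(1, T):
    d[t] = np.sqrt(a) * d[t - 1] + np.sqrt(1.0 - a) * m[t]

empirical_var = d[T // 10:].var()   # discard burn-in; should be close to S
```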
All driving noise covariances are drawn from Inverse-Wishart distributions, where we note that our model supports arbitrary correlations between translation and rotation for object, canonical part, and part transformations:

$$Q \sim H_Q = \mathrm{IW}(\cdot \mid v_{Q_0}, \Lambda_{Q_0}) \tag{11}$$
$$S_k \sim H_S = \mathrm{IW}(\cdot \mid v_{S_0}, \Lambda_{S_0}) \tag{12}$$
$$W_k \sim H_W = \mathrm{IW}(\cdot \mid v_{W_0}, \Lambda_{W_0}) \tag{13}$$

And initial body and part frames are drawn according to:

$$x_1 \sim H_x = N_L(\cdot \mid x_0, \Sigma_x) \tag{14}$$
$$\theta_{1k} \sim H_\theta = N_L(\cdot \mid \theta_0, \Sigma_\theta) \tag{15}$$
3.2.2 Observation Models for 3D and 2D data
Input $y_{tn}$ is assumed to be in world coordinate system $W$, which is assumed to be aligned with the sensor's coordinate system (hence, $W$ has no rotation or translation and is henceforth omitted). Parts generate observations in their respective part coordinate systems, which are mapped to world coordinates via $\theta_{tk}$, $\omega_k$ and the body frame $x_t$. That is, part $k$ generates point $e_{tn} \sim N(0, E_k)$, which is then mapped to world coordinates $\bar{y}_{tn} = x_t \omega_k \theta_{tk} \bar{e}_{tn}$ if $z_{tn} = k$ (where $\overline{(\cdot)}$ is the homogeneous representation of $(\cdot)$). The transformation
is linear in $e_{tn}$, allowing straightforward mean and variance computations of the homogeneous point in world coordinates $\bar{y}_{tn}$, yielding the following observation model (for $z_{tn} = k$):

$$\bar{y}_{tn} \sim N(\bar{y}_{tn} \mid x_t \omega_k \theta_{tk} \bar{0},\; x_t \omega_k \theta_{tk} \bar{E}_k \theta_{tk}^\top \omega_k^\top x_t^\top) \tag{16}$$

where $\bar{0}$ is the homogeneous zero vector in $\mathbb{R}^D$ and $\bar{E}_k$ is a degenerate block covariance matrix of $E_k$ with a zero row and column (a covariance in homogeneous coordinates). Without homogeneous coordinates, this is (via Eqn. 5):

$$y_{tn} \sim N(y_{tn} \mid \mu_{tk}, \Sigma_{tk}) \tag{17}$$
$$\mu_{tk} = R_{x_t}(R_{\omega_k} d_{\theta_{tk}} + d_{\omega_k}) + d_{x_t} \tag{18}$$
$$\Sigma_{tk} = R_{x_t} R_{\omega_k} R_{\theta_{tk}} E_k R_{\theta_{tk}}^\top R_{\omega_k}^\top R_{x_t}^\top \tag{19}$$
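These moments follow from pushing $e_{tn} \sim N(0, E_k)$ through the composed frames, which can be checked numerically in 2D (the frames below are illustrative values we chose, not quantities from the paper):

```python
import numpy as np

def rot2(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Illustrative 2D frames (rotation, translation) for body x_t, canonical
# part omega_k, and per-time part theta_tk; E is a diagonal part shape.
Rx, dx = rot2(0.4), np.array([1.0, 2.0])
Rw, dw = rot2(-0.2), np.array([0.5, 0.0])
Rt, dt = rot2(0.1), np.array([0.1, -0.3])
E = np.diag([0.5, 0.1])

# Closed-form moments of y_tn in world coordinates (Eqns. 18-19):
mu = Rx @ (Rw @ dt + dw) + dx
Sigma = Rx @ Rw @ Rt @ E @ Rt.T @ Rw.T @ Rx.T

# Monte Carlo check: push part-frame samples e ~ N(0, E) through the frames.
rng = np.random.default_rng(0)
e = rng.multivariate_normal(np.zeros(2), E, size=200_000)
y = (Rx @ Rw @ (Rt @ e.T + dt[:, None]) + (Rx @ dw + dx)[:, None]).T
```

The empirical mean and covariance of `y` match `mu` and `Sigma` up to Monte Carlo error, confirming that the composed rigid transformations act linearly on the Gaussian part shape.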
While simple, it accommodates image plane observations in 2D, depth observations in 2.5D, and XYZ observations in 3D. Incorporating additional terms (e.g., appearance) is straightforward, but was not needed for our purposes. As with most generative models, robustness to missing data (common for depth sensors) is handled seamlessly.

The observation covariance $\Sigma_{tk}$ for $y_{tn}$ is some rotation of $E_k$ for $z_{tn} = k$ due to the composition of body and part frames. Consequently, $E_k$ is constrained to be diagonal (i.e., axis-aligned) so as to avoid ambiguity. While the use of $E_k$ implies a probabilistic, ellipsoidal part shape model, its primary function is to yield robust associations $z_{tn}$ of observations to parts. Here, we use the following prior:
$$E_k \sim H_E = \mathrm{IW}(\cdot \mid v_{E_0}, \Lambda_{E_0}) \tag{20}$$
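One simple way to draw an axis-aligned shape covariance consistent with the diagonal constraint above is to keep only the diagonal of an inverse-Wishart draw; the hyperparameter values and this particular construction are illustrative, not necessarily the paper's:

```python
import numpy as np
from scipy.stats import invwishart

# Draw a 2D part-shape covariance from an inverse-Wishart prior (cf. Eqn. 20)
# and impose the diagonal (axis-aligned) constraint by zeroing off-diagonals.
# Hyperparameters v_E0 and Lam_E0 are illustrative placeholders.
rng = np.random.default_rng(0)
v_E0, Lam_E0 = 5, np.eye(2)
E_full = invwishart(df=v_E0, scale=Lam_E0).rvs(random_state=rng)
E_k = np.diag(np.diag(E_full))   # axis-aligned ellipsoidal shape

# Ellipsoid semi-axes (1-sigma) along the part's coordinate axes:
semi_axes = np.sqrt(np.diag(E_k))
```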
4. Inference
We wish to sample from the posterior, which has likelihood proportional to the product of Equations (1, 7, 8, 10, 11, 12, 13, 14, 15, 17, 20) over all $t, k, n$. This is accomplished with Markov chain Monte Carlo (MCMC) inference that exploits Gaussian statistics in the tangent space for efficient updates while simultaneously respecting the geometry of the Lie group via Eqns. 2 and 4. We do so by sampling from the full conditional distributions of each latent variable, grouped in order of discussion:

$$(x_t, \theta_{tk}, \omega_k) \qquad z_{tn} \qquad (\pi, E_k, S_k, Q) \tag{21}$$
Figure 5. Left ("Body Frame Projection"): Object dynamics of the body frame at time t are projected into coordinates at time t − 1 by the Lie group operation x_{t−1}^{−1} x_t. Right ("Tangent Distribution"): The projection is in SE(3), with Gaussian statistics in the tangent plane of x_{t−1}. The figure notionally depicts two degrees of freedom, whereas SE(3) has six.
where t = 1, . . . , T , k = 1, . . . ,∞, n = 1, . . . , Nt, and omitted leading subscripts are taken to mean joint dependence (i.e. y = {yt}_{t=1}^{T} and yt = {ytn}_{n=1}^{Nt}). Inference
complexity is linear in the number of observations and parts.
In our experiments, chains were generally mixed after about
300 samples, with approximately 1 minute per sample being
the worst-case timing for any data we tested on.
In the sequel we sketch the sampling of body transforma-
tions xt. Full details are in the supplement, along with sam-
pling of the canonical parts ωk and part transformations θtk, which take a similar form. We also discuss sampling part associations ztn, which are conjugate except when sampling
assignments to the base measure. The conditionals in the
third grouping (π,Ek, Sk, Q) can be sampled analytically
due to conjugate priors, which we defer to the supplement.
Finally, parts {θtk, ωk, Sk,Wk, Ek} can be sampled in par-
allel across k and ztn can be sampled in parallel across t, n.
4.1. Decomposition of Lie Group dynamics
We exploit the Lie algebra to develop an efficient Gibbs
sampler for dynamical terms {xt, ωk, θtk}. For example,
the operation x_{t−1}^{−1} x_t transforms the body frame at time t into that of the body frame at time t − 1 (Fig. 5, left). This
operation is an element of SE(D):
x_{t−1}^{−1} x_t ≜ ( R_{x_{t−1}^{−1} x_t}   d_{x_{t−1}^{−1} x_t} ; 0   1 ),    (22)

where R_{x_{t−1}^{−1} x_t} = R_{x_{t−1}}^⊤ R_{x_t} and d_{x_{t−1}^{−1} x_t} = R_{x_{t−1}}^⊤ (d_{x_t} − d_{x_{t−1}}). Elements in the frame xt are mapped to the tangent
space of xt−1 via the Riemannian Log map (Fig. 5, right):
Log_{x_{t−1}} x_t ≜ log_G(x_{t−1}^{−1} x_t) = ( V_{x_{t−1}^{−1} x_t}^{−1} d_{x_{t−1}^{−1} x_t} ; φ_{x_{t−1}^{−1} x_t} )    (23)
The first entry, V_{x_{t−1}^{−1} x_t}^{−1} d_{x_{t−1}^{−1} x_t}, gives the tangent-space coordinates of translation, and the second entry, φ_{x_{t−1}^{−1} x_t}, is a rotation vector. The invertible linear operator V_{x_{t−1}^{−1} x_t}^{−1} is computable from the rotation R_{x_{t−1}^{−1} x_t} (or from φ_{x_{t−1}^{−1} x_t}). This is well-defined for x_{t−1}^{−1} x_t sufficiently close to identity and is consistent with small incremental motions.
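The decomposition above can be sketched in numpy/scipy, assuming 4×4 homogeneous transforms and using the standard closed form for V on SE(3) (see, e.g., [8]); this is an illustrative implementation, not the paper's code:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def se3_log(T_prev, T_curr):
    """Tangent-space coordinates of T_curr in the frame of T_prev (Eqs. 22-23).

    T_prev, T_curr: 4x4 homogeneous transforms. Returns (V^{-1} d, phi)."""
    # Eq. (22): relative transform x_{t-1}^{-1} x_t
    R_rel = T_prev[:3, :3].T @ T_curr[:3, :3]
    d_rel = T_prev[:3, :3].T @ (T_curr[:3, 3] - T_prev[:3, 3])
    # rotation vector phi from the matrix logarithm of R_rel
    phi = Rotation.from_matrix(R_rel).as_rotvec()
    th = np.linalg.norm(phi)
    W = np.array([[0, -phi[2], phi[1]],
                  [phi[2], 0, -phi[0]],
                  [-phi[1], phi[0], 0]])
    if th < 1e-8:                       # near identity: V ~ I
        V = np.eye(3)
    else:                               # closed-form V on SE(3)
        V = (np.eye(3) + (1 - np.cos(th)) / th**2 * W
             + (th - np.sin(th)) / th**3 * W @ W)
    return np.linalg.solve(V, d_rel), phi
```

For a pure translation the map returns the translation itself with a zero rotation vector, matching the near-identity regime assumed in the text.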
4.2. Gibbs Sampling Updates
Recall that Eqns. (22) and (23) map xt to the tangent
space of xt−1. When conditioned on rotation, this map-
ping is linear in the translation component dxt. This ob-
servation, combined with Gaussian statistics in the tangent
space, yields closed-form Gibbs updates for translation. To
see this, observe that the distribution over dynamics in the
tangent space is (Fig. 5, right),
NL(xt | xt−1, Q) = N( ( C d_{x_t} + u ; φ_{x_{t−1}^{−1} x_t} ) | 0, Q )    (24)
where C = V_{x_{t−1}^{−1} x_t}^{−1} R_{x_{t−1}}^⊤ and u = −V_{x_{t−1}^{−1} x_t}^{−1} R_{x_{t−1}}^⊤ d_{x_{t−1}}.
Conditioned on rotation Rxt and the previous body frame xt−1, the corresponding rotation vector φ_{x_{t−1}^{−1} x_t} and matrix V_{x_{t−1}^{−1} x_t} are fixed quantities. This renders C and u computable and yields a Gaussian conditional distribution for dxt. This conditional constitutes our prior belief about dxt given Rxt, xt−1 and covariance Q. Similar logic allows us to derive a Gaussian conditional on dxt given the future transformation xt+1. These can be analytically combined to provide a Gaussian distribution for dxt | Rxt, xt−1, xt+1. Because this is Gaussian, and the observation model is also a product of Gaussians whose parameters are known given {ωk, Ek, θtk}_{k=1}^{∞} and {ztn}_{n=1}^{Nt}, it follows that the posterior on dxt is also Gaussian and analytically computable.
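The analytic combination of the past- and future-conditioned Gaussians is the usual precision-weighted product of two Gaussian beliefs about the same variable; a generic sketch (not tied to the paper's parameterization):

```python
import numpy as np

def fuse_gaussians(mu1, S1, mu2, S2):
    """Combine two Gaussian beliefs about the same variable.

    The product N(x | mu1, S1) N(x | mu2, S2) is Gaussian with precision
    L = S1^{-1} + S2^{-1} and mean L^{-1} (S1^{-1} mu1 + S2^{-1} mu2)."""
    L1, L2 = np.linalg.inv(S1), np.linalg.inv(S2)
    S = np.linalg.inv(L1 + L2)
    mu = S @ (L1 @ mu1 + L2 @ mu2)
    return mu, S

# with equal covariances, the fused mean is the average of the two means
mu, S = fuse_gaussians(np.zeros(3), np.eye(3), np.ones(3), np.eye(3))
```

In the sampler, mu1/S1 would come from the xt−1-conditioned prior and mu2/S2 from the xt+1-conditioned term, with the Gaussian likelihood folded in the same way.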
In contrast, sampling of rotation parameters lacks a
closed form. We utilize univariate slice sampling [22] for
the full conditional of each rotation parameter, along with
a fixed number of MCMC proposals to correct for known
rotational symmetries. Details are in the supplement.
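For reference, one update of Neal's univariate slice sampler [22] with the stepping-out procedure looks roughly as follows (a generic sketch; the full conditional log_f of a rotation parameter is supplied by the model):

```python
import numpy as np

def slice_sample(x0, log_f, w=0.5, max_steps=50, rng=None):
    """One step of Neal's (2003) univariate slice sampler with stepping-out.

    x0: current value; log_f: unnormalized log full conditional; w: step width."""
    rng = rng or np.random.default_rng()
    log_y = log_f(x0) + np.log(rng.random())   # slice level under f(x0)
    L = x0 - w * rng.random()                  # random initial bracket around x0
    R = L + w
    for _ in range(max_steps):                 # step out until ends leave the slice
        if log_f(L) <= log_y:
            break
        L -= w
    for _ in range(max_steps):
        if log_f(R) <= log_y:
            break
        R += w
    while True:                                # shrink until a point is accepted
        x1 = rng.uniform(L, R)
        if log_f(x1) > log_y:
            return x1
        if x1 < x0:
            L = x1
        else:
            R = x1
```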
Part Association  The conditional distribution for a single assignment to an existing part k ≥ 1 is given by p(ztn = k | ytn, xt, ω, θt, π, E) ∝ πk p(ytn | xt, ωk, θtk, Ek). Conversely, association to a new part is given by

p(ztn = −1 | ytn, xt, ω, θt, π, E) ∝ π∗ ∫ p(ytn | xt, ω∗, θt∗, E∗) p(ω∗, θt∗, E∗) d(ω∗, θt∗, E∗),    (25)
where π∗ is the stick weight corresponding to the base mea-
sure (i.e. all uninstantiated parts). This is not analytic in our
model, but can be effectively approximated by Monte Carlo sampling of parts (which need only be done once) or by approximation with a constant (since the predictive distribution of parts will be broad, but centered at the object frame of reference). We obtain satisfactory results with both approaches.
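Putting the two cases together, a single association draw can be sketched as below, using a constant approximation log_p_new to the base-measure integral in Eq. (25); all names here are illustrative, not from the paper's code:

```python
import numpy as np

def sample_association(log_lik_k, log_pi, log_pi_star, log_p_new, rng):
    """Sample z_tn over existing parts k and the base measure (label -1).

    log_lik_k[k]: log p(y_tn | x_t, w_k, th_tk, E_k) for each existing part;
    log_pi: log stick weights; log_pi_star + log_p_new: new-part term."""
    logp = np.append(log_pi + log_lik_k, log_pi_star + log_p_new)
    logp -= logp.max()                 # stabilize before exponentiating
    p = np.exp(logp)
    p /= p.sum()
    k = rng.choice(len(p), p=p)
    return -1 if k == len(p) - 1 else k
```

Because parts are conditionally independent given the frames, these draws can run in parallel across t, n as noted above.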
5. Results
We compare quantitatively and qualitatively to nonpara-
metric and parametric baselines in 5.1. We present results
on dynamic mesh data in 5.2 and demonstrate object seg-
mentation based on relative part motion in 5.3. We show
transfer of learned representations to a novel dataset and
synthesize motion from the learned representation in the
supplement. The supplemental video animates these results.
5.1. Quantitative Comparison
We examine part discovery performance on three ob-
ject motion datasets and compare to manually-annotated
ground-truth. We emphasize that annotations are not in-
corporated into the inference procedure. We refer to the
datasets as hand, spider, and marmoset. hand and
spider are 2D image data, while marmoset is 3D data unprojected from a depth camera. Inference utilizes 12–44 frames (depending on the dataset) and results are
compared to five manually-annotated ground-truth frames
(where ground truth is the number of parts and their seg-
mentations – examples are in the supplement). In each
dataset, parts have nearly indistinguishable appearances and
none of the compared methods use an appearance model.
Consequently, part discovery is achieved via analysis of mo-
tion dynamics. Inputs only contain foreground (i.e. back-
ground is removed), as is done in related works [20].
We report multi-object tracking and segmentation
(MOTS) metrics [34], which measure how well the part
associations overlap with ground-truth part segmentations
(MOTSA, sMOTSA, MOTSP) and how stable the part as-
sociations are over time (IDS). These metrics are intended
for segmenting multiple targets, but we repurpose them to
segment multiple parts. Comparisons use an IoU threshold of 0.3.
We compare against two baselines: the Bayesian non-
parametric model of [39] (discussed in Section 2), which
we call the nonparametric extents model npe, and a para-
metric modification of [39] that is given the advantage of knowing the true number of parts, which we call the parametric extents model pe. Neither npe nor pe considers part persistence over time (as we do), so for these methods
we use the Hungarian algorithm to compute part correspondences between pairs of timesteps based on the distance (in the body frame) between component means.
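For the baselines, this correspondence step reduces to a bipartite assignment on pairwise mean distances; a minimal sketch with scipy's Hungarian solver (illustrative names, not the baselines' code):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_parts(means_a, means_b):
    """Match parts between two timesteps by body-frame mean distance.

    means_a, means_b: (K, D) arrays of component means; returns index pairs."""
    cost = np.linalg.norm(means_a[:, None, :] - means_b[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    return list(zip(rows.tolist(), cols.tolist()))

# a permuted copy of the means should be matched back to itself
a = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
pairs = match_parts(a, a[[2, 0, 1]])
```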
Taken together, our model and the two baselines constitute an ablation study in which we consider an unknown number of parts with Lie group dynamics, and unknown / known numbers of parts without Lie group dynamics. In all cases, we compute the mean and standard deviation of MOTS statistics on 100 samples taken from a Markov chain of 1000 samples, use data-dependent priors (specified in the supplemental), and set concentration parameter α = 0.1. Figure 6 (left) shows quantitative results while Figure 9 shows qualitative comparisons between our method and the baselines.
Dataset    Method  IDS            MOTSA           MOTSP         sMOTSA
hand       ours     0.00 ± 0.00    2.79 ± 0.30    0.71 ± 0.01    1.34 ± 0.24
           npe      4.45 ± 1.84    1.93 ± 0.80    0.51 ± 0.01   −4.20 ± 0.78
           pe       4.03 ± 2.11    1.57 ± 0.44    0.47 ± 0.01   −0.33 ± 0.37
spider     ours     5.14 ± 1.49    3.44 ± 0.25    0.55 ± 0.02    1.26 ± 0.18
           npe     19.60 ± 2.88   −4.40 ± 0.92    0.51 ± 0.01   −6.72 ± 0.90
           pe      17.28 ± 3.06    1.73 ± 0.31    0.52 ± 0.01   −0.24 ± 0.27
marmoset   ours     1.24 ± 0.65    1.39 ± 0.89    0.49 ± 0.02   −0.47 ± 0.71
           npe      3.18 ± 1.28  −32.44 ± 2.78    0.35 ± 0.01  −34.06 ± 2.72
           pe       0.43 ± 0.51    3.86 ± 0.17    0.48 ± 0.00    1.77 ± 0.17
average    ours     2.12 ± 0.71    2.54 ± 0.48    0.58 ± 0.02    0.71 ± 0.38
           npe      9.07 ± 2.00  −11.63 ± 1.50    0.46 ± 0.01  −15.00 ± 1.47
           pe       7.25 ± 1.89    2.39 ± 0.31    0.49 ± 0.01    0.39 ± 0.27

[Figure 6, right panel: plot of log cumulative energy (0–7) versus time (0–40).]
Figure 6. (Left): Quantitative comparison of our nonparametric parts model (ours) with nonparametric baseline npe and parametric
baseline pe using MOTS metrics. Lower IDS is better, higher MOTSA, MOTSP, and sMOTSA is better. Best-performing method is
emboldened. (Right): Object segmentation based on part motion across time. Whereas the parts nearest the body center exhibit little
motion (in the body frame), the extremities of spider exhibit large amounts of motion. (Top-Left): Part associations. (Top-Right): Part
segmentation based on motion energy. (Bottom): Log cumulative part motion energy across time (color-coordinated to associations).
Figure 7. Dynamic mesh segmentation. By using points sampled
inside a mesh as the input to our nonparametric parts model, then
computing associations to mesh vertices, our model can learn parts
and dynamics from mesh data. Additional views are shown in Figure 1.
Figure 8. Part posteriors for hand and spider. Dotted ellipses are
the mean part covariance, solid ellipses visualize the part posterior
location covariance. Points are observed part locations used for the
posterior updates. The leg locations of spider are smeared due to
their articulation whereas the fingers of the hand are concentrated.
Our model outperforms the nonparametric baseline in all
datasets and metrics. The pe baseline (which benefits from
knowing the number of parts) outperforms our method on
label switches (IDS) and overall quality (sMOTSA) on the
3D marmoset data. This is largely due to noisy data from
the depth sensor generating observations from the back-
ground that are distant from the object, but not so distant
as to be relegated to the base measure. We see very little ID
switching (IDS) and relatively high precision (MOTSP) in
our model, which we attribute to the canonical parts ωk en-
forcing that each part transformation θtk move stably. Visu-
ally, part assignments correspond best to ground-truth parts
that are extremities (fingers, legs, tails), but tend to over-
segment large object interiors (palms, bodies). We attribute
this to the ellipsoidal observation model but find that, for the
purposes of part analysis, it has no obvious negative impact.
5.2. Dynamic Mesh Segmentation
We apply our method to the squat1 sequence in the ar-
ticulated mesh dataset of [33], decomposing the mesh se-
quence into parts as shown in Figure 7. Note that legs are
segmented into two parts each, while arms are segmented
into one part. This is consistent with the movement in
this sequence where the legs bend, but the arms are held
straight. Minor artifacts appear in the lower-left leg (red), which has a small number of associations above the knee when the person is squatting, but not when standing upright. Qualitatively, the results conform to human part interpretation.
5.3. Motion Analysis
We show how our model facilitates novel object / part
analysis. Beginning with Figure 8, we visualize part dia-
grams for hand and spider. Dotted ellipses show the ob-
servation noise model Ek for each part (in the object frame),
while solid ellipses show the covariance for that part’s trans-
lation across time. Because the part translation covariances
are spatially separated, the model resists label switching be-
tween parts because they tend to stay proximate to their
canonical frame. We observe that the part translation co-
variances are tight for the hand, but horizontally smeared
for the spider–this is expected, because the fingers moved
Figure 9. Example part associations in hand, spider and marmoset. For each sequence, example frames from the original video
are shown (top-row) with part associations and object/part coordinate frames overlaid from our method (middle-row) and baseline NPE
associations (bottom-row). Parts estimated by our method are largely consistent over time, even for the highly-articulated spider legs.
very little in hand compared to the legs in spider.
One analysis that our model enables is the comparison of
part motions in the body frame (i.e. motion not from the ob-
ject moving, but from its parts). By integrating each part’s
motion over time within the body frame we can determine
which areas of an object experience high or low relative motion. Figure 6 (right) shows that, for spider, the legs can be segmented from the other parts.
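One plausible reading of this computation, sketched under our own assumptions about data layout (d_theta is a hypothetical (T, K, D) array of body-frame part translations, not the paper's exact quantity):

```python
import numpy as np

def cumulative_motion_energy(d_theta):
    """Log cumulative motion energy per part (cf. Fig. 6, right).

    d_theta: (T, K, D) body-frame part translations over time. Energy is the
    running sum of squared frame-to-frame displacement for each part."""
    steps = np.diff(d_theta, axis=0)                   # (T-1, K, D) displacements
    energy = np.cumsum((steps ** 2).sum(-1), axis=0)   # (T-1, K) running energy
    return np.log1p(energy)
```

Parts with flat curves (near-zero energy) belong to the rigid interior; rapidly growing curves mark articulating extremities such as the spider's legs.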
6. Discussion
In this work we demonstrated that our nonparametric
representation of kinematic bodies infers meaningful part
decompositions of objects in an unsupervised way, by sim-
ply observing them in motion. Furthermore, our Lie group
representation constrains articulations of moving parts to
physically plausible kinematic states, without the require-
ment of object-specific knowledge such as skeletal struc-
tures. Part decompositions are learnable on very short se-
quences, and generalize to other datasets and instances of
the same object type. In contrast to methods which rely on
extensive training data and/or object-specific 2D/3D mod-
els, we were able to demonstrate robust analysis by direct
observation of single instances of an object.
Our model simplifies inference and motion analysis
while suggesting straightforward extensions. For exam-
ple, part persistence ensures that the representation of parts
persists over a video sequence, even if parts become oc-
cluded. A hierarchical model over multiple videos of simi-
lar objects would thus be robust to occlusions in any single
video. Furthermore, Gaussian tangent-space conditionals
allow closed-form Gibbs updates for translation and efficient slice sampling of rotation, and prove sufficient for motion
analysis. Explicit models of part shape may avoid over-segmenting large regions and are the focus of current work.
Acknowledgements This work was partially supported
by ONR N00014-17-1-2072 and NIH 5R01MH111916.
References
[1] Charles E Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, pages 1152–1174, 1974. 2
[2] Matthew J Beal, Zoubin Ghahramani, and Carl E Rasmussen. The infinite hidden Markov model. In Advances in Neural Information Processing Systems, pages 577–584, 2002. 2
[3] Federica Bogo, Javier Romero, Gerard Pons-Moll, and Michael J Black. Dynamic FAUST: Registering human bodies in motion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6233–6242, 2017. 2
[4] Guillaume Bourmaud, Remi Megret, Marc Arnaudon, and Audrey Giremus. Continuous-discrete extended Kalman filter on matrix Lie groups using concentrated Gaussian distributions. Journal of Mathematical Imaging and Vision, 51(1):209–228, 2015. 2
[5] C. Bregler and J. Malik. Tracking people with twists and
exponential maps. Proceedings. 1998 IEEE Computer Soci-
ety Conference on Computer Vision and Pattern Recognition
(Cat. No.98CB36231), pages 8–15, 1998. 2
[6] Martin Brossard, Silvere Bonnabel, and Jean-Philippe Condomines. Unscented Kalman filtering on Lie groups. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2485–2491. IEEE, 2017. 2
[7] David B. Dunson. Bayesian dynamic modeling of latent trait
distributions. Biostatistics, 7(4):551–568, 2006. 2
[8] Ethan Eade. Lie groups for computer vision. Cambridge Univ., Cambridge, UK, Tech. Rep., 2014. 2
[9] Pedro F Felzenszwalb, Ross B Girshick, David McAllester,
and Deva Ramanan. Object detection with discriminatively
trained part-based models. IEEE transactions on pattern
analysis and machine intelligence, 32(9):1627–1645, 2010.
2
[10] Martin A Fischler and Robert A Elschlager. The representation and matching of pictorial structures. IEEE Transactions on Computers, (1):67–92, 1973. 2
[11] Emily Fox, Erik B Sudderth, Michael I Jordan, and Alan S
Willsky. Bayesian nonparametric inference of switching dy-
namic linear models. IEEE Transactions on Signal Process-
ing, 59(4):1569–1585, 2011. 2
[12] Emily B Fox, Erik B Sudderth, Michael I Jordan, Alan S Willsky, et al. A sticky HDP-HMM with application to speaker diarization. The Annals of Applied Statistics, 5(2A):1020–1056, 2011. 2
[13] Katerina Fragkiadaki, Sergey Levine, Panna Felsen, and Ji-
tendra Malik. Recurrent network models for human dynam-
ics. In Proceedings of the IEEE International Conference on
Computer Vision, pages 4346–4354, 2015. 2
[14] Oren Freifeld, Alexander Weiss, Silvia Zuffi, and Michael J
Black. Contour people: A parameterized model of 2d artic-
ulated human shape. In 2010 IEEE Computer Society Con-
ference on Computer Vision and Pattern Recognition, pages
639–646. IEEE, 2010. 2
[15] Jurgen V Gael, Yee W Teh, and Zoubin Ghahramani. The in-
finite factorial hidden markov model. In Advances in Neural
Information Processing Systems, pages 1697–1704, 2009. 2
[16] Brian C Hall. Lie groups, Lie algebras, and representations. In Quantum Theory for Mathematicians, pages 333–366. Springer, 2013. 2
[17] Søren Hauberg, Francois Lauze, and Kim Steenstrup Pedersen. Unscented Kalman filtering on Riemannian manifolds. Journal of Mathematical Imaging and Vision, 46(1):103–120, 2013. 2
[18] Shanon X Ju, Michael J Black, and Yaser Yacoob. Card-
board people: A parameterized model of articulated image
motion. In Proceedings of the Second International Con-
ference on Automatic Face and Gesture Recognition, pages
38–44. IEEE, 1996. 2
[19] Giuseppe Loianno, Michael Watterson, and Vijay Kumar.
Visual inertial odometry for quadrotors on SE(3). Proceed-
ings - IEEE International Conference on Robotics and Au-
tomation, 2016-June(3):1544–1551, 2016. 2
[20] Dominik Lorenz, Leonard Bereska, Timo Milbich, and Bjorn
Ommer. Unsupervised part-based disentangling of object
shape and appearance. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition, pages
10955–10964, 2019. 2, 6
[21] Naureen Mahmood, Nima Ghorbani, Nikolaus F Troje, Ger-
ard Pons-Moll, and Michael J Black. Amass: Archive of mo-
tion capture as surface shapes. In Proceedings of the IEEE
International Conference on Computer Vision, pages 5442–
5451, 2019. 2
[22] Radford M Neal et al. Slice sampling. The Annals of Statistics, 31(3):705–767, 2003. 6
[23] Frank C. Park. Distance Metrics on the Rigid-Body Mo-
tions with Applications to Mechanism Design. Journal of
Mechanical Design, 117(1):48–54, 1995. 2
[24] Gerard Pons-Moll, Andreas Baak, Thomas Helten,
Meinard Muller, Hans-Peter Seidel, and Bodo Rosen-
hahn. Multisensor-fusion for 3d full-body human motion
capture. In 2010 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition, pages 663–670.
IEEE, 2010. 2
[25] Lu Ren, David B. Dunson, and Lawrence Carin. The dy-
namic hierarchical Dirichlet process. Proceedings of the 25th
international conference on Machine learning - ICML ’08,
pages 824–831, 2008. 2
[26] David A Ross, Daniel Tarlow, and Richard S Zemel. Un-
supervised learning of skeletons from motion. In European
Conference on Computer Vision, pages 560–573. Springer,
2008. 2
[27] J Shotton, A Fitzgibbon, M Cook, T Sharp, M Finocchio,
R Moore, A Kipman, and A Blake. Real-time human pose
recognition in parts from single depth images. In Proceed-
ings of the 2011 IEEE Conference on Computer Vision and
Pattern Recognition, pages 1297–1304. IEEE Computer So-
ciety, 2011. 2
[28] Julian Straub, Jason Chang, Oren Freifeld, and John Fisher III. A Dirichlet process mixture model for spherical data. In Artificial Intelligence and Statistics, pages 930–938, 2015. 2
[29] Julian Straub, Oren Freifeld, Guy Rosman, John J Leonard, and John W Fisher III. The Manhattan frame model: Manhattan world inference in the space of surface normals. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017. 3
[30] Erik B Sudderth, Antonio Torralba, William T Freeman, and Alan S Willsky. Describing visual scenes using transformed Dirichlet processes. In Advances in Neural Information Processing Systems, pages 1297–1304, 2006. 2
[31] Yee Whye Teh, Michael I. Jordan, Matthew J. Beal, and
David M. Blei. Hierarchical Dirichlet processes. Journal of
the American Statistical Association, 101(476):1566–1581,
2006. 2
[32] Isabel Valera, Francisco Ruiz, Lennart Svensson, and Fer-
nando Perez-Cruz. Infinite factorial dynamical model. In
Advances in Neural Information Processing Systems, pages
1666–1674, 2015. 2
[33] Daniel Vlasic, Ilya Baran, Wojciech Matusik, and Jovan
Popovic. Articulated mesh animation from multi-view sil-
houettes. In ACM Transactions on Graphics (TOG), vol-
ume 27, page 97. ACM, 2008. 7
[34] Paul Voigtlaender, Michael Krause, Aljosa Osep, Jonathon Luiten, Berin Balachandar Gnana Sekar, Andreas Geiger, and Bastian Leibe. MOTS: Multi-object tracking and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7942–7951, 2019. 6
[35] Yunfeng Wang and Gregory S Chirikjian. Error propaga-
tion on the euclidean group with applications to manipulator
kinematics. IEEE Transactions on Robotics, 22(4):591–602,
2006. 1, 2
[36] Zhenjia Xu, Zhijian Liu, Chen Sun, Kevin Murphy, William T Freeman, Joshua B Tenenbaum, and Jiajun Wu. Unsupervised discovery of parts, structure, and dynamics. International Conference on Learning Representations, 2019. 2
[37] Milos Zefran, Vijay Kumar, and Christopher Croke. Choice
of Riemannian Metrics for Rigid Body Kinematics. ASME
Design Engineering Technical Conference and Computers in
Engineering Conference, (3):1–11, 1996. 2
[38] Jianwen Zhang, Yangqiu Song, Changshui Zhang, and Shixia Liu. Evolutionary hierarchical Dirichlet processes for multiple correlated time-varying corpora. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '10, page 1079, 2010. 2
[39] Yang Zhou, Jitendra Sharma, Qiong Ke, Rogier Landman, Jingli Yuan, Hong Chen, David S Hayden, John W Fisher, Minqing Jiang, William Menegas, et al. Atypical behaviour and connectivity in Shank3-mutant macaques. Nature, page 1, 2019. 2, 6
[40] Silvia Zuffi, Angjoo Kanazawa, and Michael J Black. Li-
ons and tigers and bears: Capturing non-rigid, 3d, articulated
shape from images. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pages 3955–
3963, 2018. 2
[41] Silvia Zuffi, Angjoo Kanazawa, David W Jacobs, and
Michael J Black. 3d menagerie: Modeling the 3d shape and
pose of animals. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pages 6365–
6373, 2017. 2